Question 2 of 10

Explain how gradient synchronization works in data-parallel training. What are the differences between synchronous and asynchronous approaches, and what communication patterns are used?