Question 3 of 10
How does pipeline parallelism work in deep learning training? Explain micro-batching, pipeline schedules like GPipe and 1F1B, and how to minimize pipeline bubbles.
Sample answer preview
Pipeline parallelism partitions a model's layers into stages, assigning each stage to a different device. During training, activations flow forward through the stages, and gradients flow backward in reverse order. With a single large batch, only one stage is active at any moment; splitting the batch into micro-batches lets stages work on different micro-batches concurrently, while schedules such as GPipe and 1F1B order the forward and backward passes to shrink the idle "bubble" at the start and end of each sweep.
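A minimal sketch of the idea, assuming each micro-batch costs one uniform "tick" per stage (a simplification; real stages have unequal costs): with p stages and m micro-batches, a sweep takes m + p - 1 ticks, of which only m do useful work on any given stage, so the bubble fraction is (p - 1) / (m + p - 1). The helper names below are illustrative, not from any library.

```python
def gpipe_bubble_fraction(num_stages: int, num_microbatches: int) -> float:
    """Fraction of each device's time spent idle (the pipeline bubble).

    With p stages and m micro-batches, a sweep occupies m + p - 1 ticks,
    of which only m are useful on a given stage:
    bubble = (p - 1) / (m + p - 1).
    """
    p, m = num_stages, num_microbatches
    return (p - 1) / (m + p - 1)


def schedule(num_stages: int, num_microbatches: int):
    """Per-stage timelines for the forward sweep (GPipe fill pattern).

    Entry t of row s is the micro-batch id active on stage s at tick t,
    or None during a bubble.
    """
    p, m = num_stages, num_microbatches
    total = m + p - 1
    rows = []
    for s in range(p):
        row = []
        for t in range(total):
            mb = t - s  # stage s starts micro-batch mb one tick after stage s-1
            row.append(mb if 0 <= mb < m else None)
        rows.append(row)
    return rows


if __name__ == "__main__":
    p, m = 4, 8
    for s, row in enumerate(schedule(p, m)):
        print(f"stage {s}: " + " ".join("." if x is None else str(x) for x in row))
    print(f"bubble fraction: {gpipe_bubble_fraction(p, m):.3f}")
```

Printing the timelines shows the staircase fill and drain at the ends of the sweep; increasing m relative to p shrinks the bubble, which is why GPipe relies on many micro-batches (and activation checkpointing to keep their memory cost manageable), while 1F1B interleaves one forward with one backward per stage to release activation memory earlier at the same bubble size.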
Tags: pipeline parallelism · micro-batching · pipeline bubbles · GPipe · 1F1B · activation checkpointing