Question 5 of 10 (Pro Only)
Compare batch normalization and layer normalization in deep learning. Why do Transformers typically use layer normalization while CNNs use batch normalization?
Sample answer preview
Normalization techniques stabilize and accelerate deep network training by controlling the distribution of activations. Batch normalization and layer normalization both normalize activations but differ in which dimensions they normalize over, leading to different properties and…
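The visible part of the answer turns on which axes the statistics are computed over. A minimal NumPy sketch of that difference follows; the array shapes, the `eps` value, and the omission of the learnable scale and shift parameters are illustrative assumptions, not part of the sample answer itself:

```python
import numpy as np

# Toy activations: a batch of 4 examples, each with 8 features.
x = np.random.randn(4, 8).astype(np.float32)
eps = 1e-5  # small constant for numerical stability

# Batch norm: one mean/variance per feature, computed ACROSS the batch (axis 0).
# At inference it would substitute running averages for these batch statistics.
bn_mean = x.mean(axis=0, keepdims=True)  # shape (1, 8)
bn_var = x.var(axis=0, keepdims=True)
x_bn = (x - bn_mean) / np.sqrt(bn_var + eps)

# Layer norm: one mean/variance per example, computed ACROSS the features (axis 1).
# No dependence on the batch, so it works unchanged at batch size 1.
ln_mean = x.mean(axis=1, keepdims=True)  # shape (4, 1)
ln_var = x.var(axis=1, keepdims=True)
x_ln = (x - ln_mean) / np.sqrt(ln_var + eps)

print(x_bn.mean(axis=0))  # ~0 for every feature column
print(x_ln.mean(axis=1))  # ~0 for every example row
```

Because layer norm's statistics never depend on the batch, it behaves identically during training and inference and copes with small batches and variable-length token sequences, which is a standard reason Transformers favor it. Batch norm's per-channel statistics suit CNN feature maps, where each channel aggregates many spatial positions per batch, making the batch estimates reliable.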
Tags: batch normalization, layer normalization, internal covariate shift, running averages, group normalization, pre-layer normalization