Question 5 of 10 (Pro Only)

Compare batch normalization and layer normalization in deep learning. Why do Transformers typically use layer normalization while CNNs use batch normalization?

Sample answer preview

Normalization techniques stabilize and accelerate deep network training by controlling the distribution of activations. Batch normalization and layer normalization both normalize activations but differ in which dimensions they normalize over, leading to different properties and…
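The key distinction the preview points to — which axes the statistics are computed over — can be shown in a minimal NumPy sketch (an illustration, not part of the model answer): batch norm normalizes each feature across the batch, while layer norm normalizes each sample across its own features.

```python
import numpy as np

# Toy activations: a batch of 4 samples with 3 features each.
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0],
              [10.0, 11.0, 12.0]])
eps = 1e-5  # small constant for numerical stability

# Batch norm: per-feature statistics over the batch dimension (axis 0).
bn = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

# Layer norm: per-sample statistics over the feature dimension (axis 1).
ln = (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

print(bn.mean(axis=0))  # ≈ 0 for every feature
print(ln.mean(axis=1))  # ≈ 0 for every sample
```

Because layer norm's statistics come from a single sample, it does not depend on batch composition — one reason it suits Transformers, which process variable-length sequences and may run with batch size 1 at inference, whereas batch norm relies on batch statistics (and running averages at test time).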

Keywords: batch normalization, layer normalization, internal covariate shift, running averages, group normalization, pre-layer normalization
