Question 4 of 10
Explain the differences between self-attention, cross-attention, and multi-head attention. How do these mechanisms work together in models like BERT and GPT?
Sample answer preview
Attention mechanisms are the computational foundation of modern Transformers, with different variants serving distinct purposes in model architectures. Understanding these mechanisms and how they combine is essential for working with and adapting state-of-the-art models.
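To make the distinctions concrete, here is a minimal sketch in PyTorch of a single multi-head attention module that covers all three variants. Everything in it (the class name, the `w_q`/`w_k`/`w_v`/`w_o` projection names, the `causal` flag) is illustrative rather than taken from any particular library: self-attention is the case where queries and keys/values come from the same sequence, cross-attention the case where they come from different sequences, and causal masking the restriction that makes the module GPT-style rather than BERT-style.

```python
# Minimal multi-head attention sketch (PyTorch assumed; all names illustrative).
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Separate learned projections for queries, keys, and values,
        # plus an output projection that mixes the heads back together.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key_value, causal: bool = False):
        # query:     (batch, q_len, d_model)
        # key_value: (batch, kv_len, d_model)
        b, q_len, _ = query.shape
        kv_len = key_value.shape[1]

        # Project, then split d_model into heads: (batch, heads, len, d_head).
        q = self.w_q(query).view(b, q_len, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_k(key_value).view(b, kv_len, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_v(key_value).view(b, kv_len, self.n_heads, self.d_head).transpose(1, 2)

        # Scaled dot-product scores: (batch, heads, q_len, kv_len).
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)

        if causal:
            # GPT-style masking: position i may only attend to positions <= i.
            mask = torch.triu(
                torch.ones(q_len, kv_len, dtype=torch.bool, device=scores.device),
                diagonal=1,
            )
            scores = scores.masked_fill(mask, float("-inf"))

        weights = scores.softmax(dim=-1)
        # Recombine heads: (batch, q_len, d_model).
        out = (weights @ v).transpose(1, 2).reshape(b, q_len, -1)
        return self.w_o(out)

attn = MultiHeadAttention(d_model=64, n_heads=8)
x = torch.randn(2, 10, 64)    # a batch of token embeddings
enc = torch.randn(2, 12, 64)  # e.g. encoder outputs of a different length

bert_style = attn(x, x)              # bidirectional self-attention (BERT-like)
gpt_style = attn(x, x, causal=True)  # causal self-attention (GPT-like)
cross = attn(x, enc)                 # cross-attention (decoder attends to encoder)
```

Note the design point the three calls at the end make: the mechanism itself never changes. What distinguishes self-attention from cross-attention is only whether `query` and `key_value` are the same tensor, what distinguishes GPT from BERT is only the causal mask, and multi-head attention is simply this computation run over several subspaces in parallel and re-projected.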
Tags: self-attention, cross-attention, multi-head attention, BERT, GPT, causal masking