Question 7 of 10
What is data skew in distributed processing, and how do you handle it in Spark?
Sample answer preview
Data skew occurs when data is unevenly distributed across partitions, causing some tasks to process significantly more data than others. Since a Spark stage completes only when its slowest task finishes, a single skewed partition can bottleneck the entire job while other…
data skew, salting, broadcast join, AQE, Adaptive Query Execution, repartition
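
To make the salting idea listed above concrete, here is a minimal sketch in plain Python rather than Spark. It simulates a hash partitioner with `zlib.crc32` and compares the largest partition before and after a random salt suffix is appended to each key. The partition count, the salt range, the sample keys, and the helper names (`partition_of`, `partition_sizes`, `salt_key`) are all assumptions made for illustration; none of them come from the answer or from Spark's API.

```python
import random
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count
NUM_SALTS = 8       # hypothetical number of salt values per key


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's own hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys) -> list:
    """Count how many records land in each simulated partition."""
    counts = Counter(partition_of(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


def salt_key(key: str, rng: random.Random, num_salts: int = NUM_SALTS) -> str:
    """Append a random salt so one hot key becomes several distinct keys."""
    return f"{key}_{rng.randrange(num_salts)}"


# Skewed input: a single hot key accounts for 90% of the records.
keys = ["hot"] * 9000 + [f"user{i}" for i in range(1000)]

rng = random.Random(0)  # fixed seed so the run is repeatable
plain = partition_sizes(keys)
salted = partition_sizes([salt_key(k, rng) for k in keys])

print("largest partition without salt:", max(plain))
print("largest partition with salt:   ", max(salted))
```

Without the salt, every `"hot"` record hashes to the same partition, so one simulated task would carry at least 9,000 of the 10,000 records. With the salt, those records spread over several partitions. In a real salted join, the other side of the join has to be replicated once per salt value so that matching rows still meet, and a salted aggregation needs a second pass that strips the salt and combines the partial results.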