Question 10 of 10
You need to process 5 TB of daily log data to generate analytics dashboards. Design a Spark-based pipeline, explaining your architectural decisions and optimization strategies.
Sample answer preview
Processing 5 TB of daily log data requires careful architectural decisions to balance reliability, performance, and cost. Here is how I would design this pipeline. The data-ingestion design depends on how the logs arrive.
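The tags on this question point at partitioned Parquet output, broadcast joins, adaptive query execution (AQE), and dynamic executor allocation. A minimal sketch of how such a job might be submitted with those features enabled, assuming a YARN cluster and a PySpark entrypoint named `process_logs.py` (the paths, executor counts, and script name are illustrative, not part of the sample answer):

```shell
# Hypothetical submission command; cluster manager, paths, and sizing
# are placeholders to be tuned for the actual workload.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.sql.adaptive.coalescePartitions.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=10 \
  --conf spark.dynamicAllocation.maxExecutors=200 \
  --conf spark.sql.autoBroadcastJoinThreshold=64MB \
  process_logs.py --input s3://logs/raw/ --output s3://logs/curated/
```

AQE re-optimizes shuffle partition counts and join strategies at runtime, dynamic allocation scales executors with the daily load, and raising `autoBroadcastJoinThreshold` lets Spark broadcast small dimension tables instead of shuffling the 5 TB fact side; the job itself would typically write Parquet partitioned by date so the dashboards can prune partitions.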
Spark, Parquet, partitioning, broadcast join, AQE, dynamic allocation