Data Engineer
Intermediate
Design scalable data pipelines, implement data lakes, and optimize big data processing.
Topics
1. Data Warehousing
2 free / 10 questions
- 1. What is a data warehouse, and how does it differ from a transactional database?
- 2. What is the difference between a star schema and a snowflake schema? When would you choose one over the other?
- Explain the concepts of fact tables and dimension tables in dimensional modeling. What are the different types of fact tables? (Pro)
- How do you handle slowly changing dimensions in a data warehouse? Describe the most common SCD types and their trade-offs. (Pro)
- Describe the typical layers in a data warehouse architecture. What is the purpose of each layer? (Pro)
- Compare the Kimball and Inmon approaches to data warehouse design. What are the advantages and disadvantages of each methodology? (Pro)
- How does partitioning improve data warehouse performance? What partitioning strategies would you recommend, and what are the pitfalls? (Pro)
- Compare modern cloud data warehouse platforms like Snowflake, BigQuery, and Redshift. What are the architectural differences and when would you recommend each? (Pro)
- What is Data Vault modeling, and how does it differ from traditional dimensional modeling? When would you choose it? (Pro)
- You are tasked with designing a data warehouse for an e-commerce company from scratch. Walk through your approach, from gathering requirements to delivering value. (Pro)
2. Big Data (Spark, Hadoop)
2 free / 10 questions
- 1. What is big data, and what are the key characteristics that distinguish it from traditional data processing?
- 2. Describe the Hadoop ecosystem. What are its core components, and how do they work together?
- How does Apache Spark differ from Hadoop MapReduce? Why has Spark become the preferred processing framework? (Pro)
- Explain the differences between RDDs, DataFrames, and Datasets in Apache Spark. When would you use each? (Pro)
- What is lazy evaluation in Spark, and how does the DAG execution model optimize job execution? (Pro)
- Explain data partitioning and shuffling in Spark. Why are shuffles expensive, and how can you minimize them? (Pro)
- What is data skew in distributed processing, and how do you handle it in Spark? (Pro)
- How does Spark manage memory on executors? What happens when a Spark job runs out of memory, and how do you troubleshoot it? (Pro)
- When and how should you use caching and persistence in Spark? What are the different storage levels, and what are the trade-offs? (Pro)
- You need to process 5 TB of daily log data to generate analytics dashboards. Design a Spark-based pipeline, explaining your architectural decisions and optimization strategies. (Pro)
3. Cloud Data Services
2 free / 10 questions
- 1. What are the main categories of cloud data services that a data engineer works with? Give examples from at least two cloud providers.
- 2. What is a data lake, and how do you build one using cloud services? How does it differ from a data warehouse?
- What is the difference between serverless and provisioned cloud data services? When would you choose each approach for data engineering workloads? (Pro)
- Compare AWS Glue, Google Cloud Dataflow, and Azure Data Factory. What are their key differences and strengths for data engineering? (Pro)
- How do you optimize cloud storage costs and performance for data engineering workloads? What strategies apply across providers? (Pro)
- Design a cloud-native data pipeline architecture for ingesting data from multiple sources into a data lake and warehouse. What services would you use and why? (Pro)
- Why is infrastructure as code important for data engineering, and how do you apply it to cloud data services? (Pro)
- How do you secure data in cloud data engineering environments? What are the key security layers and best practices? (Pro)
- How do you optimize costs for cloud data engineering workloads? What are the biggest cost drivers, and how do you control them? (Pro)
- When would a data engineering team choose a multi-cloud or hybrid cloud strategy? What are the challenges and how do you address them? (Pro)
4. Data Modeling
2 free / 10 questions
- 1. What is data modeling, and why is it important in data engineering?
- 2. Explain the differences between conceptual, logical, and physical data models. What does each level capture, and who is the audience for each?
- Explain database normalization. What are the first three normal forms, and why does normalization matter? (Pro)
- When would you choose to denormalize a database? What are the benefits and risks of denormalization? (Pro)
- What is an Entity-Relationship diagram? How do you represent entities, attributes, and different types of relationships? (Pro)
- What is the difference between surrogate keys and natural keys? When would you use each, and what are the implications for data modeling? (Pro)
- How do you model many-to-many relationships in relational databases? What challenges arise, and how do you handle them in analytical models? (Pro)
- How do you model time-series data in a database? What design patterns and considerations apply? (Pro)
- How do you model semi-structured data like JSON or XML in a relational or analytical database? What are the trade-offs between different approaches? (Pro)
- A ride-sharing company needs a data model to support both operational systems and analytics. Walk through how you would design this model, covering the key entities, relationships, and trade-offs. (Pro)
5. Real-time Streaming
2 free / 10 questions
- 1. What is stream processing, and what are common use cases where real-time data processing is essential?
- 2. Explain the architecture of Apache Kafka. What are topics, partitions, brokers, producers, and consumers?
- What are the different message delivery semantics in streaming systems? Explain at-most-once, at-least-once, and exactly-once processing. (Pro)
- What are windows in stream processing? Describe the different types of windows and when you would use each. (Pro)
- What are watermarks in stream processing, and how do they help handle out-of-order and late-arriving data? (Pro)
- What happens during a Kafka consumer group rebalance? What causes rebalances, and how do you minimize their impact? (Pro)
- What are stateful operations in stream processing? How do frameworks like Flink and Spark manage state, and what challenges does state management introduce? (Pro)
- How do you handle schema evolution in streaming data pipelines? What happens when producers and consumers have different schema versions? (Pro)
- What is backpressure in stream processing, and how do you detect and handle it? (Pro)
- Design a real-time streaming pipeline for a ride-sharing company that needs to detect surge pricing conditions and alert drivers within seconds. Walk through your architecture and design decisions. (Pro)
Quick Stats
- Total Questions: 50
- Topics: 5
- Difficulty: Intermediate