SRE / Platform Engineer
Beginner: Learn Linux administration, monitoring basics, incident response fundamentals, and SLI/SLO concepts.
Topics
1. Linux & System Administration
10 questions
- What are Linux file permissions, and how do you read and modify them using chmod?
- How do you view and manage running processes in Linux? Explain the key commands and signals used.
- What is package management in Linux, and what are the differences between apt and yum/dnf?
- How does systemd manage services in Linux? Explain unit files, systemctl commands, and how to create a custom service.
- Explain the Linux filesystem hierarchy. What are the key directories, and how do you monitor and manage disk usage?
- What Linux networking commands should an SRE know for diagnosing connectivity issues? Walk through a troubleshooting scenario.
- How does SSH work, and what are the best practices for secure remote server administration?
- A production service is experiencing intermittent failures. How would you use Linux log analysis tools and techniques to identify the root cause?
- How would you design a user and group management strategy for a fleet of Linux servers running multiple services, ensuring least-privilege access?
- A server is experiencing performance degradation under load. Walk through how you would diagnose the bottleneck and tune the system using Linux tools.
2. Monitoring & Observability Basics
10 questions
- What is the difference between monitoring and observability, and why do SREs need both?
- What is Prometheus, and how does it collect and store metrics? Explain the pull-based model.
- What are the four golden signals of monitoring, and why are they important for SRE teams?
- How do you build effective Grafana dashboards for an SRE team? What are the best practices for visualization and organization?
- What are the best practices for structured logging, and how does centralized log management work in an SRE context?
- What makes a good alert in an SRE context? How do you avoid alert fatigue while ensuring critical issues are caught?
- What is distributed tracing, and how does it help SREs debug issues in microservices architectures?
- How would you design and implement a complete monitoring and observability stack for a small microservices platform from scratch?
- Write PromQL queries and alerting rules for monitoring a web application's reliability based on SLO targets. Explain your reasoning for each.
- After a production incident that was not detected by monitoring for 30 minutes, how would you conduct a monitoring gap analysis and improve coverage?
3. Incident Response Fundamentals
10 questions
- What is a production incident, and how do SRE teams typically classify incident severity levels?
- What does it mean to be on-call as an SRE, and what are the key responsibilities and best practices?
- When and how should you escalate during a production incident? What are the common escalation paths?
- How should an SRE team communicate during a production incident, both internally and externally?
- What is a blameless postmortem, and how do you write one that leads to meaningful improvements?
- What are the key roles in an incident response team, and how does the incident command system work in an SRE context?
- What is the difference between a runbook and a playbook, and how do you create effective ones for incident response?
- You are on-call and receive a page at 2 AM indicating that the API error rate has jumped from 0.1% to 15%. Walk through your complete incident response process from alert to resolution.
- What strategies can an SRE team use to reduce Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR) for production incidents?
- How would you establish an incident response program for a growing engineering team that currently has no formal process?
4. SLIs, SLOs & SLAs
10 questions
- What are SLIs, SLOs, and SLAs, and how do they relate to each other in site reliability engineering?
- How do you choose the right SLIs for a web service? What makes a good SLI versus a bad one?
- What is an error budget, and how does it help SRE teams make decisions about reliability versus feature development?
- How do you determine the right SLO target for a service? What factors should influence whether you set 99.9% versus 99.99%?
- What are the different approaches to measuring SLIs, and what are the tradeoffs between server-side metrics, synthetic monitoring, and real user monitoring?
- Explain the concept of SLO compliance windows. What is the difference between a rolling window and a calendar window, and when would you use each?
- What should an error budget policy include, and how do you get organizational buy-in from both engineering and product teams?
- Explain multi-window, multi-burn-rate alerting for SLOs. Why is it superior to simple threshold alerting, and how would you implement it?
- How do you define and implement SLOs for a microservices architecture where a single user request passes through multiple services?
- Your team has been running SLOs for six months. How do you evaluate whether they are set correctly, and what is the process for adjusting them?
5. Basic Automation & Scripting
10 questions
- What are the essential Bash scripting concepts an SRE should know, and how do you write a reliable script for production use?
- How do cron jobs work in Linux, and what are the best practices for scheduling and managing automated tasks?
- What is toil in the SRE context, and how do you identify and prioritize tasks for automation?
- When should an SRE use Python instead of Bash for automation, and what Python libraries are most useful for SRE tasks?
- What is Ansible, and how does it differ from shell scripts for managing server configuration? Explain the key concepts.
- How would you design and implement automated health check scripts that verify the health of multiple services and their dependencies?
- What is Infrastructure as Code, and why is it important for SRE teams? Compare declarative and imperative approaches.
- How would you design a system that progressively automates runbook procedures, starting from manual steps and evolving toward full auto-remediation?
- How do SREs contribute to CI/CD pipeline design to ensure deployments are safe and reliable? What automated safety checks should be built in?
- How do you safely automate changes across a fleet of hundreds of servers, ensuring consistency while minimizing risk of widespread impact?
Quick Stats
- Total Questions: 50
- Topics: 5
- Difficulty: Beginner