Director of Site Reliability Engineering
As the Director of Site Reliability, you will be responsible for ensuring the overall performance, availability and resilience of our infrastructure. This includes implementing and maintaining monitoring systems, collaborating with cross-functional teams to address performance bottlenecks and continuously improving the reliability and scalability of our systems to meet the evolving needs of our users.
How You’ll Have Impact
You will report to the global Head of Infrastructure, and your peers and customers will be all of the other engineering directors at Reddit. You will partner with a multitude of stakeholders to understand Reddit’s core service priorities across all of our product lines, and will guide the design, development, and adoption of a scalable, reliable, and low-latency core service stack. This stack will operate in multiple cloud environments and provide an API platform for Reddit to rapidly deliver reliable, performant, and efficient services to our end users.
You will be accountable for building, growing, and mentoring a world-class team of engineers to help Reddit reach its goal of bringing community and belonging to everyone.
What You’ll Do
- Develop, drive and execute a long-term vision and strategy for Site Reliability to be leveraged by all of Reddit’s products.
- Support multiple Reddit product teams with expertise and engineering development to optimize availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning.
- Establish a stronger Platform-Product interface for feature tracking and prioritization, and provide an opinionated and trusted voice for guiding these decisions.
- Coordinate across product and engineering teams to understand and widely socialize Reddit’s SRE priorities across all of our products.
- Support the reliable operation of these systems as a Platform for Reddit products, and allow us to rapidly deliver reliable, performant, and efficient services to our end users.
- Evolve our backend tech stack using modern, internally supported options (Golang, Redis, etc.).
- Lead, manage and grow high-caliber engineering teams.
- Provide mentorship and growth opportunities for team members and leaders to evolve in their roles at a company scale.
- Set and support a culture of metrics-driven quality, with efficient processes and strong transparency.
- Drive a virtuous cycle of improvement with blame-free postmortems.
What We Look For
- 6+ years of experience managing teams of site reliability and infrastructure engineers.
- 10+ years of experience developing internet-scale software, preferably in infrastructure roles.
- Experience designing, deploying, building or managing distributed systems of significant scale.
- Professional experience and capability with essential cloud infrastructure systems (Kubernetes, AWS, GCE).
- Track record of assembling high-functioning organizations.
- Strong organizational skills, the ability to prioritize tasks and keep projects on schedule.
- BS degree in Computer Science, similar technical field of study or equivalent practical experience.
Benefits
- Private Medical, Dental and Vision Benefits
- Retirement Savings plan with matching contributions
- Workspace benefits for your home office
- Personal & Professional development funds
- Family Planning Support
- Commuter Benefits
- Flexible Vacation & Reddit Global Days Off