📐 About this role

We are looking for a foundational member of the cloud infrastructure team at WRITER. In this role, you will help develop and implement our site reliability engineering (SRE) program. The ideal candidate will ensure the reliability, scalability, performance, and security of WRITER’s critical systems, taking a proactive approach so that our high-ROI products reach our customers seamlessly.

🦸🏻‍♀️ Your responsibilities:

Lead the design, implementation, and maintenance of WRITER, Inc.’s cloud infrastructure to ensure high availability and performance

Design and implement scalable cloud automation to support seamless deployment for our largest enterprise customers

Automate infrastructure provisioning and management using Terraform & Python

Collaborate with development teams to optimize cloud resources and enhance system reliability

Develop and maintain monitoring and alerting systems to proactively identify and resolve issues affecting the reliability of our writing solutions

Conduct post-mortem analyses of system failures to identify root causes and implement preventive measures

Optimize and scale our cloud infrastructure to support growing user demand and ensure cost efficiency

Ensure the security and compliance of our systems, adhering to industry standards and regulations

Provide mentorship and technical guidance to junior engineers, fostering a culture of reliability and continuous improvement

Stay current with emerging technologies and industry trends to continuously improve our site reliability practices

⭐ Is this you?

Proven expertise in Site Reliability Engineering with a minimum of 7 years of hands-on experience

Deep understanding of system architecture and infrastructure design to ensure high availability and performance

Bachelor’s degree in Computer Science, Engineering, or a related technical field

Strong proficiency in programming languages such as Python, Java, or Go for automation and monitoring

Experience with cloud platforms like AWS, Azure, or GCP, and their respective services for scalable and resilient systems

Expertise in containerization and orchestration technologies (e.g., Docker, Kubernetes)

Knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack) to maintain system health and performance

Ability to lead and mentor junior engineers in best practices for reliability and system optimization

Excellent communication skills to collaborate effectively with cross-functional teams and stakeholders

Proactive approach to identifying and mitigating potential system failures and performance bottlenecks

✨ Preferred skills & experience:

Software engineering expertise

Terraform

Python

Kubernetes

Scala

AWS/GCP

🍩 Benefits & perks (UK full-time employees):

Generous PTO, plus company holidays

Comprehensive medical and dental insurance

Paid parental leave for all parents (12 weeks)

Fertility and family planning support

Early-detection cancer testing through Galleri

Competitive pension scheme with company contribution

Annual work-life stipends for:

Home office setup, cell phone, internet

Wellness stipend for gym, massage/chiropractor, personal training, etc.

Learning and development stipend

Company-wide off-sites and team off-sites

Competitive compensation and company stock options

Site reliability engineer @ Writer
