Senior Site Reliability Engineer, Temporal Platform @ Grab

HCMC, VN | Onsite | Full-time | Posted 64 days ago

About this role

Get to know our Team:

Temporal is an open source distributed system used to schedule and run asynchronous tasks within Grab. The Temporal cluster coordinates the execution of each workflow among a pool of self-hosted workers that make queries and execute code in your VPC. The Temporal Platform team maintains the Temporal infrastructure and ensures the system runs in a stable and reliable way.

Get to know the role:

As a senior engineer, you will be responsible for designing, developing, and maintaining key aspects of the Temporal infrastructure. You are expected to handle the day-to-day operation of the platform and to be involved in tasks like optimizing the scheduling and execution of workflows, ensuring the reliability of the system, and developing new features or tools to improve the Temporal platform.

The day-to-day activities:

Designing, deploying and operating large scale workflow orchestration infrastructure in the public cloud (AWS).

Investigating, mitigating and resolving incidents such as production outages, performance degradation or data loss. Authoring post-mortems, and identifying and implementing longer-term corrective actions and missing guardrails.

Running upgrades (EKS, Operating System…) as well as routine maintenance operations.

Coaching and mentoring the team's Junior Engineers. Giving them the technical guidance they need and actively contributing to their growth.

Acting as a bar raiser for design reviews, RFCs and post-mortems, making them best in class in terms of clarity, accuracy and comprehensiveness.

Supporting our users in onboarding their use cases to the platform, always with a view to self-service and automation where possible.

Requirements:

Minimum of 3 years of experience in DevOps and SRE

Familiar with AWS Cloud services and AWS cloud architecture.

Experienced with maintaining and operating distributed software such as ScyllaDB, Cassandra, Kafka, ...

Excellent problem-solving and analytical skills.

Good verbal English communication.

Proven ability to work both independently and as part of a collaborative team.

Tech stacks: Golang, Terraform, Kubernetes, AWS

Life at Grab

We care about your well-being at Grab. Here are some of the global benefits we offer:

We have your back with Term Life Insurance and comprehensive Medical Insurance.

With GrabFlex, create a benefits package that suits your needs and aspirations.

Celebrate moments that matter in life with loved ones through Parental and Birthday leave, and give back to your communities through Love-all-Serve-all (LASA) volunteering leave.

We have a confidential Grabber Assistance Programme to guide and uplift you and your loved ones through life's challenges.

Balancing personal commitments and life's demands is made easier with our FlexWork arrangements such as differentiated hours.

What We Stand For at Grab

We are committed to building an inclusive and equitable workplace that enables diverse Grabbers to grow and perform at their best. As an equal opportunity employer, we consider all candidates fairly and equally regardless of nationality, ethnicity, religion, age, gender identity, sexual orientation, family commitments, physical and mental impairments or disabilities, and other attributes that make them unique.

Skills

Engineering, Associate, Internet