Senior Data Reliability Engineer @ Elliptic

London, United Kingdom | Remote | Full-time | Posted 90 days ago

About this role

The impact you will have:

As Senior DRE, you will drive engagement with Site Reliability across the full breadth of engineering. You will hold every engineer and every team accountable for building highly resilient, robust, reliable software. You will be part of a cross-functional, cross-discipline team of SMEs and on-callers whose mission is to keep our platform highly performant 24/7/365.

Responsible for a diverse suite of products, you will oversee site reliability for enterprise-grade applications that sit on the critical path, running thousands of queries per second (QPS). Elliptic is known for its extensive and reliable datasets, and you will play a critical role in defining and building out a market-leading foundation for data quality and control. This means building the processes, culture, and frameworks that will power observability, quality, data lineage, and remediation, forming an essential pillar of our data & intelligence platform.

What you will do:

This is a cross-team role, and you will have the full support of leadership and engineering in carrying out your responsibilities. It’s not all down to you, but you will show the rest of us what good looks like.

Evangelise SRE & DRE across engineering

Lead the charge on building out a framework for data quality that will provide our customers with strong guarantees about the fidelity of our data, as well as support our marketing and revenue functions

Define and own the on-call process for SRE as a function:

Quickly establishing a strong working knowledge of our systems

Commanding incidents

Running mop-ups

Ensuring follow-up actions are completed to your schedule

Evaluating and improving our existing E2E on-call process

Take part in the on-call rotation, one week every 4–5 weeks (24x7x365 coverage)

Evaluate, manage and maintain our existing solutions for monitoring, alerting, paging, response and documentation

Report on uptime, availability, performance, etc. across our product suite

Write post-mortems for both internal and external consumption

Represent our SRE & DRE function on sales calls with tier-one enterprise financial institutions

Work with product, sales and customer service to define SLAs for different products and use cases

Work with internal product teams to define SLOs for internal consumption and measurement

Work with our engineering teams directly to embed DRE practices

You will be a great fit here if you:

Thrive in high-pressure situations, and are able to make tough decisions quickly

Fail fast, own the failure, and encourage a blame-free engineering culture

Are an inspiring thought leader, and are able to take others with you on a journey

Aren’t afraid to get your hands dirty and dig into code across myriad technologies

Understand the importance of reliability in enterprise finance systems

Have strong opinions based on your experience that you evolve over time as you learn from others

Our ideal candidate has:

Proven experience in leveling up the quality and reliability of large datasets, not just services and APIs

Experience leading site reliability for a high-volume SaaS product

Supported distributed systems in AWS

The presence and empathy required to hold teams to account

Defined SLAs and SLOs, both internal and client-facing

Delivered post-mortems to enterprise clients (verbal and written)

Bonus Points for:

Having a genuine interest in the crypto ecosystem and being behind the mission of the company

Working knowledge of Kubernetes and the challenges it presents

Job Benefits

> How we work:

Hybrid working and the option to work from almost anywhere for up to 90 days per year

£500 remote working budget to set up your home office space

> Learning & Development:

$1,000 Learning & Development budget to use on anything (agreed with your manager) that contributes to your growth and development

> Vacation / Leave:

Holidays: 25 days of annual leave + bank holidays

An extra day for your birthday

Enhanced parental leave: we provide eligible employees, regardless of gender or whether they become a parent by birth or adoption, 16 weeks fully-paid leave.

> Benefits:

Private Health Insurance - we use Vitality!

Full access to Spill Mental Health Support

Life Assurance: we hope you will never need this, but our cover pays 4 times your salary to your beneficiaries

£100 Crypto for you!

Cycle to Work Scheme

Skills

Infrastructure, Engineering
