Sr Site Reliability Engineer @ Visa

São Paulo, SP, BR | Remote | Full-time | Posted 79 days ago

About this role

Join Pismo’s Platform squad within the SRE Tribe, dedicated to owning and evolving the containerized platform that underpins critical workloads. You’ll work cross‑functionally to ensure our platform is reliable, scalable, secure, and easy to operate, focusing on Kubernetes at scale and cloud architecture.

What You’ll Do

- Own the end‑to‑end lifecycle (design, provisioning, upgrades, maintenance, and decommissioning) of core platform components, including:
  - Cloud infrastructure primitives
  - Kubernetes clusters and cluster services
  - Networking, ingress, and service discovery
  - Service Mesh and supporting data‑plane components
- Design platform components to be resilient by default, applying SRE principles such as:
  - Fault isolation and graceful degradation
  - Capacity planning and saturation control
  - Reduced operational toil and clear failure modes
- Lead the design and implementation of infrastructure bootstrap orchestration, including:
  - Automated cluster and environment provisioning
  - Deterministic, repeatable platform bring‑up and teardown
  - Dependency‑aware orchestration across cloud, network, and Kubernetes layers
- Drive Infrastructure‑as‑Code and GitOps‑first practices to ensure:
  - Platform components are reproducible and auditable
  - Changes are automated, testable, and reversible
  - Manual intervention is minimized or eliminated
- Identify automation gaps and lead initiatives that reduce human effort, onboarding time, and operational risk.
- Apply and promote SRE operational excellence practices, including:
  - Clear ownership and runbooks for platform components
  - Participation in on‑call rotation as a platform reliability escalation point
  - Incident response, post‑incident reviews, and problem management
- Improve day‑2 operations by standardizing upgrade/rollback strategies and reducing MTTD/MTTR.
- Ensure platform operations align with security, compliance, and internal control requirements.
- Collaborate with engineering teams across the organization to influence platform adoption, reliability standards, and cloud‑native best practices.

This is a remote position. A remote position does not require job duties be performed within proximity of a Visa office location. Remote positions may be required to be present at a Visa office with scheduled notice.

For this role, you must be based in Brazil.

Language Skills

Proficiency in English at B2 level or above (Upper-Intermediate)

Technical Skills

- Strong hands‑on experience with public cloud platforms (AWS preferred, Azure also considered).
- Proven experience operating and administering Kubernetes at scale in production environments.
- Strong experience with container orchestration platforms and cloud architecture fundamentals (networking, IAM/security concepts, and reliability patterns).
- Experience with Infrastructure as Code (Terraform preferred) and automation‑first workflows.
- Familiarity with GitOps practices and CI/CD pipelines.
- Strong troubleshooting skills for distributed systems, including root‑cause analysis and reliability improvements.
- Experience with observability concepts and practices (monitoring, logging, alerting, tracing).

Preferred Qualifications

- Experience with Service Mesh technologies (Istio preferred, App Mesh or Linkerd).
- Experience working with critical or mission‑critical systems.
- Strong background applying SRE principles (operational readiness, incident management, runbooks, toil reduction).
- AWS certifications.

Visa is an EEO Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability or protected veteran status. Visa will also consider for employment qualified applicants with criminal histories in a manner consistent with EEOC guidelines and applicable local law.

Skills

Software Development/Engineering | Engineering | Mid-Senior Level | Information Technology And Services
