Job Code - SREINF2001
Senior Site Reliability Engineer (SRE)
We are seeking an experienced Senior Site Reliability Engineer (SRE) with a strong technical background in cloud infrastructure, platform reliability, and DevOps practices to build and maintain scalable, secure, and observable systems. The ideal candidate should have deep expertise in AWS (EKS, IAM, VPC), Terraform, Kubernetes, Datadog, and CI/CD pipelines using GitHub Actions. Experience with Infrastructure as Code, Python automation, and compliance frameworks such as HIPAA, HITRUST, and SOC 2 is highly desirable. You will play a key role in driving platform reliability, accelerating delivery through automation, and fostering a blameless engineering culture.
Duration: Full-Time | Location: Remote – United States | Experience Level: 4+ years | Eligibility: Open to U.S. citizens only. No visa sponsorship is available, and Green Card holders are not eligible.
Responsibilities
- Design, scale, and maintain resilient cloud-native infrastructure on AWS, with a strong focus on Amazon EKS, IAM, RBAC, and security-first architecture principles.
- Build, enhance, and manage CI/CD pipelines using GitHub Actions and GitHub Advanced Security to enable faster, secure, and reliable software delivery.
- Own and continuously improve platform observability through Datadog, including metrics, logging, tracing, dashboards, and alerting.
- Develop and maintain Infrastructure as Code (IaC) solutions using Terraform and Terragrunt to ensure consistency, scalability, and automation.
- Create internal tools and automation scripts in Python to streamline operational processes and reduce manual effort.
- Maintain comprehensive technical documentation, including runbooks, standards, and operational procedures, to promote knowledge sharing and system reliability.
- Collaborate effectively within Agile teams, leveraging Jira for transparent planning, prioritization, and progress tracking.
- Participate in on-call rotations, incident response, post-incident reviews, and continuous improvement initiatives while fostering a blameless and collaborative engineering culture.
Requirements
- 4+ years of experience in a Senior SRE or DevOps role supporting large-scale production cloud environments.
- Open to U.S. citizens only (no visa sponsorship; Green Card holders are not eligible).
- Strong expertise in AWS services, including IAM, EKS, VPC, EC2, Secrets Manager, and serverless technologies, with a solid understanding of RBAC principles.
- Hands-on experience with Infrastructure as Code and container technologies, including Terraform, Terragrunt, Helm, and Kubernetes.
- Proven ability to design and maintain CI/CD pipelines using GitHub Actions, along with experience leveraging GitHub Advanced Security capabilities such as secret scanning and policy enforcement.
- Deep knowledge of Datadog, including dashboard creation, monitor configuration, alert tuning, and telemetry analysis.
- Proficiency in Python for developing automation scripts and internal tooling.
- A strong commitment to creating and maintaining clear, comprehensive documentation as an essential part of engineering excellence.
- Experience working within Agile/Scrum teams and using Jira to manage work, priorities, and delivery.
- Practical understanding of infrastructure optimization, capacity planning, and resource utilization.
Preferred Skills
- AWS Certified DevOps Engineer - Professional certification.
- Experience with AWS Lambda, AWS Fargate, and serverless architectures.
- Familiarity with multi-tenant platforms and customer-isolated deployment models.
- Knowledge of security and compliance frameworks such as HIPAA, HITRUST, and SOC 2.