Job Title: Site Reliability Engineer (SRE)
Experience: 8+ years (including 3+ years in Java)
About the Role: We’re looking for a skilled Site Reliability Engineer with strong Java and cloud-native development experience to design, build, and maintain reliable, scalable systems on Kubernetes and AWS.
You’ll work closely with development and platform teams to drive automation, observability, and operational excellence.
Key Responsibilities
- Develop and deploy Java/Spring Boot microservices on Kubernetes (EKS, AKS, OCP).
- Build observability and monitoring solutions using ELK, Prometheus, Grafana, CloudWatch, Jaeger, and OpenTelemetry.
- Improve reliability, scalability, and performance across distributed systems.
- Support production systems and participate in on-call rotations.
- Automate infrastructure and CI/CD pipelines using tools such as Terraform and Ansible.
Required Skills
- 8+ years of software development experience, including 3+ years with Java (Spring Boot).
- Experience with Kubernetes, Linux, networking, and distributed systems.
- Hands-on experience with observability and monitoring tools.
- Experience with ArgoCD, Go/Python/Groovy, and Java performance tuning.
- Exposure to Terraform, Crossplane, API gateways and service meshes (Apigee, Kong, Istio), and configuration management tools (Ansible, Puppet).
- Familiarity with APM tools (Dynatrace, AppDynamics) and databases (PostgreSQL, MongoDB).
- Relevant certifications (AWS, Kubernetes, Java, Linux) preferred.
Seniority level: Associate
Employment type: Full-time
Job function: Information Technology
We are an equal opportunity employer.
We do not discriminate on the basis of race, color, religion, national origin, gender, sexual orientation, age, or disability.