Responsibilities:
- Leverage alerts, metrics, tools, and services to identify availability and reliability issues
- Work on automation to reduce toil and improve mean time to detect, respond to, and mitigate issues
- Participate in a rotating on-call schedule for live product support and operational assessment
- Participate in meaningful code review for your work
- Produce comprehensive user documentation around your implemented solutions

Required Qualifications:
- Bachelor's degree in Computer Science or a related field, or relevant professional experience
- Ability to work with software languages like Go, Java, Python, or JavaScript
- Ability to work remotely and provide on-call support

Desired Qualifications:
- Experience working in a Site Reliability capacity
- Experience with API development using REST
- Experience with prioritizing and maintaining high-capacity, high-availability, and high-performance software, especially back-end services
- Familiarity with Site Reliability best practices
- Experience working in container-based ecosystems and with a container scheduler (e.g. Marathon, Mesos, Kubernetes, GKE, or Amazon ECS)
- Experience with distributed systems, specifically microservices
- Understanding of relational databases like MySQL
- Experience with CI/CD pipelines, especially Jenkins
- Understand software performance and influence latency in online games
- Experience with AWS (or comparable cloud environments)

Don’t forget to include a resume and cover letter.
We receive a lot of applications, but we’ll notice a fun, well-written intro that shows us you take play seriously.