Responsibilities:
- Utilize standard monitoring and processes to assess the health of Riot’s live services
- Lead the investigation and mitigation of live incidents as an Incident Commander, and participate in post-incident RCAs
- Identify improvements to the team’s tools, processes, and documentation, as well as broader improvements to prevent incidents and drive down incident impact for Riot’s live services
- Write tooling, automation, and services to implement these improvements with the support of senior engineers
- Use your knowledge of cloud systems to automate the management of infrastructure and develop monitoring for live games
- Ability to go on-call and handle emergencies outside normal business hours

Required Qualifications:
- BS in Computer Science (or equivalent experience)
- Hands-on scripting experience with an appropriate language (e.g. Python, Bash) and with infrastructure automation (e.g. Terraform)
- Experience with cloud services (e.g. AWS and its common services)
- Experience with technical processes such as code reviews and testing
- Experience debugging issues with production systems
- Experience with a monitoring and event management platform

Desired Qualifications:
- Experience programming in one of the following languages: C/C++, Java, Go, Python
- Knowledge of containerization technologies (e.g. Docker, Kubernetes)
- Knowledge of relational databases (e.g. MySQL)
- Knowledge of incident management processes (e.g. ITIL)
- Knowledge of Site Reliability Engineering (SRE) principles and best practices
- Experience deploying and operating services in a live environment

For this role, you'll find success through craft expertise, a collaborative spirit, and decision-making that prioritizes the delight of players.
We will be looking at your past studies, experience, and your personal relationship with games.
If you embody player empathy and care about the experiences of players, this could be the role for you!