The IT (Compute Infrastructure) Engineer is responsible for ensuring that the servers, compute clusters, and supporting systems of our global data platform remain fully operational and optimized for performance.
This role focuses on monitoring infrastructure health, performing maintenance, and addressing technical issues in a timely manner to support uninterrupted global data curation, analysis, and sharing.
The candidate will work closely with system administrators, developers, and scientists to ensure high availability and reliability of compute resources required for large-scale genomic data processing and visualization tools.
Job Responsibilities:
Monitor and manage the performance, availability, and capacity of servers, compute clusters, and related systems.
Conduct regular system maintenance, including updates, patches, upgrades, and backups, to ensure security and stability.
Troubleshoot and resolve hardware, network, and software issues affecting compute infrastructure.
Perform repair and replacement of faulty components in servers, storage systems, or networking equipment as needed.
Collaborate with technical teams to optimize system configurations for large-scale genomic data workflows and analysis.
Implement monitoring solutions and alerts to proactively identify and address infrastructure bottlenecks or failures.
Support disaster recovery planning and execution, ensuring minimal downtime during incidents.
Maintain system documentation, including configuration records, maintenance logs, and technical manuals.
Ensure compliance with institutional IT security and data protection policies.
Contribute to scaling infrastructure in alignment with growing global user demands and evolving scientific needs.
Job Requirements:
Bachelor's degree in Information Technology, Computer Engineering, Systems Administration, or a related field.
Strong experience in server administration (Linux/Unix-based environments preferred).
Hands-on experience with monitoring tools, system diagnostics, and performance tuning.
Knowledge of networking protocols, storage systems, and virtualization technologies.
Familiarity with high-performance computing (HPC), cluster management, or cloud infrastructure is an advantage.
Software engineering experience is an advantage.
Ability to conduct hardware diagnostics, maintenance, and replacement procedures.
Understanding of security best practices for compute infrastructure.
Strong troubleshooting skills and the ability to work under pressure in critical situations.
Effective communication skills and the ability to work collaboratively with multidisciplinary teams.
Motivation to support global health research by ensuring reliable compute infrastructure.