Job description
Platform Design & Deployment
Work closely with the CEP customer on technical requirements gathering.
Architect and implement production-grade OpenShift clusters on OpenStack, including control plane, compute nodes, storage integrations, and networking.
Adapt standard OpenShift and OpenStack designs to meet government security and governance compliance requirements.
Provide deep technical advice and the rationale for design decisions to internal and external stakeholders.
Define and automate infrastructure provisioning (IaC) using tools such as Terraform, Ansible, or Red Hat Ansible Tower.
Operational Excellence
Develop and maintain monitoring, alerting, and logging pipelines (Prometheus, Grafana, EFK/ELK, Alertmanager).
Lead capacity planning, performance tuning, and day-to-day cluster health management.
Implement robust backup, disaster recovery, and upgrade strategies.
Automation & CI/CD
Build and manage CI/CD pipelines (Jenkins, GitLab CI, Argo CD) for platform updates, operator deployments, and application rollouts.
Author scripts and operators to automate routine maintenance, scaling, and self-healing tasks.
Security & Compliance
Enforce security best practices: RBAC, network policies, SELinux, secrets management (Vault, OpenShift Secrets).
Collaborate with security teams to implement vulnerability scanning, baseline hardening, and compliance audits.
Collaboration & Documentation
Partner with development, QA, and networking teams to onboard new applications and troubleshoot platform issues.
Produce runbooks, run-charts, design docs, and knowledge-base articles.
Experience
5+ years in Linux system administration (RHEL) and virtualization (KVM/QEMU).
Experience with VMware would be an added advantage.
3+ years deploying and operating OpenShift in production environments.
Strong understanding of network and storage virtualisation.
Hands-on experience with OpenStack (Ansible-based or OpenStack SDK): Nova, Neutron, Cinder, Keystone, Glance.
An understanding of basic infrastructure security and policies in government will be an added advantage.
Technical Skills
Infrastructure as Code: Terraform, Ansible, or equivalent.
Physical, virtual and container-based networking & storage: Calico, OVN, Ceph, Portworx.
Monitoring/Logging: Prometheus, Grafana, ELK/EFK stacks.
Scripting: Bash, Python, or Go.
Networking fundamentals: VLANs, SDN, L3 routing, load balancing (HAProxy, OVN LB).
Soft Skills
Strong problem-solving and troubleshooting aptitude in complex distributed systems.
Excellent verbal and written communication; able to produce clear operational documentation.
Proactive, self-driven, and comfortable leading cross-functional initiatives.
Preferred Qualifications
Red Hat Certified Specialist in OpenShift Administration or OpenStack (RHOS-CL310).
Familiarity with VMware stacks.
Experience with GitOps tools (Argo CD, Flux).
Familiarity with service mesh (Istio, OpenShift Service Mesh) and serverless frameworks.
Exposure to hybrid-cloud or multi-cloud OpenShift deployments.