Project Description
We are seeking a skilled ML Platform Engineer to automate, deploy, patch, and maintain our machine learning platform infrastructure.
You should have hands-on experience with Cloudera Data Science Workbench (CDSW), Cloudera Data Platform (CDP), Docker, Kubernetes, Python, Ansible, GitLab, and MLOps best practices.
Responsibilities
- Automate deployment and management processes for machine learning platforms using tools such as Ansible and Python.
- Deploy, monitor, and patch ML platform components, including CDSW, Docker containers, and Kubernetes clusters.
- Ensure high availability and reliability of ML infrastructure through proactive maintenance and regular updates.
- Develop and maintain comprehensive documentation for platform configurations, processes, and procedures.
- Troubleshoot and resolve platform issues, ensuring minimal downtime and optimal performance.
- Implement best practices for security, scalability, and automation within the ML platform ecosystem.
Mandatory Skills
- Background as a DevOps or Platform Engineer with Cloudera or Azure experience, along with Python and ML.
- Hands-on experience with CDSW (Cloudera Data Science Workbench) or similar ML/AI platforms.
- Strong expertise in containerization and orchestration using Docker and Kubernetes (AKS preferred).
- Proficiency in Python programming (enterprise-level applications, automation, and scripting).
- Experience with Ansible for infrastructure as code (IaC), deployment automation, and configuration management.
- Strong knowledge of Unix/Linux systems (administration, troubleshooting, performance tuning).
- Practical experience with GitLab for source control and CI/CD pipeline automation.
- Deep understanding of MLOps principles and best practices (deployment, monitoring, lifecycle management of ML workloads).
- Experience in designing, developing, and maintaining distributed systems and services.
- Proven ability in patching, updating, and maintaining platform infrastructure.
Nice-to-Have Skills
- Familiarity with the Cloudera CDP ecosystem (beyond CDSW).
- Knowledge of monitoring and observability tools (Prometheus, Grafana, ELK).
- Exposure to Airflow, MLflow, or Kubeflow for workflow and ML lifecycle orchestration.
- Cloud platform experience with Azure (AKS, networking, storage, monitoring).
Job function
- Engineering, Project Management, and Analyst
Industries
- Banking, Investment Banking, and Software Development