*Roles & Responsibilities*
Select and manage on-premises technologies suitable for secure and efficient operations.
Build robust data pipelines to collect, clean, and transform diverse datasets, including process data, sensor data, image data, and human annotations.
Ensure secure, maintainable, and scalable deployment of data infrastructure.
Define and enforce best practices in data governance, privacy, and access control.
Collaborate effectively with cross-functional teams to deploy and maintain data systems.
Understand existing architecture and apply the latest technologies to enhance AI data platforms and data infrastructure.
*Educational Background*
Minimum Diploma (Polytechnic) or Bachelor's Degree in Computer Science, Engineering, or a related field.
*Technical Expertise*
Minimum 3 years of experience in data engineering roles, ideally with on-premises or hybrid infrastructure.
Proven track record of building scalable data systems from the ground up, preferably in a startup environment.
Proficiency in Python and/or Java for data pipeline development.
Solid experience with ETL and workflow orchestration frameworks (e.g., Apache Airflow, Dagster) and streaming systems (e.g., Kafka).
Experience designing and maintaining SQL and NoSQL databases.
Hands-on experience in building and operating data lakes, data pipelines, and data warehouses, including data catalog management.
Strong understanding of modern data architecture and emerging technologies in AI data platforms.
Familiarity with containerization (Docker), version control (Git), and CI/CD practices.
*Soft Skills*
Excellent communication skills and ability to collaborate with both technical and non-technical teams.
Strong problem-solving and debugging abilities.
Ability to balance engineering trade-offs while delivering scalable and maintainable solutions.