We are hiring a Data Engineering Technologist – Large-Scale Distributed Systems. The responsibilities and required skills are listed below.
Responsibilities
- Design and maintain ETL/ELT pipelines using Spark for batch and streaming data.
- Manage and optimize Hadoop clusters (HDFS, YARN) for scalability and reliability.
- Build and maintain Hive data models, partitions, and queries for analytics and reporting.
- Improve query and pipeline performance through tuning, partitioning, bucketing, and caching (see the sketch after this list for a flavor of this work).
- Ensure data quality, governance, and security across the big data ecosystem.
- Collaborate with data scientists, analysts, and architects to support advanced analytics and BI.
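To illustrate the partitioning and bucketing work mentioned above, here is a minimal PySpark sketch of writing a partitioned, bucketed Hive table. All table, column, and database names are hypothetical placeholders, and it assumes a Spark deployment with Hive support enabled; it is an illustrative sketch, not a prescribed implementation.

```python
from pyspark.sql import SparkSession

# Assumes a Hive-enabled Spark deployment; names below are hypothetical.
spark = (
    SparkSession.builder
    .appName("partitioning-bucketing-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Hypothetical source table registered in the Hive metastore.
events = spark.table("raw_events")

(
    events
    .repartition("event_date")          # co-locate rows for each partition value
    .write
    .partitionBy("event_date")          # Hive-style directory partitions for pruning
    .bucketBy(32, "user_id")            # bucket within partitions to reduce join shuffles
    .sortBy("user_id")
    .mode("overwrite")
    .saveAsTable("analytics.events_bucketed")
)
```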
Skills
- Experience in data engineering or big data development.
- Strong hands-on experience in Spark (Core, SQL, Streaming).
- Good understanding of Hadoop (HDFS, YARN) architecture.
- Expertise in Hive data modeling, query optimization, and performance tuning.
- Experience with cluster troubleshooting, monitoring, and scaling.
- Knowledge of data governance and security frameworks is a plus.
- Familiarity with cloud big data platforms (AWS EMR, Azure HDInsight, or GCP Dataproc) preferred.