Key Responsibilities:
- Design, deploy, and maintain HDFS clusters to support enterprise-scale data applications.
(Please highlight relevant experience in this area on your resume.)
- Monitor cluster performance, manage storage capacity, and ensure high availability and fault tolerance.
- Implement data security, access controls, and encryption for HDFS data.
- Troubleshoot and resolve issues related to HDFS, including data node failures, replication issues, and performance bottlenecks.
- Manage data ingestion pipelines and optimize storage through appropriate file formats (e.g., Parquet, Avro).
- Collaborate with data engineering and analytics teams to ensure reliable data delivery and transformation workflows.
- Automate cluster operations using scripting (e.g., Bash, Python) and orchestration tools.
- Conduct upgrades and patching of Hadoop ecosystem components (HDFS, YARN, Hive, etc.).
- Maintain documentation of architecture, configurations, and best practices.
- Ensure compliance with data governance and data privacy policies.
Qualifications:
Required
- Bachelor’s degree in Computer Science, Information Systems, or a related field.
- 3–5+ years of experience with the Hadoop ecosystem, particularly HDFS administration.
- Strong understanding of HDFS architecture, replication, and fault tolerance.
- Experience with Cloudera, Hortonworks, or Apache Hadoop distributions.
- Proficiency in Linux/Unix system administration and scripting (Bash, Python, etc.).
- Familiarity with related components: YARN, Hive, HBase, Spark, Oozie, and ZooKeeper.
- Experience with monitoring tools such as Ambari, Cloudera Manager, or Nagios.
Preferred
- Hadoop certification (e.g., Cloudera Certified Administrator for Apache Hadoop - CCAH).
- Knowledge of cloud-based big data platforms (AWS EMR, Azure HDInsight, GCP Dataproc).
- Experience with containerization (Docker/Kubernetes) for big data workloads.
- Exposure to data lake architectures and data governance tools.