Key Responsibilities:
Design, deploy, and maintain HDFS clusters to support enterprise-scale data applications.
(Please highlight your experience in this area in your resume.)
Monitor cluster performance, manage storage capacity, and ensure high availability and fault tolerance.
Implement data security, access controls, and encryption for HDFS data.
Troubleshoot and resolve issues related to HDFS, including data node failures, replication issues, and performance bottlenecks.
Manage data ingestion pipelines and optimize data storage formats (e.g., Parquet, Avro).
Collaborate with data engineering and analytics teams to ensure reliable data delivery and transformation workflows.
Automate cluster operations using scripting (e.g., Bash, Python) and orchestration tools (see the sketch after this list).
Conduct upgrades and patching of Hadoop ecosystem components (HDFS, YARN, Hive, etc.).
Maintain documentation of architecture, configurations, and best practices.
Ensure compliance with data governance and data privacy policies.
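For illustration only, the sketch below shows the kind of health-check automation this role involves. It is a minimal example, assuming the hdfs CLI is on the PATH of the node it runs on; the report and fsck output formats it parses vary by Hadoop version and distribution, so the patterns are indicative rather than definitive.

```python
#!/usr/bin/env python3
"""Minimal HDFS health-check sketch: flags dead DataNodes and under-replicated blocks.

Assumptions: the `hdfs` CLI is on PATH, and the output of `hdfs dfsadmin -report`
and `hdfs fsck /` matches a typical Hadoop 2.x/3.x layout; adjust the regexes to
your distribution's output if they differ.
"""
import re
import subprocess
import sys


def run(cmd):
    # Run an HDFS CLI command and return its stdout; exit status is handled by the caller.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout


def main():
    # Dead DataNode count, e.g. a "Dead datanodes (1):" section in the report.
    report = run(["hdfs", "dfsadmin", "-report"])
    dead_match = re.search(r"Dead datanodes\s*\((\d+)\)", report)
    dead_nodes = int(dead_match.group(1)) if dead_match else 0

    # Under-replicated block count from the fsck summary, e.g. "Under-replicated blocks: 0".
    fsck = run(["hdfs", "fsck", "/"])
    under_match = re.search(r"Under-replicated blocks:\s*(\d+)", fsck)
    under_replicated = int(under_match.group(1)) if under_match else 0

    print(f"dead datanodes: {dead_nodes}, under-replicated blocks: {under_replicated}")

    # Exit non-zero so a scheduler or monitoring agent (cron, Oozie, Nagios, etc.) can alert.
    if dead_nodes > 0 or under_replicated > 0:
        sys.exit(1)


if __name__ == "__main__":
    main()
```

In practice such a check would typically be scheduled (cron or an orchestration tool) and wired into the cluster's existing monitoring (Ambari, Cloudera Manager, or Nagios) rather than run ad hoc.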
Qualifications:
Required
Bachelor’s degree in Computer Science, Information Systems, or a related field.
3–5+ years of experience with the Hadoop ecosystem, particularly HDFS administration.
Strong understanding of HDFS architecture, replication, and fault tolerance.
Experience with Cloudera, Hortonworks, or Apache Hadoop distributions.
Proficiency in Linux/Unix system administration and scripting (Bash, Python, etc.).
Familiarity with related components: YARN, Hive, HBase, Spark, Oozie, and Zookeeper.
Experience with monitoring tools like Ambari, Cloudera Manager, or Nagios.
Preferred
Hadoop certification (e.g., Cloudera Certified Administrator for Apache Hadoop - CCAH).
Knowledge of cloud-based big data platforms (AWS EMR, Azure HDInsight, GCP Dataproc).
Experience with containerization (Docker/Kubernetes) for big data workloads.
Exposure to data lake architectures and data governance tools.