1. Data Engineering & Platform Knowledge (Must)
Strong understanding of the Hadoop ecosystem: HDFS, Hive, Impala, Oozie, Sqoop, Spark (on YARN).
Experience with data migration strategies (lift & shift, incremental, re-engineering pipelines).
Knowledge of Databricks architecture (Workspaces, Unity Catalog, Clusters, Delta Lake, Workflows).
2. Testing & Validation (Preferred)
Data reconciliation (source vs. target).
Performance benchmarking.
Automated test frameworks for ETL pipelines.
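For illustration only, source-vs-target reconciliation can be sketched in plain Python as a key-based comparison of row fingerprints. The function names (`row_fingerprint`, `reconcile`), the sample rows, and the choice of SHA-256 are assumptions made for this sketch, not part of any particular framework; a real migration would more likely run the equivalent logic in Spark or SQL against the actual tables.

```python
import hashlib


def row_fingerprint(row, columns):
    """Hash the selected column values of one row into a stable hex digest.

    Note: None and the empty string produce the same fingerprint here,
    which is a simplification of this sketch.
    """
    joined = "\x1f".join("" if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key, columns):
    """Compare two row sets by key; report missing, extra, and changed keys."""
    src = {r[key]: row_fingerprint(r, columns) for r in source_rows}
    tgt = {r[key]: row_fingerprint(r, columns) for r in target_rows}
    return {
        # Counts are of distinct keys, not raw rows.
        "source_count": len(src),
        "target_count": len(tgt),
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "extra_in_target": sorted(tgt.keys() - src.keys()),
        "mismatched": sorted(
            k for k in src.keys() & tgt.keys() if src[k] != tgt[k]
        ),
    }


# Hypothetical sample data standing in for a source table and its migrated copy.
source = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 20.0},
    {"id": 3, "amount": 30.0},
]
target = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 25.0},
    {"id": 4, "amount": 40.0},
]

report = reconcile(source, target, key="id", columns=["amount"])
print(report)
```

Matching row counts alone would pass this sample (three keys on each side), which is why the sketch also compares per-key fingerprints.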
3. Databricks-Specific Expertise (Preferred)
Delta Lake: ACID transactions, time travel, schema evolution, Z-ordering.
Unity Catalog: Catalog/schema/table design, access control, lineage, tags.
Workflows/Jobs: Orchestration, job clusters vs. all-purpose clusters.
SQL Warehouses (formerly SQL Endpoints) / Databricks SQL: Designing downstream consumption models.
Performance Tuning: Partitioning, caching, adaptive query execution (AQE), Photon runtime.
4. Migration & Data Movement (Preferred)
Data migration from HDFS/Cloudera to cloud storage (ADLS/S3/GCS).
Incremental ingestion techniques (Change Data Capture, Delta ingestion frameworks).
Mapping Hive Metastore to Unity Catalog (metastore migration).
Refactoring HiveQL/Impala SQL to Databricks SQL (syntax differences).
5. Security & Governance (Nice to have)
Mapping Cloudera Ranger/SSO policies to Unity Catalog RBAC.
Azure AD / AWS IAM integration with Databricks.
Data encryption, masking, anonymization strategies.
Service Principal setup & governance.
6. DevOps & Automation (Nice to have)
Infrastructure as Code (Terraform for Databricks, cloud storage, networking).
CI/CD for Databricks (GitHub Actions, Azure DevOps, Databricks Asset Bundles).
Cluster policies & job automation.
Monitoring & logging (Databricks system tables, cloud-native monitoring).
7. Cloud & Infra Skills (Nice to have)
Storage (S3/ADLS/GCS).
Networking (VNETs, Private Links, Security Groups).
IAM & Key Management.
9. Soft Skills
Ability to work with business stakeholders for data domain remapping.
Strong documentation and governance mindset.
Cross-team collaboration (infra, security, data, business).