Responsibilities
• Build and optimize ETL/ELT processes leveraging Databricks' native capabilities to handle large volumes of structured and unstructured data from various sources
• Implement data quality frameworks and monitoring solutions using Databricks data quality features to ensure data accuracy and reliability across all data products
• Establish best practices for data governance, security, and compliance within the Databricks ecosystem, and integrate them with enterprise systems
• Monitor and maintain production data pipelines to ensure 99.9% uptime and optimal performance across all Databricks workloads and clusters
• Implement comprehensive logging, alerting, and monitoring systems using Databricks monitoring capabilities and integration with enterprise monitoring tools
• Perform regular health checks on Databricks cluster performance, job execution times, and resource utilization to identify and resolve bottlenecks proactively
• Manage incident response procedures for Databricks pipeline failures, including root cause analysis, resolution, and post-incident reviews
• Establish and maintain disaster recovery procedures and backup strategies for critical data assets within the Databricks environment
• Conduct regular performance tuning of Spark jobs and Databricks cluster configurations to optimize cost and execution efficiency
• Implement automated testing frameworks for Databricks-based data pipelines, including unit tests, integration tests, and data validation checks
• Maintain comprehensive documentation for all Databricks operational procedures, runbooks, and troubleshooting guides
• Coordinate scheduled maintenance windows and Databricks system upgrades with minimal business impact
• Manage user access controls, workspace configurations, and security policies within Databricks environments
• Monitor data lineage using Databricks Unity Catalog and maintain metadata management systems to support operational transparency and compliance requirements
• Establish capacity planning processes to forecast Databricks infrastructure needs and manage cloud costs effectively
• Provide technical guidance and mentorship to junior team members on Databricks best practices and data engineering principles
• Participate in the on-call rotation for critical production systems, with a focus on Databricks platform stability
• Lead operational reviews and contribute to continuous improvement initiatives for Databricks platform reliability and efficiency
• Coordinate with infrastructure teams on Databricks cluster provisioning, network configurations, and security implementations
Requirements
• Degree in Computer Science or Computer Engineering
• Minimum of 8-10 years of working experience in system operations, compliance, and management
• Hands-on project experience specifically with the Databricks platform (primary requirement)
• Project experience in cloud operations or cloud architecture
• Must hold an AWS cloud certification
• Databricks certification (Associate or Professional level) - highly preferred
• Exposure to hospital information/clinical systems is an added advantage
• Understanding of DevOps practices and CI/CD pipelines for Databricks-based data engineering projects
• Knowledge of ITIL frameworks and operational best practices
• Expert-level proficiency in the Databricks platform, including workspace management, cluster configuration, and job orchestration
• Strong expertise in Apache Spark within the Databricks environment, including Spark SQL, DataFrames, and RDDs
• Extensive experience with Delta Lake, including data versioning, time travel, and ACID transactions
• Proficiency in Databricks Unity Catalog for data governance and metadata management
• In-depth understanding of data warehouse concepts, data profiling, data verification, and advanced analytics techniques
• Strong knowledge of monitoring, incident management, and cloud cost control
• Databricks (primary and most critical skill)
• AWS cloud services and architecture
• IDMC (Informatica Intelligent Data Management Cloud)
• Tableau for data visualization
• Oracle Database management
• MLOps practices within the Databricks environment (Good to have)
• Stata for statistical analysis (Good to have)
• Amazon SageMaker integration with Databricks (Good to have)
• DataRobot platform integration (Good to have)
• Good interpersonal skills with the ability to work with different groups of stakeholders
• Strong problem-solving skills and ability to work independently in a fast-paced environment with minimal supervision
• Excellent communication skills for technical documentation and cross-team collaboration
Licence no: 12C6060