Key Responsibilities
Design, build, and maintain robust, scalable, and secure data pipelines for batch and real-time data processing.
Develop and optimize ETL/ELT workflows to extract, transform, and load data from multiple sources.
Architect and implement data warehouses, data lakes, and lakehouse solutions on cloud or on-prem platforms.
Ensure data quality, lineage, governance, and versioning using metadata management tools.
Collaborate with Data Scientists, Analysts, and Software Engineers to deliver reliable and accessible data solutions.
Optimize SQL queries, data models, and storage layers for performance and cost efficiency.
Develop and maintain automation scripts for data ingestion, transformation, and orchestration.
Integrate and process large-scale data from APIs, flat files, streaming services, and legacy systems.
Implement data security, access control, and compliance standards (GDPR, ISO 27001).
Monitor and troubleshoot data pipeline failures, latency, and performance bottlenecks.
Data Engineering & Architecture
Strong expertise in data modeling (dimensional/star/snowflake schemas) and data normalization techniques.
Proficient in ETL/ELT tools such as Apache NiFi, Talend, Informatica, SSIS, or Airbyte.
Advanced knowledge of SQL and distributed computing concepts.
Experience with data lake and warehouse technologies such as Snowflake, Redshift, BigQuery, Azure Synapse, or Databricks.
Deep understanding of data partitioning, indexing, and query optimization.
Big Data & Distributed Systems
Hands-on experience with the Hadoop ecosystem (HDFS, Hive, HBase, Oozie, Sqoop).
Proficiency in Apache Spark / PySpark for distributed data processing.
Exposure to streaming frameworks such as Kafka, Flink, or Kinesis.
Familiarity with NoSQL databases such as MongoDB, Cassandra, or Elasticsearch.
Knowledge of data versioning and catalog systems (e.g., Delta Lake, Apache Hudi, Iceberg, or AWS Glue Data Catalog).
Programming & Automation
Strong programming skills in Python, Scala, or Java for data manipulation and ETL automation.
Experience with API integration, REST/GraphQL, and data serialization formats (JSON, Parquet, Avro, ORC).
Proficient in shell scripting, automation, and orchestration tools (Apache Airflow, Prefect, or Luigi).
Cloud Platforms
Expertise in at least one cloud ecosystem:
AWS: S3, Redshift, Glue, EMR, Lambda, Athena, Kinesis
Azure: Data Factory, Synapse, Blob Storage, Databricks
GCP: BigQuery, Dataflow, Pub/Sub, Cloud Composer
Strong understanding of IAM, VPC, encryption, and data access policies within cloud environments.
Data Governance & Security
Implement and enforce data quality frameworks (DQ checks, profiling, validation rules).
Knowledge of metadata management, lineage tracking, and master data management (MDM).
Familiarity with role-based access control (RBAC) and data encryption mechanisms.
Preferred Skills
Experience with machine learning data pipelines (MLOps) or feature store management.
Knowledge of containerization and orchestration tools (Docker, Kubernetes).
Familiarity with CI/CD pipelines for data deployment.
Exposure to business intelligence (BI) tools such as Power BI, Tableau, or Looker for data delivery.
Understanding of data mesh or domain-driven data architecture principles.
Leadership & Collaboration
Work closely with cross-functional teams to define data requirements and best practices.
Mentor junior engineers and enforce coding and documentation standards.
Provide technical input in data strategy, architecture reviews, and technology evaluations.
Collaborate with security and compliance teams to ensure data integrity and protection.
Qualifications
Bachelor’s or Master’s Degree in Computer Science, Data Engineering, Information Systems, or related field.
5–10 years of professional experience as a Data Engineer or similar role.
Professional Certifications preferred:
AWS Certified Data Analytics / Big Data Specialty
Microsoft Certified: Azure Data Engineer Associate
Google Professional Data Engineer
Databricks Certified Data Engineer