Responsibilities
- Integrate data from multiple sources, such as databases, APIs, or streaming platforms, to provide a unified view of the data
- Implement data quality checks and validation processes to ensure the accuracy, completeness, and consistency of data
- Identify and resolve data quality issues, monitor data pipelines for errors, and implement data governance and data quality frameworks
- Enforce data security and compliance with relevant regulations and industry-specific standards
- Implement data access controls and encryption mechanisms, and monitor data privacy and security risks
- Optimize data processing and query performance by tuning database configurations, implementing indexing strategies, and leveraging distributed computing frameworks
- Optimize data structures for efficient querying and develop data dictionaries and metadata repositories
- Identify and resolve performance bottlenecks in data pipelines and systems
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders
- Document data pipelines, data schemas, and system configurations, making it easier for others to understand and work with the data infrastructure
- Monitor data pipelines, databases, and data infrastructure for errors, performance issues, and system failures
- Set up monitoring tools, alerts, and logging mechanisms to identify and resolve issues proactively, ensuring the availability and reliability of data
- A software engineering background is a plus
Requirements
- Bachelor's or master's degree in computer science, information technology, data engineering, or a related field
- Strong knowledge of databases, data structures, and algorithms
- Proficiency in working with data engineering tools and technologies, including data integration tools (e.g., Apache Kafka, Azure IoT Hub, Azure Event Hubs), ETL/ELT frameworks (e.g., Apache Spark, Azure Synapse), big data platforms (e.g., Apache Hadoop), and cloud platforms (e.g., Amazon Web Services, Google Cloud Platform, Microsoft Azure)
- Expertise in working with relational databases (e.g., MySQL, PostgreSQL, Azure SQL, Azure Data Explorer) and data warehousing concepts
- Familiarity with data modeling, schema design, indexing, and optimization techniques for building efficient and scalable data systems
- Proficiency in languages such as Python, SQL, KQL, Java, and Scala
- Experience with scripting languages like Bash or PowerShell for automation and system administration tasks
- Strong knowledge of data processing frameworks like Apache Spark, Apache Flink, or Apache Beam for efficiently handling large-scale data processing and transformation tasks
- Understanding of data serialization formats (e.g., JSON, Avro, Parquet) and data serialization libraries (e.g., Apache Avro, Apache Parquet) is valuable
- Experience with CI/CD and GitHub, demonstrating the ability to work in a collaborative and iterative development environment
- Experience with visualization tools (e.g., Power BI, Plotly, Grafana, Redash) is beneficial
Preferred Skills & Characteristics
- Self-motivated, goal-oriented professional who consistently works independently and embraces a growth mindset
- Self-driven and proactive in keeping up with new technologies and programming practices