General Job Scope
- Assist the project PI and Co-I in designing, developing, and maintaining the project's key outputs and deliverables, including the Singapore database, open-source code repositories, project website, documentation, and research papers.
- Assist in engaging with partners, government agencies, and the wider academic community, including at roundtables, conferences, and events
- Assist with supervising and delegating tasks to junior project team members, including other Research Engineers and student assistants
Principal Accountabilities
- Design and build scalable data pipelines for extracting, validating, and structuring information from Singapore legal documents (e.g. judgments, statutes, academic publications)
- Develop automated extraction systems using rule-based methods, text processing, and machine learning/NLP techniques to retrieve and annotate information in legal texts
- Create and maintain database infrastructure including data schemas, quality validation systems, and version control for research datasets
- Build a public-facing API and documentation to enable researchers and legal tech developers to access the database
- Implement data quality assurance processes including inter-annotator agreement checks, automated validation, and error correction workflows
- Collaborate with legal researchers to translate domain requirements into technical specifications and data models
- Document technical processes and decisions to support reproducibility, future maintenance, and open-source release
- Research and write technical papers detailing the project methodology and findings
- Supervise and delegate tasks to junior project team members, including other Research Engineers and student assistants
- Support the organization of, and attend, project-related events
Qualifications
- Bachelor's or Master's degree in Computer Science, Information Systems, Computer Engineering, Data Science, or related fields
- 2 to 3+ years' experience in data engineering or software development roles involving data pipelines, databases, and APIs
Core Qualifications
- Proficiency in, and hands-on experience with, data processing and analysis pipelines (e.g. scrapy, pandas, nltk, langchain, transformers) and web frameworks and APIs (e.g. Flask, REST APIs)
- Proficiency in, and hands-on experience with, database design and maintenance methods and principles (e.g. SQL, GraphQL, normalization)
- Demonstrated ability to work independently and in a team to build complete systems from requirements gathering through deployment and documentation
Preferred Qualifications
- Familiarity with LLMs and other methods for data extraction or annotation tasks
- Background in research software engineering or academic/scientific computing environments
- Knowledge of legal, policy, regulatory, or similar domains
- Knowledge of web scraping, XML/HTML parsing, and working with semi-structured documents
- Contributions to open-source projects or public datasets