Data Engineer, Baker Library, Harvard Business School (Boston)
HBS’s Baker Library is seeking a temporary Data Engineer to help launch a faculty citation data project aimed at better understanding how its collections support and influence scholarly research. This initiative involves identifying faculty publications, extracting their cited references, and analyzing the relationships within this data to generate meaningful insights into patterns of use and library collection impact. By analyzing citations, the project seeks to surface evidence of how Baker’s resources contribute to the research ecosystem at HBS.
Reporting to Baker Library’s User Needs and Assessment Librarian, this temporary Data Engineer role will focus on the final phase of the project, where a corpus of raw citation data has already been collected and aggregated from multiple sources. At this stage, the data requires careful cleaning, normalization, and transformation to ensure it is accurate, consistent, and suitable for analysis. The individual in this role will work with this messy dataset to standardize fields, resolve inconsistencies, and prepare the data for downstream analytical work. This phase is critical to ensuring the reliability and interpretability of the project’s findings and will directly shape the quality of insights generated about Baker’s impact.
Responsibilities
· Clean and normalize raw citation data by resolving inconsistencies in author names, publication titles, journal names, and other variables
· Co-develop and apply standardized schemas for field names and data structures to ensure consistency across the dataset
· Design and implement reproducible data cleaning workflows using scripts that can be reused
· Co-create or locate unique identifiers (e.g., for authors, works, journals) to enable accurate linking and deduplication across records
· Perform record linkage and deduplication using techniques such as fuzzy matching and string comparison
· Assess and improve data quality by identifying missing, inconsistent, or anomalous values and determining appropriate remediation strategies
· Conduct exploratory analysis to evaluate the completeness and reliability of the dataset, including identifying patterns of data gaps
· Collaborate with project stakeholders to align data cleaning decisions with project goals
· Explore connection points for citation data with other HBS administrative datasets
· Document data transformations, data dictionaries, and workflows to support transparency, reproducibility, and future project phases
This temporary, full-time role is 40 hours/week, 100% remote.
Start date: Available starting mid-May 2026.
End date: 3 months after first date of work.
Qualifications
· Experience working with messy, real-world datasets
· Advanced proficiency in R (preferred), using tidyverse libraries such as dplyr and tidyr, or Python, using libraries such as pandas
· Familiarity with regular expressions (regex), string comparison, and fuzzy matching
· Proficient understanding of standardization principles and controlled vocabularies
· Ability to balance precision and pragmatism when making decisions in the absence of perfect information
· Comfort documenting processes and decisions for both technical and non-technical audiences
· Ability to work independently while also seeking input when project ambiguity or edge cases arise
· Ability to envision how data cleaning and manipulation serve larger project goals
· Basic understanding of academic publishing and citation formats
· Proficiency in Microsoft Office tools (Outlook email, Teams sites, folder management, file retrieval)
Full/Part Time
Temporary
Education
NA
Salary
$45.00 / hour
Closing Date
05/05/2026
How to Apply
To apply, send your resume and cover letter to jzimmett@hbs.edu.
A cover letter is required to apply for this role.
We may conduct candidate interviews virtually (by phone and/or Zoom) and/or in person for this role.
This is a temporary, full-time, remote position. Employees in fully remote positions must work all scheduled hours in a Harvard registered state in compliance with the University’s Policy on Employment Outside of Massachusetts. Specific hours and work days will be determined by business needs and are subject to change with appropriate advance notice.
Posted
2026-05-15
