HathiTrust Print Holdings Database Developer
The Library Information Technology (LIT) division provides comprehensive technology support and guidance for the University of Michigan Library system, including significant work in support of the operations of the HathiTrust Digital Library (a collaboration of major research institutions and libraries working to ensure that the cultural record is preserved and accessible long into the future).
The Library Systems Office, a part of LIT, develops and manages systems and resources that support traditional library services (metadata creation and management, acquisition and circulation of materials, discovery interfaces, etc.), as well as a variety of other systems and functions that depend on bibliographic and descriptive metadata.
NOTE: This is a term-limited appointment funded through December 31, 2017, with the possibility of renewal.
The Library Systems Office is seeking a talented, resourceful software developer to take the lead in managing and enhancing a database of print holdings data collected from all of the 80+ HathiTrust partner institutions. This large, rapidly evolving collection of data (some tables contain hundreds of millions of rows) requires regular updates that rely on a mix of relational database operations, external scripts, and cloud-based calculations using Amazon Elastic MapReduce. The database maps individual institutions' print holdings data to items in the HathiTrust digital repository and serves as the basis for calculating costs of partnership, determining special access privileges, and running a number of other useful queries. The print holdings database is a fundamental component of the dynamic HathiTrust Digital Library platform, and it requires a skilled and agile developer to manage and guide its implementation as HathiTrust continues to grow.
Project tasks and goals include:
- Working closely with other HathiTrust developers and members of the Library Systems, Core Services, and the Digital Library Production Service staff to develop and improve the print holdings database, including parsing, normalizing and ingesting holdings data submitted by HathiTrust partners
- Building and running custom reports, queries, and data processing scripts and routines
- Accommodating new data types and functionalities
- Developing a regular, automated update strategy
- Documenting apparent data anomalies and the processes that could be used to minimize them
- Potentially, building a web-based interface to the database
Qualifications include:
- Bachelor's degree in Computer Science or a related field and 3 to 5 years of work experience, or an equivalent combination of education and experience
- Demonstrated proficiency with relational database technologies such as MySQL, including experience with design and implementation of very large databases.
- Demonstrated programming skills in at least one modern programming language
- Demonstrated experience processing and mining very large files of textual data.
- Facility with Linux-based operating systems
- Strong analytical and troubleshooting skills
- Excellent written and verbal communication
- Ability to creatively improve workflows and processes
- Ability to function independently in a dynamic multicultural/collaborative environment.
- Experience with Ruby on Rails and Perl.
- Background and experience with cloud computing, MapReduce (Hadoop, Pig), and/or NoSQL technologies.
- Knowledge of library metadata standards (MARC21/RDA, Dublin Core, etc.)
- Basic web application development experience
- Experience using version control systems in software development
- Familiarity with batch file processing techniques on the command line
Job openings are posted for a minimum of seven calendar days. This job may be removed from posting boards and filled anytime after the minimum posting period has ended.
How to Apply
A cover letter is required for consideration for this position and should be attached as the first page of your resume. The cover letter should address your specific interest in the position and outline skills and experience that directly relate to this position.
Published: Tuesday, June 11, 2013 13:41 UTC
Last updated: Tuesday, February 28, 2017 23:45 UTC