Web Archive Engineer

Created: August 13, 2013


Come join the digital library dream team at Stanford University Libraries as our new Web Archiving Engineer! We offer Silicon Valley competitive salaries, a beautiful campus with balmy weather and palm trees, and a team of fun and talented library programmers.

Web Archive Engineer - 60432

This position is double-posted at the 4P3 and 4P4 levels.
This is a four-year fixed-term position with the possibility of an extension.

Job Objective:

Stanford University Libraries (SUL) is seeking a talented software engineer to support the Web Archiving Service.

The position is a key element in the implementation and ongoing support of SUL's Web Archiving Service. The Service will enable the archiving of web content into the Stanford Digital Repository (SDR) on behalf of Stanford librarians, faculty, and researchers and in support of the University's needs for research, teaching, library collection building, and regulatory compliance.

The Web Archiving Engineer will primarily develop and maintain software to facilitate web archiving workflows and use cases: harvesting, data management, quality assurance, discovery, indexing, access and analysis. This will entail deployment, local optimization and possible enhancement of community-developed open source web archiving tools and best practices.

Reporting to the Manager for Application Development and working closely with the Web Archiving Service Manager, the successful candidate will be responsible for developing, configuring and/or managing web archiving systems and related digital library components; pioneering tools and techniques for the collection, replay and preservation of the next generation of web technologies; troubleshooting and resolving technical issues related to Service operation; and streamlining the processing of archived web content through the entire lifecycle.

Primary Responsibilities:

Systems Analysis, Architecture Design, Implementation and Administration (50%)
Provide technical analysis and software engineering support for web archiving and related digital preservation activities at SUL. Install, configure and manage Heritrix, Wayback Machine and other components necessary to build an end-to-end service. Streamline the ingest of harvested and other target content and associated metadata into repository, discovery and access environments.

Operational Support (25%)
Collaborate with the Web Archiving Service Manager to troubleshoot and resolve technical issues affecting harvest, replay and web archiving workflows. Generate Wayback Machine and Lucene indexes to enable web archive replay, full-text searching and metadata analysis.

Harvest Engineering (15%)
Develop tools and techniques to enable archival capture and replay of rich media, streaming content and social media, as well as traditional web page content. Administer web crawls to maximize data capture quality and efficient use of limited resources.

Community Engagement (10%)
Play an active role in the cultural heritage web archiving community. Stay abreast of evolving best practices and tools for web archiving and make appropriate recommendations for local service enhancement.


Minimum Qualifications
  • Demonstrated expertise with Ruby and Ruby on Rails application development.
  • Demonstrated expertise deploying, configuring and managing Apache HTTP Server and Apache Tomcat.
  • Demonstrated expertise with Unix/Linux and command-line utilities, such as awk, find, and grep.
  • Demonstrated expertise with JavaScript and regular expressions.
  • Demonstrated expertise with XML and XSLT.
  • Demonstrated experience with relational database design and management, including implementing database applications for MySQL, Oracle or PostgreSQL.
  • Self-bootstrapping learner. Adept at quickly learning new scripting and programming languages and making sense of unfamiliar architectures and application designs.
  • Demonstrated ability to write solid, simple, elegant code both independently and in a team-programming environment and within schedule limitations.
  • Demonstrated ability to work collaboratively with multiple levels of staff and colleagues at peer institutions and within the open source community on projects from specification to launch. Excellent verbal and written communication skills.
  • Demonstrated ability to apply best practices to technical projects, especially test-first development and automated testing. Must also make effective use of team collaboration tools, build management and version control systems.
  • Demonstrated experience providing ongoing support for technical services, including experience monitoring and managing a solution.
  • At the 4P3 level, four-year college degree or equivalent, with five to seven years of demonstrated experience.
  • At the 4P4 level, four-year college degree or equivalent, with more than seven years of demonstrated experience.

Preferred Qualifications
  • Demonstrated knowledge of web archiving tools, techniques, issues and trends.
  • Demonstrated expertise with Lucene/Solr.
  • Demonstrated expertise with distributed computing technologies, such as Hadoop, HBase and Pig.
  • Demonstrated experience with file characterization tools, such as JHOVE, FITS, DROID and Apache Tika.
  • Demonstrated experience with library-related metadata and metadata standards, particularly DC, MODS, MARC, METS and EAD.
  • Demonstrated success participating in community-based open source projects, especially those relevant to SUL's Digital Library architecture, such as Fedora, Blacklight, Solr or Hydra.
  • Demonstrated experience with library applications and technology, especially experience participating in relevant library open source efforts.
  • Demonstrated experience working in an academic and/or library environment.
  • Master’s degree in Computer Science, Information Science or related field.


Published: Tuesday, August 13, 2013 22:54 UTC

Last updated: Tuesday, February 28, 2017 23:44 UTC