The successful candidate will work closely with LOCKSS Program technical staff to analyze publisher Web sites and build Web crawler plugins that process their content for preservation. Each LOCKSS plugin is specific to one publishing platform and determines what content will be collected and preserved. Work will be reviewed before being committed to production.
1. Analyze publisher Web sites, including their hierarchy, URL structure, and layout.
2. Implement crawling strategies as LOCKSS plugins and perform quality assurance on them.
3. Improve existing LOCKSS plugins by upgrading them to our current best practices, refining them to account for incremental changes to publisher Web sites, and addressing bug reports from end users.
Describe the technical or business knowledge required to complete the job’s primary responsibilities.
• Java programming: one year of experience
• Knowledge of XML, HTML/XHTML, CSS
• Knowledge of URL structures
• Knowledge of regular expressions
• Familiarity with a UNIX-based operating system
• Proven inquisitiveness and curiosity, and the ability to learn new tools quickly
• Strong analytical skills for effective problem solving
• Fierce attention to detail
• Understanding of quality control methods, procedures, and guidelines
• Excellent organizational and communication skills
• Ability to work as part of a small team
• Ability to work in a high-pressure, large-volume production environment and to meet production standards on time and as specified
Four-year college degree or equivalent in Computer Science
30% Analyze publisher Web sites
10% Talk to publishers and platform vendors
40% Implement crawling strategies as LOCKSS plugins and perform quality assurance on them
5% Assist Senior Engineers with miscellaneous tasks
10% Improve and upgrade existing LOCKSS plugins
5% Process content for preservation
Last updated: Tuesday, February 28, 2017 23:45 UTC