Software Engineer (Search)
Created: December 28, 2012
Description
Job Summary
We are looking for an exceptional search engineer with battle-proven experience in developing and deploying Lucene/Solr-based search applications. You will be given the rare opportunity to lead the redesign of the search infrastructure that powers Wikipedia and all other projects run by the Wikimedia Foundation.
The Challenge
Currently, the search infrastructure serves about 1,400 requests per second and consists of a replicated server farm of 50 nodes in two datacenters that provides search for over 800 wikis, including all of Wikipedia. Our current codebase is in dire straits: it was developed around Lucene 2.3 and is plagued by bitrot.
Some of the challenges that you will have to solve include:
- Develop a near-realtime indexer
- Develop a sharding strategy for the indexes
- Develop a suite of precision/recall and performance benchmarks
- Develop a method by which the Wikipedia community can help improve the quality of the search index
- Develop solutions for long-standing feature and bug requests from the Wikimedia communities, including indexing of transcluded wikitext, better tokenization of wikitext, multi-language search support, and other relevant search bugs
Your Background
- You have multiple large-scale (>1M documents) Solr deployments under your belt and have experience with indexing non-Latin alphabets
- Unicode does not intimidate you
- We are not just looking for experience; we also want somebody who sees improving search as an important step to better support both our editors and readers on all of the Wikimedia projects
- You are very comfortable with Java and Maven, and have an intimate knowledge of the Lucene and Solr libraries
- Preferably, you have a formal education in computer science with a specialization in information retrieval and you speak one or more languages besides English
- Experience with MediaWiki and the Wikipedia community in general is also a big plus
- You are passionate about the free culture movement and know how to get your point across in a consensus-based environment
About the Wikimedia Foundation
The Wikimedia Foundation is the non-profit organization that operates Wikipedia, the free encyclopedia. According to comScore Media Metrix, Wikipedia and the other projects operated by the Wikimedia Foundation receive more than 482 million unique visitors per month, making them the 5th most popular web property worldwide. Available in more than 270 languages, Wikipedia contains more than 21 million articles contributed by a global volunteer community of more than 100,000 people. Based in San Francisco, California, the Wikimedia Foundation is an audited, 501(c)(3) charity that is funded primarily through donations and grants. The Wikimedia Foundation was created in 2003 to manage the operation of Wikipedia and its sister projects. It currently employs 78 staff members. Wikimedia is supported by local chapter organizations in 31 countries or regions.
http://wikimediafoundation.org
http://blog.wikimedia.org
Metadata
Published: Friday, December 28, 2012 00:03 UTC
Last updated: Tuesday, February 28, 2017 23:46 UTC