<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: lucene</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/lucene.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2020-06-26T17:06:08+00:00</updated><author><name>Simon Willison</name></author><entry><title>Reducing search indexing latency to one second</title><link href="https://simonwillison.net/2020/Jun/26/reducing-search-indexing-latency-one-second/#atom-tag" rel="alternate"/><published>2020-06-26T17:06:08+00:00</published><updated>2020-06-26T17:06:08+00:00</updated><id>https://simonwillison.net/2020/Jun/26/reducing-search-indexing-latency-one-second/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2020/reducing-search-indexing-latency-to-one-second.html"&gt;Reducing search indexing latency to one second&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Really detailed dive into the nuts and bolts of Twitter’s latest iteration of search indexing technology, including a great explanation of skip lists.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/data-structures"&gt;data-structures&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/twitter"&gt;twitter&lt;/a&gt;&lt;/p&gt;



</summary><category term="data-structures"/><category term="lucene"/><category term="scaling"/><category term="search"/><category term="twitter"/></entry><entry><title>Who are major competitors to Solr?</title><link href="https://simonwillison.net/2010/Sep/2/who-are-major-competitors/#atom-tag" rel="alternate"/><published>2010-09-02T18:01:00+00:00</published><updated>2010-09-02T18:01:00+00:00</updated><id>https://simonwillison.net/2010/Sep/2/who-are-major-competitors/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Who-are-major-competitors-to-Solr/answer/Simon-Willison"&gt;Who are major competitors to Solr?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;ElasticSearch is a really interesting one - it's the same underlying search library (Lucene) and the same integration model (an HTTP interface) but takes quite a different approach. It hasn't been around for a long time but it looks very impressive: &lt;span&gt;&lt;a href="http://www.elasticsearch.com/"&gt;http://www.elasticsearch.com/&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Other than that, popular open source search engines include Sphinx and Xapian. I'm a big fan of talking to a search engine via HTTP, so I've been keeping an eye on the &lt;span&gt;&lt;a href="http://www.flax.co.uk/"&gt;http://www.flax.co.uk/&lt;/a&gt;&lt;/span&gt; project which does that for Xapian.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apache"&gt;apache&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search-engines"&gt;search-engines&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/solr"&gt;solr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sphinx-search"&gt;sphinx-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xapian"&gt;xapian&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="apache"/><category term="lucene"/><category term="search"/><category term="search-engines"/><category term="solr"/><category term="sphinx-search"/><category term="xapian"/><category term="quora"/></entry><entry><title>How do Solr, Lucene, Sphinx and Searchify compare?</title><link href="https://simonwillison.net/2010/Aug/26/how-do-solr-lucene/#atom-tag" rel="alternate"/><published>2010-08-26T14:14:00+00:00</published><updated>2010-08-26T14:14:00+00:00</updated><id>https://simonwillison.net/2010/Aug/26/how-do-solr-lucene/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/How-do-Solr-Lucene-Sphinx-and-Searchify-compare/answer/Simon-Willison"&gt;How do Solr, Lucene, Sphinx and Searchify compare?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Lucene is a Java library for creating and searching through a full text index. If you want to make use of it, you'll need to write your own Java code that integrates with it.&lt;/p&gt;

&lt;p&gt;Solr is a web service that is built on top of the Lucene library. You can talk to it over HTTP from any programming language - so you can take advantage of the power of Lucene without having to write any Java code at all. Solr also adds a number of features that Lucene leaves out such as sharding and replication.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/databases"&gt;databases&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search-engines"&gt;search-engines&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/solr"&gt;solr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sphinx-search"&gt;sphinx-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/web-development"&gt;web-development&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="databases"/><category term="lucene"/><category term="search"/><category term="search-engines"/><category term="solr"/><category term="sphinx-search"/><category term="web-development"/><category term="quora"/></entry><entry><title>Which major companies are using Solr for search?</title><link href="https://simonwillison.net/2010/Aug/25/which-major-companies-are/#atom-tag" rel="alternate"/><published>2010-08-25T11:43:00+00:00</published><updated>2010-08-25T11:43:00+00:00</updated><id>https://simonwillison.net/2010/Aug/25/which-major-companies-are/#atom-tag</id><summary type="html">
    &lt;p&gt;&lt;em&gt;My answer to &lt;a href="https://www.quora.com/Which-major-companies-are-using-Solr-for-search/answer/Simon-Willison"&gt;Which major companies are using Solr for search?&lt;/a&gt; on Quora&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The Guardian newspaper uses Solr for its Open Platform Content API. &lt;span&gt;&lt;a href="http://www.guardian.co.uk/open-platform/blog/what-is-powering-the-content-api"&gt;http://www.guardian.co.uk/open-p...&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/solr"&gt;solr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/quora"&gt;quora&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="lucene"/><category term="open-source"/><category term="search"/><category term="solr"/><category term="quora"/></entry><entry><title>[UPDATE] Spatial Search in Apache Lucene and Solr</title><link href="https://simonwillison.net/2010/Jul/20/spatial/#atom-tag" rel="alternate"/><published>2010-07-20T18:28:00+00:00</published><updated>2010-07-20T18:28:00+00:00</updated><id>https://simonwillison.net/2010/Jul/20/spatial/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.lucidimagination.com/blog/2010/07/20/update-spatial-search-in-apache-lucene-and-solr/"&gt;[UPDATE] Spatial Search in Apache Lucene and Solr&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Spatial search is finally coming (back) to Solr: trunk now supports sorting and boosting by distance.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/geospatial"&gt;geospatial&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/solr"&gt;solr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/recovered"&gt;recovered&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/spatialsearch"&gt;spatialsearch&lt;/a&gt;&lt;/p&gt;



</summary><category term="geospatial"/><category term="lucene"/><category term="search"/><category term="solr"/><category term="recovered"/><category term="spatialsearch"/></entry><entry><title>Elastic Search</title><link href="https://simonwillison.net/2010/Feb/11/elastic/#atom-tag" rel="alternate"/><published>2010-02-11T18:33:14+00:00</published><updated>2010-02-11T18:33:14+00:00</updated><id>https://simonwillison.net/2010/Feb/11/elastic/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.elasticsearch.com/"&gt;Elastic Search&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Solr has competition! Like Solr, Elastic Search provides a RESTful JSON HTTP interface to Lucene. The focus here is on distribution, auto-sharding and high availability. It’s even easier to get started with than Solr, partly due to the focus on providing a schema-less document store, but it’s currently missing out on a bunch of useful Solr features (a web interface and faceting are the two that stand out). The high availability features look particularly interesting. UPDATE: I was incorrect; basic faceted queries are already supported.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://www.elasticsearch.com/blog/2010/02/08/youknowforsearch.html"&gt;ElasticSearch blog&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/elasticsearch"&gt;elasticsearch&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/http"&gt;http&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/java"&gt;java&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/json"&gt;json&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/rest"&gt;rest&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/scaling"&gt;scaling&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sharding"&gt;sharding&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/solr"&gt;solr&lt;/a&gt;&lt;/p&gt;



</summary><category term="elasticsearch"/><category term="http"/><category term="java"/><category term="json"/><category term="lucene"/><category term="rest"/><category term="scaling"/><category term="search"/><category term="sharding"/><category term="solr"/></entry><entry><title>Digg Search: Now With 99.987% Less Suck</title><link href="https://simonwillison.net/2009/Apr/10/digg/#atom-tag" rel="alternate"/><published>2009-04-10T22:17:57+00:00</published><updated>2009-04-10T22:17:57+00:00</updated><id>https://simonwillison.net/2009/Apr/10/digg/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://blog.digg.com/?p=653"&gt;Digg Search: Now With 99.987% Less Suck&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Really nice implementation of faceted search, still using Lucene and Solr under the hood.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/digg"&gt;digg&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/facets"&gt;facets&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/full-text-search"&gt;full-text-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/solr"&gt;solr&lt;/a&gt;&lt;/p&gt;



</summary><category term="digg"/><category term="facets"/><category term="full-text-search"/><category term="lucene"/><category term="search"/><category term="solr"/></entry><entry><title>Guardian + Lucene = Similar Articles + Categorisation</title><link href="https://simonwillison.net/2009/Mar/11/hublog/#atom-tag" rel="alternate"/><published>2009-03-11T12:53:39+00:00</published><updated>2009-03-11T12:53:39+00:00</updated><id>https://simonwillison.net/2009/Mar/11/hublog/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://hublog.hubmed.org/archives/001823.html"&gt;Guardian + Lucene = Similar Articles + Categorisation&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Alf Eaton loaded 13,000 Guardian articles tagged Science into Solr and Lucene and is using Solr’s MoreLikeThisHandler to find related articles and automatically apply Guardian tags to Nature News articles.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/alf-eaton"&gt;alf-eaton&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/full-text-search"&gt;full-text-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/guardian"&gt;guardian&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/naturenews"&gt;naturenews&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/openplatform"&gt;openplatform&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/solr"&gt;solr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/tagging"&gt;tagging&lt;/a&gt;&lt;/p&gt;



</summary><category term="alf-eaton"/><category term="full-text-search"/><category term="guardian"/><category term="lucene"/><category term="naturenews"/><category term="openplatform"/><category term="search"/><category term="solr"/><category term="tagging"/></entry><entry><title>Whoosh</title><link href="https://simonwillison.net/2009/Feb/12/whoosh/#atom-tag" rel="alternate"/><published>2009-02-12T12:49:59+00:00</published><updated>2009-02-12T12:49:59+00:00</updated><id>https://simonwillison.net/2009/Feb/12/whoosh/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://whoosh.ca/"&gt;Whoosh&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A brand new, pure-Python full text indexing engine (think Lucene). Claims to offer performance in the same league as wrappers to C or Java libraries. If this works as well as it claims, it will be an excellent tool for adding search to projects that wish to avoid a dependency on an external engine.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/full-text-search"&gt;full-text-search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/open-source"&gt;open-source&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/whoosh"&gt;whoosh&lt;/a&gt;&lt;/p&gt;



</summary><category term="full-text-search"/><category term="lucene"/><category term="open-source"/><category term="python"/><category term="search"/><category term="whoosh"/></entry><entry><title>solango</title><link href="https://simonwillison.net/2009/Feb/4/solango/#atom-tag" rel="alternate"/><published>2009-02-04T12:22:46+00:00</published><updated>2009-02-04T12:22:46+00:00</updated><id>https://simonwillison.net/2009/Feb/4/solango/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://code.google.com/p/django-solr-search/source/checkout"&gt;solango&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Another attempt at a Django/Solr integration library, based on code written for “a top 20 newspaper site” (I’d love to know which one). This is well documented, uses a registration model clearly inspired by the Django admin (which keeps search-related metadata out of your regular models) and includes management commands for re-indexing and generating Solr schema.xml files.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/django"&gt;django&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/solr"&gt;solr&lt;/a&gt;&lt;/p&gt;



</summary><category term="django"/><category term="lucene"/><category term="python"/><category term="search"/><category term="solr"/></entry><entry><title>pysolr</title><link href="https://simonwillison.net/2008/Jan/9/pysolr/#atom-tag" rel="alternate"/><published>2008-01-09T20:50:23+00:00</published><updated>2008-01-09T20:50:23+00:00</updated><id>https://simonwillison.net/2008/Jan/9/pysolr/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://code.google.com/p/pysolr/"&gt;pysolr&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Python wrapper for Solr, the search web service wrapper for Lucene. One thing I’m not clear on: do you need to configure Solr with the fields you’ll be indexing in advance, or can Solr create new fields on the fly to match the data you send it?


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/apache"&gt;apache&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/pysolr"&gt;pysolr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/solr"&gt;solr&lt;/a&gt;&lt;/p&gt;



</summary><category term="apache"/><category term="lucene"/><category term="pysolr"/><category term="python"/><category term="search"/><category term="solr"/></entry><entry><title>Apache Solr 1.1</title><link href="https://simonwillison.net/2007/Jan/13/solr/#atom-tag" rel="alternate"/><published>2007-01-13T01:16:21+00:00</published><updated>2007-01-13T01:16:21+00:00</updated><id>https://simonwillison.net/2007/Jan/13/solr/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://osx.freshmeat.net/projects/solr/?branch_id=67276&amp;amp;release_id=243649"&gt;Apache Solr 1.1&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Solr is the search Web Service built on top of Lucene. The latest release introduces JSON, Python and Ruby response formats in addition to XML.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/json"&gt;json&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lucene"&gt;lucene&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/ruby"&gt;ruby&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/search"&gt;search&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/solr"&gt;solr&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/webservice"&gt;webservice&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xml"&gt;xml&lt;/a&gt;&lt;/p&gt;



</summary><category term="json"/><category term="lucene"/><category term="python"/><category term="ruby"/><category term="search"/><category term="solr"/><category term="webservice"/><category term="xml"/></entry></feed>