Searching for charities…

For the last week or so our team has been struggling with search performance against our database of 1.4 million charities/nonprofits.  We’re making progress, but the causes of the performance issue have been very difficult to pin down.

Givvy uses the Lucene search engine, which in its native form is a Java library from the Apache project.  By all accounts it’s very robust, highly scalable and well-supported by the open source community.  CNET Reviews uses Lucene and it’s both fast and very usable.  Because we have been building Givvy on the Symfony PHP framework, we have been using sfLucene (which is really a wrapper around the Zend Framework implementation of Lucene).  We’d love to use Google Search Appliance or something similar, but we don’t have the budget for that now…  Besides, open source is cool, ya?

Seth, our guru of all new knowledge, has been wrestling with this performance issue (and a related memory leak) for the past 2 weeks.  Recently he installed the Apache version, which runs on Java (Seth is new to Java).  Then he set up an index of 100,000 Scrabble words in both versions and tested the performance.  Seth is a competitive Scrabble player and runs the primary Scrabble rankings site, cross-tables.com.  Having been a Java bigot for a lot of my enterprise tech career (Sybase, etc.), I was expecting Lucene/Java to outperform sfLucene - which it did.  When I asked Seth “how much faster?” he replied “a lot.”  Typical Seth response… so I asked him to be a bit more precise.

While I knew it was “a lot” faster, what I did not expect was that in one test Lucene was up to 30 times faster than sfLucene!  Now Seth is frantically trying to rewrite our search system to use the Apache version of Lucene so we can try it against our 1.4M charity records.  Stay tuned!
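
For the curious, here is a minimal sketch of how a “how many times faster?” comparison like this can be measured. It is not Seth’s actual test harness and it does not touch Lucene or sfLucene at all: the two search functions are hypothetical stand-ins (a slow list scan and a fast set lookup over 100,000 made-up words), and the names `best_time` and `speedup` are invented for illustration. The idea is simply to run the same queries through both implementations, keep the best time of several runs, and divide.

```python
import time

def best_time(search, queries, repeats=5):
    """Fastest wall-clock time (in seconds) to run every query once."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for q in queries:
            search(q)
        best = min(best, time.perf_counter() - start)
    return best

def speedup(slow_search, fast_search, queries, repeats=5):
    """How many times faster fast_search is than slow_search."""
    slow = best_time(slow_search, queries, repeats)
    fast = best_time(fast_search, queries, repeats)
    # Guard against a zero reading from a very coarse timer.
    return slow / max(fast, 1e-9)

# Hypothetical stand-ins for the two engines (assumption: NOT real search code).
words = [f"word{i}" for i in range(100_000)]
word_list = list(words)   # slow: scans the list from the start
word_set = set(words)     # fast: hash lookup

def linear_search(q):
    return q in word_list

def hashed_search(q):
    return q in word_set

queries = ["word99999", "word50000", "notaword"] * 10
ratio = speedup(linear_search, hashed_search, queries)
print(f"fast version was about {ratio:.0f}x faster")
```

Taking the best of several runs rather than a single run helps keep one-off hiccups (a cold cache, another process hogging the CPU) from skewing the ratio. To compare real engines, the two stand-in functions would be swapped for calls into each search system.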

3 Responses to “Searching for charities…”

  1. Very nice!!

  2. Have you looked at Attivio? One way to think of them is as Lucene + management of fielded data.

    Best of luck,

    CAM

  3. Curt - thanks for the pointer. I will check them out. We actually got what we needed from Solr, which is Lucene in an easy-to-use wrapper. It’s very fast!

    Best,
    John
