Come play with a new search engine

Do you have a search bug that really annoys you? If so, please come help me test a new search engine!

I’ve ported our search services to a new text search engine called Xapian. While the indexes are bigger on disk, it is easier to install, much faster to index and probably also faster to search. Compared with Lucene, it has vastly fewer problems. And you can perform stop word searches!

Come play with it on my dev server! (Never mind the slow connection and the indexes being a few weeks old.) Please report issues to the usual place!

9 thoughts on “Come play with a new search engine”

  1. nospam

    Why is Xapian better than Lucene? Doesn’t Lucene support stop word searches too? Couldn’t you build your own Lucene searcher and analyzer instead?

  2. Mayhem

    Lucene is written in Java, and deploying that is a PAIN IN THE ASS for us. We don’t use Java except for this. To install a search server, you need to HAND COMPILE gcc-3.4.6 in order to hand compile PyLucene. With Xapian, all we need to do is run an apt-get command!

    But with all the tweaking we’ve done with Lucene, we haven’t gotten very far. With very little tweaking Xapian is giving much better search results and much better speed. As far as I am concerned Xapian is better ALL AROUND!

  3. voiceinsideyou

    mayhem – I am getting repeated 500 read timeouts for 80% of the searches I try on the dev server. Makes testing a little difficult! Are you able to take a look?

  4. nospam

    Well, I suppose if you’re not set up for Java, then using Lucene would be a pain. But why does Xapian provide better search results out of the box? I would imagine that its default analyzers are similar to Lucene’s, for example in removing non-alphabetic characters.

  5. Mayhem

    voice:

    I’ve upped the timeout from 2 seconds to 20 since I am building new indexes in the background on the same box. But if a search fails, repeat it — it usually works the second time.

    The indexes should be done in ~250 minutes.

  6. jesus2099

    Hi,
    I tested the new search and, for the moment, it may need some tweaking.

    Say I am looking for the artist Quốc Đại and type “quoc dai”.

    Lucene returns Quốc Đại as first result (99%), Xapian returns Quốc Đại as 6th result (only 44%).

    Now for 鄧麗君.

    I type only part of the first name. Lucene finds her but Xapian doesn’t find her at all.

  7. jesus2099

    If I hit Preview in this comment box, I can’t post the comment afterwards (anti-spam).

    I meant that if I type only the first name of 鄧麗君 (麗君), Xapian doesn’t find anything, while Lucene finds her (99%).
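jesus2099’s “quoc dai” example involves an unaccented query for an accented name. The thread doesn’t diagnose why Xapian ranked Quốc Đại lower, but one common technique for this kind of query is accent folding: apply the same normalization at index time and at query time so both sides produce identical terms. The sketch below is a minimal illustration using only Python’s standard `unicodedata` module. It is not how Lucene or Xapian is configured here, and `fold` and `index_terms` are hypothetical names. Note that Vietnamese “Đ” (D with stroke) has no Unicode decomposition, so it needs an explicit mapping.

```python
import unicodedata

# "Đ/đ" (D with stroke) does not decompose under Unicode normalization,
# so stripping combining marks alone would leave it untouched.
_EXTRA_FOLDS = str.maketrans({"Đ": "D", "đ": "d"})


def fold(text: str) -> str:
    """Strip diacritics and case so 'Quốc Đại' and 'quoc dai' compare equal."""
    text = text.translate(_EXTRA_FOLDS)
    # NFKD splits precomposed letters such as 'ố' into a base letter
    # followed by combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (general category "Mn"), keep everything else.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()


def index_terms(name: str) -> list[str]:
    """Hypothetical helper: split a name into folded, whitespace-separated terms."""
    return fold(name).split()


if __name__ == "__main__":
    print(index_terms("Quốc Đại"))  # ['quoc', 'dai']
    print(index_terms("quoc dai"))  # ['quoc', 'dai']
```

Because the same function runs on both the stored name and the query, the two sides agree no matter which form the user types.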
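The 麗君 example is a different problem. Chinese names are written without spaces, so an indexer that splits only on whitespace would store 鄧麗君 as a single term, and a query for 麗君 could never match it. That is an assumption: the thread doesn’t say how either engine tokenizes this name. One common workaround is to index overlapping two-character terms (bigrams) from each run of CJK ideographs. The sketch below shows that idea in plain Python. The names `is_cjk`, `cjk_bigrams` and `matches` are hypothetical, and the ideograph check based on Unicode character names is deliberately rough.

```python
import unicodedata


def is_cjk(ch: str) -> bool:
    """Rough check for CJK unified ideographs via the Unicode character name."""
    return unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")


def cjk_bigrams(text: str) -> set[str]:
    """Split runs of CJK ideographs into overlapping two-character terms.

    An isolated single ideograph is kept as a one-character term. Non-CJK
    characters only act as run boundaries and are otherwise ignored here.
    """
    terms: set[str] = set()
    run = ""
    for ch in text + " ":  # the trailing space flushes the final run
        if is_cjk(ch):
            run += ch
            continue
        if len(run) == 1:
            terms.add(run)
        for i in range(len(run) - 1):
            terms.add(run[i : i + 2])
        run = ""
    return terms


def matches(query: str, document: str) -> bool:
    """Hypothetical matcher: every query term must appear among the document's terms."""
    q = cjk_bigrams(query)
    return bool(q) and q <= cjk_bigrams(document)


if __name__ == "__main__":
    print(sorted(cjk_bigrams("鄧麗君")))  # ['鄧麗', '麗君']
    print(matches("麗君", "鄧麗君"))       # True
```

With bigrams, the given name 麗君 becomes one of the indexed terms of 鄧麗君, so it can be found on its own. A one-character query such as 君 would still miss unless single characters are indexed as well.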
