Day: February 6, 2009

  • Sphinx Search with PostgreSQL

    While I don’t plan on moving away from Apache Solr for my searching needs any time soon, Jeremy Zawodny’s post on Sphinx at craigslist made me want to take a closer look. Sphinx works with MySQL, PostgreSQL, and XML input as data sources, but MySQL seems to be the best documented. I’m a PostgreSQL guy so I ran in to a few hiccups along the way. These instructions, based on instructions on the Sphinx wiki, got me up and running on Ubuntu Server 8.10.

    Install build toolchain:

    $ sudo aptitude install build-essential checkinstall
    

    Install Postgres:

    $ sudo aptitude install postgresql postgresql-client \\
    postgresql-client-common postgresql-contrib \\
    postgresql-server-dev-8.3
    

    Get Sphinx source:

    $ wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8.1.tar.gz
    $ tar xzvf sphinx-0.9.8.1.tar.gz
    $ cd sphinx-0.9.8.1
    

    Configure and make:

    $ ./configure --without-mysql --with-pgsql \\
    --with-pgsql-includes=/usr/include/postgresql/ \\
    --with-pgsql-lib=/usr/lib/postgresql/8.3/lib/
    $ make
    

    Run checkinstall:

    $ mkdir /usr/local/var
    $ sudo checkinstall
    

    Sphinx is now installed in /usr/local. Check out /usr/local/etc/ for configuration info.

    Create something to index:

    $ createdb -U postgres test
    $ psql -U postgres test
    test=# create table test (id integer primary key not null, text text);
    test=# insert into test (text) values ('Hello, World!');
    test=# insert into test (text) values ('This is a test.');
    test=# insert into test (text) values ('I have another thing to test.');
    test=# -- A user with a password is required.
    test=# create user foo with password 'bar';
    test=# alter table test owner to foo;
    test=# \\q
    

    Configure sphinx (replace nano with your editor of choice):

    $ cd /usr/local/etc
    $ sudo cp sphinx-min.conf.dist sphinx.conf
    $ sudo nano sphinx.conf
    

    These values worked for me. I left configuration for indexer and searchd unchanged:

    source src1
    {
      type = pgsql
      sql_host = localhost
      sql_user = foo
      sql_pass = bar
      sql_db = test
      sql_port = 5432
      sql_query = select id, text from test
      sql_query_info = SELECT * from test WHERE id=$id
    }
    
    index test1
    {
      source = src1
      path = /var/data/test1
      docinfo = extern
      charset_type = utf-8
    }
    

    Reindex:

    $ sudo mkdir /var/data
    $ sudo indexer --all
    

    Run searchd:

    $ sudo searchd
    

    Play:

    $ search world
    
    Sphinx 0.9.8.1-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    
    using config file '/usr/local/etc/sphinx.conf'...
    index 'test1': query 'world ': returned 1 matches of 1 total in 0.000 sec
    
    displaying matches:
    1. document=1, weight=1
    
    words:
    1. 'world': 1 documents, 1 hits
    

    Use Python:

    cd sphinx-0.9.8.1/api
    python
    >>> import sphinxapi, pprint
    >>> c = sphinxapi.SphinxClient()
    >>> q = c.Query('world')
    >>> pprint.pprint(q)
    {'attrs': [],
     'error': '',
     'fields': ['text'],
     'matches': [{'attrs': {}, 'id': 1, 'weight': 1}],
     'status': 0,
     'time': '0.000',
     'total': 1,
     'total_found': 1,
     'warning': '',
     'words': [{'docs': 1, 'hits': 1, 'word': 'world'}]}
    

    If you add new data and want to reindex, make sure you use the --rotate flag:

    sudo indexer --rotate --all
    

    This is an extremely quick and dirty installation designed to give me a sandbox
    to play with. For production use you would want to run as a non-privileged user
    and would probably want to have an /etc/init.d script for searchd or run it
    behind supervised. If you’re looking to experiment with Sphinx and MySQL,
    there should be plenty of documentation out there to get you started.