While I don’t plan on moving away from Apache Solr for my searching needs any time soon, Jeremy Zawodny’s post on Sphinx at craigslist made me want to take a closer look. Sphinx works with MySQL, PostgreSQL, and XML input as data sources, but MySQL seems to be the best documented. I’m a PostgreSQL guy, so I ran into a few hiccups along the way. These steps, based on the instructions on the Sphinx wiki, got me up and running on Ubuntu Server 8.10.
Install build toolchain:
$ sudo aptitude install build-essential checkinstall
Install Postgres:
$ sudo aptitude install postgresql postgresql-client \
    postgresql-client-common postgresql-contrib \
    postgresql-server-dev-8.3
Get Sphinx source:
$ wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8.1.tar.gz
$ tar xzvf sphinx-0.9.8.1.tar.gz
$ cd sphinx-0.9.8.1
Configure and make:
$ ./configure --without-mysql --with-pgsql \
    --with-pgsql-includes=/usr/include/postgresql/ \
    --with-pgsql-lib=/usr/lib/postgresql/8.3/lib/
$ make
Run checkinstall:
$ sudo mkdir -p /usr/local/var
$ sudo checkinstall
Sphinx is now installed in /usr/local. Check out /usr/local/etc/ for configuration info.
Create something to index:
$ createdb -U postgres test
$ psql -U postgres test
test=# create table test (id serial primary key, text text);
test=# insert into test (text) values ('Hello, World!');
test=# insert into test (text) values ('This is a test.');
test=# insert into test (text) values ('I have another thing to test.');
test=# -- A user with a password is required.
test=# create user foo with password 'bar';
test=# alter table test owner to foo;
test=# \q
Configure Sphinx (replace nano with your editor of choice):
$ cd /usr/local/etc
$ sudo cp sphinx-min.conf.dist sphinx.conf
$ sudo nano sphinx.conf
These values worked for me. I left the indexer and searchd sections of the file unchanged:
source src1
{
    type = pgsql
    sql_host = localhost
    sql_user = foo
    sql_pass = bar
    sql_db = test
    sql_port = 5432
    sql_query = select id, text from test
    sql_query_info = SELECT * from test WHERE id=$id
}

index test1
{
    source = src1
    path = /var/data/test1
    docinfo = extern
    charset_type = utf-8
}
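If you end up rebuilding this file for more than one sandbox, the two stanzas can be generated from a small script. This is only a sketch: render_sphinx_conf and SETTINGS are names I made up for illustration (nothing Sphinx ships), and it covers only the source and index stanzas, so the indexer and searchd sections from sphinx-min.conf.dist still have to come from the original file.

```python
# Hypothetical helper, not part of Sphinx: renders the source/index
# stanzas from the walkthrough so credentials live in one place.
SOURCE_KEYS = [
    "sql_host", "sql_user", "sql_pass", "sql_db",
    "sql_port", "sql_query", "sql_query_info",
]


def render_sphinx_conf(settings):
    """Return sphinx.conf text for one pgsql source and one index."""
    lines = ["source src1", "{", "    type = pgsql"]
    for key in SOURCE_KEYS:
        lines.append("    %s = %s" % (key, settings[key]))
    lines += [
        "}",
        "",
        "index test1",
        "{",
        "    source = src1",
        "    path = %s" % settings["path"],
        "    docinfo = extern",
        "    charset_type = utf-8",
        "}",
    ]
    return "\n".join(lines) + "\n"


# Values copied from the configuration in this post.
SETTINGS = {
    "sql_host": "localhost",
    "sql_user": "foo",
    "sql_pass": "bar",
    "sql_db": "test",
    "sql_port": 5432,
    "sql_query": "select id, text from test",
    "sql_query_info": "SELECT * from test WHERE id=$id",
    "path": "/var/data/test1",
}

conf_text = render_sphinx_conf(SETTINGS)
print(conf_text)
```

The output is plain text, so it can be pasted over the matching stanzas in sphinx.conf by hand.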
Reindex:
$ sudo mkdir /var/data
$ sudo indexer --all
Run searchd:
$ sudo searchd
Play:
$ search world
Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/usr/local/etc/sphinx.conf'...
index 'test1': query 'world ': returned 1 matches of 1 total in 0.000 sec

displaying matches:
1. document=1, weight=1

words:
1. 'world': 1 documents, 1 hits
Use Python:
$ cd sphinx-0.9.8.1/api
$ python
>>> import sphinxapi, pprint
>>> c = sphinxapi.SphinxClient()
>>> q = c.Query('world')
>>> pprint.pprint(q)
{'attrs': [],
 'error': '',
 'fields': ['text'],
 'matches': [{'attrs': {}, 'id': 1, 'weight': 1}],
 'status': 0,
 'time': '0.000',
 'total': 1,
 'total_found': 1,
 'warning': '',
 'words': [{'docs': 1, 'hits': 1, 'word': 'world'}]}
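The result is a plain dictionary, so pulling out the document ids takes only a few lines. Here is a minimal sketch that runs against a copy of the output above rather than a live searchd. The summarize function is my own name, not part of sphinxapi. As far as I can tell from sphinxapi.py, Query() returns None when the search fails (for example when searchd is not running), so the sketch treats None as "no matches".

```python
def summarize(result):
    """Return (document ids, total_found) from a Query() result dict.

    A result of None is treated as an empty result; this assumes
    Query() returns None on failure, as sphinxapi.py appears to do.
    """
    if result is None:
        return [], 0
    ids = [match["id"] for match in result["matches"]]
    return ids, result["total_found"]


# Copy of the dictionary printed in the session above.
SAMPLE = {
    "attrs": [],
    "error": "",
    "fields": ["text"],
    "matches": [{"attrs": {}, "id": 1, "weight": 1}],
    "status": 0,
    "time": "0.000",
    "total": 1,
    "total_found": 1,
    "warning": "",
    "words": [{"docs": 1, "hits": 1, "word": "world"}],
}

ids, total = summarize(SAMPLE)
print("ids=%s total=%d" % (ids, total))  # prints: ids=[1] total=1
```

The ids are the values of the id column in the test table, so they can be fed straight into a query against PostgreSQL to fetch the full rows.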
If you add new data and want to reindex while searchd is running, use the --rotate flag so that searchd picks up the new index files:
$ sudo indexer --rotate --all
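The reason for the flag: while searchd is running it holds the index files, so indexer builds a fresh copy alongside them and signals searchd to swap it in. If you script your reindexing, a sketch like the following keeps the two cases straight. The indexer_command function and its arguments are hypothetical names of mine, and it only builds the argument list; actually running it (with subprocess, say) is left out.

```python
def indexer_command(searchd_running, indexes=None):
    """Build the indexer argument list used in this walkthrough.

    searchd_running: add --rotate so a live searchd swaps in the new files.
    indexes: index names to rebuild; falls back to --all when empty.
    """
    cmd = ["sudo", "indexer"]
    if searchd_running:
        cmd.append("--rotate")
    cmd += list(indexes) if indexes else ["--all"]
    return cmd


print(" ".join(indexer_command(True)))  # prints: sudo indexer --rotate --all
```

Calling indexer_command(False, ["test1"]) gives the first-time build for a single index, before searchd has been started.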
This is an extremely quick and dirty installation designed to give me a sandbox
to play with. For production use you would want to run searchd as a
non-privileged user, and you would probably want an /etc/init.d script for it
or to run it under a process supervisor. If you’re looking to experiment with
Sphinx and MySQL, there should be plenty of documentation out there to get you
started.