Month: February 2009

  • Installing PyLucene on OSX 10.5

    I was pleasantly surprised at my experience installing PyLucene this morning on my OSX 10.5 laptop. The installation instructions worked perfectly without a hiccup. This may not be impressive if you’ve never installed (or attempted to install) PyLucene before.

    I tried once a year or so back and was unsuccessful. The build process just never worked for me and I couldn’t find a binary build that fit my OS + Python version + Java version combination.

    Check out PyLucene:

    $ svn co pylucene

    Build JCC. I install Python packages in my home directory; if you do too, omit sudo from the last command. Otherwise, leave it in:

    $ cd pylucene/jcc
    $ python setup.py build
    $ sudo python setup.py install

    Now we need to edit PyLucene’s Makefile to be configured for OSX and Python 2.5. If you use a different setup than the one that ships with OSX 10.5, you’ll have to adjust these parameters to match your setup.

    Edit the Makefile:

    $ cd ..
    $ nano Makefile

    Uncomment the 5 lines below the comment # Mac OS X (Python 2.5, Java 1.5). If you have installed a different version of Python, such as 2.6, there should be a combination that works for you. Here’s what I uncommented:

    # Mac OS X  (Python 2.5, Java 1.5)
    JCC=$(PYTHON) -m jcc --shared

    Save the file, exit your editor, and build PyLucene:

    $ make

    If it doesn’t build properly, check the settings in your Makefile.

    After a successful build, install it (again you can omit sudo if you install Python packages locally and not system-wide):

    $ sudo make install

    Now verify that it’s been installed:

    $ python
    Python 2.5.1 (r251:54863, Nov 11 2008, 17:46:48)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import lucene

    If it imports without a problem you should have a working PyLucene library. Rejoice.
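    A bare import only proves the module can be found. A slightly stronger check also starts the embedded JVM, which PyLucene needs before any Lucene class can be used. The following is a minimal sketch, not part of the official instructions: check_pylucene is a hypothetical helper name, and it assumes Python 2.6 or later (for the "except ... as" syntax) and a PyLucene build that exposes lucene.CLASSPATH and lucene.VERSION.

```python
def check_pylucene():
    """Return (ok, message) describing whether PyLucene can be used."""
    try:
        import lucene  # the module built and installed in the steps above
    except ImportError as exc:
        return False, "PyLucene is not importable: %s" % exc
    # initVM() starts the embedded JVM; it must run before any Lucene
    # class is touched. Passing CLASSPATH explicitly is an assumption that
    # suits older releases and is harmless on newer ones.
    lucene.initVM(lucene.CLASSPATH)
    return True, "PyLucene %s is ready" % lucene.VERSION


if __name__ == "__main__":
    ok, message = check_pylucene()
    print(message)
```

    If the message reports an import error, the usual suspects are the Makefile settings or a Python interpreter different from the one used for the build.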

  • Sphinx Search with PostgreSQL

    While I don’t plan on moving away from Apache Solr for my searching needs any time soon, Jeremy Zawodny’s post on Sphinx at Craigslist made me want to take a closer look. Sphinx works with MySQL, PostgreSQL, and XML input as data sources, but MySQL seems to be the best documented. I’m a PostgreSQL guy, so I ran into a few hiccups along the way. These instructions, based on instructions on the Sphinx wiki, got me up and running on Ubuntu Server 8.10.

    Install build toolchain:

    $ sudo aptitude install build-essential checkinstall

    Install Postgres:

    $ sudo aptitude install postgresql postgresql-client \
    postgresql-client-common postgresql-contrib \

    Get Sphinx source:

    $ wget
    $ tar xzvf sphinx-
    $ cd sphinx-

    Configure and make:

    $ ./configure --without-mysql --with-pgsql \
    --with-pgsql-includes=/usr/include/postgresql/ \
    $ make

    Run checkinstall:

    $ sudo mkdir -p /usr/local/var
    $ sudo checkinstall

    Sphinx is now installed in /usr/local. Check out /usr/local/etc/ for configuration info.

    Create something to index:

    $ createdb -U postgres test
    $ psql -U postgres test
    test=# create table test (id serial primary key not null, text text);
    test=# insert into test (text) values ('Hello, World!');
    test=# insert into test (text) values ('This is a test.');
    test=# insert into test (text) values ('I have another thing to test.');
    test=# -- A user with a password is required.
    test=# create user foo with password 'bar';
    test=# alter table test owner to foo;
    test=# \q

    Configure Sphinx (replace nano with your editor of choice):

    $ cd /usr/local/etc
    $ sudo cp sphinx-min.conf.dist sphinx.conf
    $ sudo nano sphinx.conf

    These values worked for me. I left configuration for indexer and searchd unchanged:

    source src1
    {
      type = pgsql
      sql_host = localhost
      sql_user = foo
      sql_pass = bar
      sql_db = test
      sql_port = 5432
      sql_query = select id, text from test
      sql_query_info = SELECT * from test WHERE id=$id
    }

    index test1
    {
      source = src1
      path = /var/data/test1
      docinfo = extern
      charset_type = utf-8
    }
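
    If you script this setup, the same settings can be generated instead of typed by hand. Here is a minimal sketch, assuming the usual sphinx.conf layout in which each source and index section is a named block wrapped in braces. render_block is a hypothetical helper of my own, not something Sphinx provides, and the values are simply the ones used in this post.

```python
def render_block(kind, name, settings):
    """Render one sphinx.conf block, e.g. 'source src1' followed by { ... }."""
    lines = ["%s %s" % (kind, name), "{"]
    for key, value in settings:
        lines.append("  %s = %s" % (key, value))
    lines.append("}")
    return "\n".join(lines)


# Values copied from the configuration in this post.
SOURCE = [
    ("type", "pgsql"),
    ("sql_host", "localhost"),
    ("sql_user", "foo"),
    ("sql_pass", "bar"),
    ("sql_db", "test"),
    ("sql_port", "5432"),
    ("sql_query", "select id, text from test"),
    ("sql_query_info", "SELECT * from test WHERE id=$id"),
]
INDEX = [
    ("source", "src1"),
    ("path", "/var/data/test1"),
    ("docinfo", "extern"),
    ("charset_type", "utf-8"),
]

config_text = "\n\n".join([
    render_block("source", "src1", SOURCE),
    render_block("index", "test1", INDEX),
])
print(config_text)
```

    The settings are kept as ordered pairs rather than a dict so the generated file lists them in the same order as the hand-written one.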


    Create the data directory and build the index:

    $ sudo mkdir /var/data
    $ sudo indexer --all

    Run searchd:

    $ sudo searchd


    Test it with the command-line search tool:

    $ search world
    Sphinx (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    using config file '/usr/local/etc/sphinx.conf'...
    index 'test1': query 'world ': returned 1 matches of 1 total in 0.000 sec
    displaying matches:
    1. document=1, weight=1
    1. 'world': 1 documents, 1 hits

    Use Python:

    $ cd sphinx-
    $ python
    >>> import sphinxapi, pprint
    >>> c = sphinxapi.SphinxClient()
    >>> q = c.Query('world')
    >>> pprint.pprint(q)
    {'attrs': [],
     'error': '',
     'fields': ['text'],
     'matches': [{'attrs': {}, 'id': 1, 'weight': 1}],
     'status': 0,
     'time': '0.000',
     'total': 1,
     'total_found': 1,
     'warning': '',
     'words': [{'docs': 1, 'hits': 1, 'word': 'world'}]}
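
    Sphinx hands back document ids rather than the rows themselves, so the next step is usually to pull the ids out of the result and look them up in PostgreSQL. Below is a minimal sketch of the first half. matched_ids is a hypothetical helper, and the sample dictionary is the result printed above rather than a live query. It relies on Query() returning None when the search fails, in which case the client's GetLastError() method gives the reason.

```python
def matched_ids(result):
    """Return the document ids from a SphinxClient.Query() result.

    Query() returns None on failure, so treat that as "no matches".
    The order of the matches is left exactly as Sphinx returned it.
    """
    if result is None:
        return []
    return [match["id"] for match in result.get("matches", [])]


# The result shown above, used here in place of a live searchd.
sample = {
    "attrs": [],
    "error": "",
    "fields": ["text"],
    "matches": [{"attrs": {}, "id": 1, "weight": 1}],
    "status": 0,
    "time": "0.000",
    "total": 1,
    "total_found": 1,
    "warning": "",
    "words": [{"docs": 1, "hits": 1, "word": "world"}],
}

print(matched_ids(sample))  # prints [1]
```

    With a running searchd you would pass c.Query('world') in place of sample and then select the matching rows from the test table by id.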

    If you add new data and want to reindex, make sure you use the --rotate flag:

    $ sudo indexer --rotate --all

    This is an extremely quick and dirty installation meant to give me a sandbox to play with. For production use you would want to run Sphinx as a non-privileged user, and you would probably want an /etc/init.d script for searchd or to run it under supervisord. If you’re looking to experiment with Sphinx and MySQL, there should be plenty of documentation out there to get you started.