Year: 2009

  • Installing PyLucene on OSX 10.5

    I was pleasantly surprised at my experience installing PyLucene this morning on my OSX 10.5 laptop. The installation instructions worked perfectly without a hiccup. This may not be impressive if you’ve never installed (or attempted to install) PyLucene before.

    I tried once a year or so back and was unsuccessful. The build process just never worked for me and I couldn’t find a binary build that fit my OS + Python version + Java version combination.

    Check out PyLucene:

    $ svn co http://svn.apache.org/repos/asf/lucene/pylucene/trunk pylucene
    

    Build JCC. I install Python packages in my home directory; if you do too, you can omit sudo from the last command. Otherwise, leave it in:

    $ cd pylucene/jcc
    $ python setup.py build
    $ sudo python setup.py install
    

    Now we need to edit PyLucene’s Makefile to be configured for OSX and Python 2.5. If you use a different setup than the one that ships with OSX 10.5, you’ll have to adjust these parameters to match your setup.

    Edit the Makefile:

    $ cd ..
    $ nano Makefile
    

    Uncomment the five lines below the comment # Mac OS X (Python 2.5, Java 1.5). If you have installed a different version of Python such as 2.6, there should be a combination that works for you. Here’s what I uncommented:

    # Mac OS X  (Python 2.5, Java 1.5)
    PREFIX_PYTHON=/usr
    ANT=ant
    PYTHON=$(PREFIX_PYTHON)/bin/python
    JCC=$(PYTHON) -m jcc --shared
    NUM_FILES=2
    

    Save the file, exit your editor, and build PyLucene:

    $ make
    

    If it doesn’t build properly, check the settings in your Makefile.

    After a successful build, install it (again you can omit sudo if you install Python packages locally and not system-wide):

    $ sudo make install
    

    Now verify that it’s been installed:

    $ python
    Python 2.5.1 (r251:54863, Nov 11 2008, 17:46:48)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import lucene
    >>>
    

    If it imports without a problem you should have a working PyLucene library. Rejoice.
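
    The same check can be scripted, which is handy when you have several interpreters installed and want to know which one picked up the package. This is only a sketch: it assumes Python 3.4 or later (the post used Python 2.5), and the helper name module_available is made up for illustration.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found."""
    # find_spec() locates the module without importing it and
    # returns None when nothing by that name is on the path.
    return importlib.util.find_spec(name) is not None


# Report whether this interpreter can see the PyLucene module.
print("lucene importable:", module_available("lucene"))
```

    Note that a successful import only shows the module is on the path. As far as I know, PyLucene also expects lucene.initVM() to be called before any Lucene classes are used.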

  • Sphinx Search with PostgreSQL

    While I don’t plan on moving away from Apache Solr for my searching needs any time soon, Jeremy Zawodny’s post on Sphinx at craigslist made me want to take a closer look. Sphinx works with MySQL, PostgreSQL, and XML input as data sources, but MySQL seems to be the best documented. I’m a PostgreSQL guy, so I ran into a few hiccups along the way. These instructions, based on instructions on the Sphinx wiki, got me up and running on Ubuntu Server 8.10.

    Install build toolchain:

    $ sudo aptitude install build-essential checkinstall
    

    Install Postgres:

    $ sudo aptitude install postgresql postgresql-client \
    postgresql-client-common postgresql-contrib \
    postgresql-server-dev-8.3
    

    Get Sphinx source:

    $ wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8.1.tar.gz
    $ tar xzvf sphinx-0.9.8.1.tar.gz
    $ cd sphinx-0.9.8.1
    

    Configure and make:

    $ ./configure --without-mysql --with-pgsql \
    --with-pgsql-includes=/usr/include/postgresql/ \
    --with-pgsql-lib=/usr/lib/postgresql/8.3/lib/
    $ make
    

    Run checkinstall:

    $ sudo mkdir -p /usr/local/var
    $ sudo checkinstall
    

    Sphinx is now installed in /usr/local. Check out /usr/local/etc/ for configuration info.

    Create something to index:

    $ createdb -U postgres test
    $ psql -U postgres test
    test=# create table test (id serial primary key, text text);
    test=# insert into test (text) values ('Hello, World!');
    test=# insert into test (text) values ('This is a test.');
    test=# insert into test (text) values ('I have another thing to test.');
    test=# -- A user with a password is required.
    test=# create user foo with password 'bar';
    test=# alter table test owner to foo;
    test=# \q
    

    Configure Sphinx (replace nano with your editor of choice):

    $ cd /usr/local/etc
    $ sudo cp sphinx-min.conf.dist sphinx.conf
    $ sudo nano sphinx.conf
    

    These values worked for me. I left the indexer and searchd sections unchanged:

    source src1
    {
      type = pgsql
      sql_host = localhost
      sql_user = foo
      sql_pass = bar
      sql_db = test
      sql_port = 5432
      sql_query = select id, text from test
      sql_query_info = SELECT * from test WHERE id=$id
    }
    
    index test1
    {
      source = src1
      path = /var/data/test1
      docinfo = extern
      charset_type = utf-8
    }
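
    If you end up with more than one table to index, typing these sections by hand gets repetitive. Here is a rough sketch of generating them from Python instead. The render_section helper is hypothetical (it is not part of Sphinx), and the values are simply the ones from my config above.

```python
def render_section(kind, name, settings):
    """Render one sphinx.conf section such as `source src1 { ... }`.

    `settings` is a list of (key, value) pairs so the order is preserved.
    """
    lines = ["%s %s" % (kind, name), "{"]
    for key, value in settings:
        lines.append("  %s = %s" % (key, value))
    lines.append("}")
    return "\n".join(lines)


source = render_section("source", "src1", [
    ("type", "pgsql"),
    ("sql_host", "localhost"),
    ("sql_user", "foo"),
    ("sql_pass", "bar"),
    ("sql_db", "test"),
    ("sql_port", 5432),
    ("sql_query", "select id, text from test"),
])
index = render_section("index", "test1", [
    ("source", "src1"),
    ("path", "/var/data/test1"),
    ("docinfo", "extern"),
    ("charset_type", "utf-8"),
])

# Print both sections, separated by a blank line, ready to paste.
print(source + "\n\n" + index)
```

    The output would still need the indexer and searchd sections from sphinx-min.conf.dist to make a complete file.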
    

    Reindex:

    $ sudo mkdir /var/data
    $ sudo indexer --all
    

    Run searchd:

    $ sudo searchd
    

    Play:

    $ search world
    
    Sphinx 0.9.8.1-release (r1533)
    Copyright (c) 2001-2008, Andrew Aksyonoff
    
    using config file '/usr/local/etc/sphinx.conf'...
    index 'test1': query 'world ': returned 1 matches of 1 total in 0.000 sec
    
    displaying matches:
    1. document=1, weight=1
    
    words:
    1. 'world': 1 documents, 1 hits
    

    Use Python (run this from the directory where you unpacked the Sphinx source):

    $ cd sphinx-0.9.8.1/api
    $ python
    >>> import sphinxapi, pprint
    >>> c = sphinxapi.SphinxClient()
    >>> q = c.Query('world')
    >>> pprint.pprint(q)
    {'attrs': [],
     'error': '',
     'fields': ['text'],
     'matches': [{'attrs': {}, 'id': 1, 'weight': 1}],
     'status': 0,
     'time': '0.000',
     'total': 1,
     'total_found': 1,
     'warning': '',
     'words': [{'docs': 1, 'hits': 1, 'word': 'world'}]}
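
    The result is a plain dictionary, so pulling out what you need is straightforward. Below is a small sketch that reduces a result to the total and a list of (id, weight) pairs. The summarize function is my own illustration, not part of sphinxapi, and the sample dictionary is copied from the output above so the snippet runs without searchd.

```python
def summarize(result):
    """Return (total_found, [(doc_id, weight), ...]) for a query result."""
    if result is None:
        # Assumption: a failed query hands back None instead of a dict.
        return 0, []
    hits = [(match["id"], match["weight"]) for match in result["matches"]]
    return result["total_found"], hits


# Sample result copied from the interactive session above.
sample = {
    "attrs": [],
    "error": "",
    "fields": ["text"],
    "matches": [{"attrs": {}, "id": 1, "weight": 1}],
    "status": 0,
    "time": "0.000",
    "total": 1,
    "total_found": 1,
    "warning": "",
    "words": [{"docs": 1, "hits": 1, "word": "world"}],
}

total, hits = summarize(sample)
print(total, hits)  # prints: 1 [(1, 1)]
```

    With searchd running you would pass c.Query('world') in place of the sample. In the client I used, Query returns None on failure and c.GetLastError() holds the message, which is why the sketch checks for None first.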
    

    If you add new data and want to reindex, make sure you use the --rotate flag:

    $ sudo indexer --rotate --all
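
    The flag matters because a running searchd holds the index files open, so indexer has to build fresh copies and tell searchd to swap them in. If you script your reindexing, something like the sketch below keeps that decision in one place. The indexer_command helper and its parameters are hypothetical names of mine. It only builds the argument list and does not run anything.

```python
def indexer_command(searchd_running, indexes=None):
    """Build the argument list for the `indexer` command line tool."""
    cmd = ["indexer"]
    if searchd_running:
        # Live index files are in use, so ask for a rotation.
        cmd.append("--rotate")
    if indexes:
        # Reindex only the named indexes.
        cmd.extend(indexes)
    else:
        cmd.append("--all")
    return cmd


print(" ".join(indexer_command(searchd_running=True)))
# prints: indexer --rotate --all
```

    To actually run it you could hand the list to subprocess.call, prefixed with sudo if your index directory is owned by root as in this setup.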
    

    This is an extremely quick and dirty installation designed to give me a sandbox
    to play with. For production use you would want to run as a non-privileged user
    and would probably want to have an /etc/init.d script for searchd or run it
    under a process supervisor. If you’re looking to experiment with Sphinx and MySQL,
    there should be plenty of documentation out there to get you started.