Busy making things: github, links, photos, @mc.

Installing PyLucene on OSX 10.5

Posted: February 11th, 2009 | Author: | Filed under: Apple, Java, Python | Comments Off

I was pleasantly surprised at my experience installing PyLucene this morning on my OSX 10.5 laptop. The installation instructions worked perfectly without a hiccup. This may not be impressive if you’ve never installed (or attempted to install) PyLucene before.

I tried once a year or so back and was unsuccessful. The build process just never worked for me and I couldn’t find a binary build that fit my OS + Python version + Java version combination.

Check out PyLucene:

$ svn co http://svn.apache.org/repos/asf/lucene/pylucene/trunk pylucene

Build JCC. I install Python packages in my home directory and if you do so too you can omit sudo before the last command, otherwise leave it in:

$ cd pylucene/jcc
$ python setup.py build
$ sudo python setup.py install

Now we need to edit PyLucene’s Makefile to be configured for OSX and Python 2.5. If you use a different setup than the one that ships with OSX 10.5, you’ll have to adjust these parameters to match your setup.

Edit the Makefile:

$ cd ..
$ nano Makefile

Uncomment the 5 lines Below the comment # Mac OS X (Python 2.5, Java 1.5). If you have installed a different version of Python such as 2.6, there should be a combination that works for you. Here’s what I uncommented:

# Mac OS X  (Python 2.5, Java 1.5)
PREFIX_PYTHON=/usr
ANT=ant
PYTHON=$(PREFIX_PYTHON)/bin/python
JCC=$(PYTHON) -m jcc --shared
NUM_FILES=2

Save the file, exit your editor, and build PyLucene:

$ make

If it doesn’t build properly check the settings in your Makefile.

After a successful build, install it (again you can omit sudo if you install Python packages locally and not system-wide):

$ sudo make install

Now verify that it’s been installed:

$ python
Python 2.5.1 (r251:54863, Nov 11 2008, 17:46:48)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import lucene
>>>

If it imports without a problem you should have a working PyLucene library. Rejoice.


Sphinx Search with PostgreSQL

Posted: February 6th, 2009 | Author: | Filed under: Open Source, Python | Comments Off

While I don’t plan on moving away from Apache Solr for my searching needs any time soon, Jeremy Zawodny’s post on Sphinx at craigslist made me want to take a closer look. Sphinx works with MySQL, PostgreSQL, and XML input as data sources, but MySQL seems to be the best documented. I’m a PostgreSQL guy so I ran in to a few hiccups along the way. These instructions, based on instructions on the Sphinx wiki, got me up and running on Ubuntu Server 8.10.

Install build toolchain:

$ sudo aptitude install build-essential checkinstall

Install Postgres:

$ sudo aptitude install postgresql postgresql-client \\
postgresql-client-common postgresql-contrib \\
postgresql-server-dev-8.3

Get Sphinx source:

$ wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8.1.tar.gz
$ tar xzvf sphinx-0.9.8.1.tar.gz
$ cd sphinx-0.9.8.1

Configure and make:

$ ./configure --without-mysql --with-pgsql \\
--with-pgsql-includes=/usr/include/postgresql/ \\
--with-pgsql-lib=/usr/lib/postgresql/8.3/lib/
$ make

Run checkinstall:

$ mkdir /usr/local/var
$ sudo checkinstall

Sphinx is now installed in /usr/local. Check out /usr/local/etc/ for configuration info.

Create something to index:

$ createdb -U postgres test
$ psql -U postgres test
test=# create table test (id integer primary key not null, text text);
test=# insert into test (text) values ('Hello, World!');
test=# insert into test (text) values ('This is a test.');
test=# insert into test (text) values ('I have another thing to test.');
test=# -- A user with a password is required.
test=# create user foo with password 'bar';
test=# alter table test owner to foo;
test=# \\q

Configure sphinx (replace nano with your editor of choice):

$ cd /usr/local/etc
$ sudo cp sphinx-min.conf.dist sphinx.conf
$ sudo nano sphinx.conf

These values worked for me. I left configuration for indexer and searchd unchanged:

source src1
{
  type = pgsql
  sql_host = localhost
  sql_user = foo
  sql_pass = bar
  sql_db = test
  sql_port = 5432
  sql_query = select id, text from test
  sql_query_info = SELECT * from test WHERE id=$id
}

index test1
{
  source = src1
  path = /var/data/test1
  docinfo = extern
  charset_type = utf-8
}

Reindex:

$ sudo mkdir /var/data
$ sudo indexer --all

Run searchd:

$ sudo searchd

Play:

$ search world

Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/usr/local/etc/sphinx.conf'...
index 'test1': query 'world ': returned 1 matches of 1 total in 0.000 sec

displaying matches:
1. document=1, weight=1

words:
1. 'world': 1 documents, 1 hits

Use Python:

cd sphinx-0.9.8.1/api
python
>>> import sphinxapi, pprint
>>> c = sphinxapi.SphinxClient()
>>> q = c.Query('world')
>>> pprint.pprint(q)
{'attrs': [],
 'error': '',
 'fields': ['text'],
 'matches': [{'attrs': {}, 'id': 1, 'weight': 1}],
 'status': 0,
 'time': '0.000',
 'total': 1,
 'total_found': 1,
 'warning': '',
 'words': [{'docs': 1, 'hits': 1, 'word': 'world'}]}

If you add new data and want to reindex, make sure you use the --rotate flag:

sudo indexer --rotate --all

This is an extremely quick and dirty installation designed to give me a sandbox
to play with. For production use you would want to run as a non-privileged user
and would probably want to have an /etc/init.d script for searchd or run it
behind supervised. If you’re looking to experiment with Sphinx and MySQL,
there should be plenty of documentation out there to get you started.