Category: Java

  • Parsing CSV data in Scala with opencsv

    One of the great things about Scala (or any JVM language for that matter) is that you can take advantage of lots of libraries in the Java ecosystem. Today I wanted to parse a CSV file with Scala, and of course the first thing I did was search for scala csv. That yielded some interesting results, including a couple of roll-your-own regex-based implementations. I prefer to lean on established libraries instead of copying and pasting code from teh internet, so my next step was to search for java csv.

    The third hit down was opencsv, which looked solid, had been updated recently, and was Apache-licensed. All good signs in my book. It’s also in the main Maven repository, so adding it to my sbt 0.10.x build configuration was easy:

    
    libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.1"
    

    The syntax for sbt 0.7.x is similar, but you should really upgrade:

    
    val opencsv = "net.sf.opencsv" % "opencsv" % "2.1"
    

    Once that configuration change is in place, running sbt update will let you use opencsv in either your project or the shell via sbt console.

    There are a couple of simple usage examples on the opencsv site along with a link to javadocs. The javadocs are currently for the development version (2.4) and include an improved iterator interface that would be useful for larger files.

    Let’s parse some CSV data in Scala. We’ll use a CSV version of violations of 14 CFR 91.11, 121.580 & 135.120, affectionately known as the unruly passenger dataset (as seen in the Django book):

    
    Year,Total
    1995,146
    1996,184
    1997,235
    1998,200
    1999,226
    2000,251
    2001,299
    2002,273
    2003,281
    2004,304
    2005,203
    2006,136
    2007,150
    2008,123
    2009,135
    2010,121
    

    You can download the raw data as unruly_passengers.txt.

    Here’s a full example of parsing the unruly passengers data:

    
    import au.com.bytecode.opencsv.CSVReader
    import java.io.FileReader
    import scala.collection.JavaConversions._
    
    val reader = new CSVReader(new FileReader("unruly_passengers.txt"))
    for (row <- reader.readAll) {
        println("In " + row(0) + " there were " + row(1) + " unruly passengers.")
    }
    

    This will print out the following:

    
    In Year there were Total unruly passengers.
    In 1995 there were 146 unruly passengers.
    In 1996 there were 184 unruly passengers.
    In 1997 there were 235 unruly passengers.
    In 1998 there were 200 unruly passengers.
    In 1999 there were 226 unruly passengers.
    In 2000 there were 251 unruly passengers.
    In 2001 there were 299 unruly passengers.
    In 2002 there were 273 unruly passengers.
    In 2003 there were 281 unruly passengers.
    In 2004 there were 304 unruly passengers.
    In 2005 there were 203 unruly passengers.
    In 2006 there were 136 unruly passengers.
    In 2007 there were 150 unruly passengers.
    In 2008 there were 123 unruly passengers.
    In 2009 there were 135 unruly passengers.
    In 2010 there were 121 unruly passengers.
    

    There are a couple of ways to make sure that the header line isn't included. If you specify the separator and quote character, you can also tell the reader to skip any number of lines (one in this case):

    
    val reader = new CSVReader(new FileReader("unruly_passengers.txt"), ',', '"', 1)
    

    Alternatively you could create a variable that starts true and is set to false after skipping the first line.
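
    Here is a minimal sketch of that flag-based approach, next to the shorter option of dropping the first element. To keep it runnable without opencsv, the rows value is a hypothetical stand-in for what reader.readAll returns, and the object and method names are made up for illustration:

    ```scala
    object SkipHeader {
      // Hypothetical stand-in for the rows that reader.readAll would return.
      val rows: List[Array[String]] = List(
        Array("Year", "Total"),
        Array("1995", "146"),
        Array("1996", "184")
      )

      // Flag-based approach: the flag starts true and flips to false
      // once the header row has been skipped.
      def skipWithFlag(all: List[Array[String]]): List[Array[String]] = {
        var first = true
        val kept = scala.collection.mutable.ListBuffer[Array[String]]()
        for (row <- all) {
          if (first) first = false
          else kept += row
        }
        kept.toList
      }

      // Collection-based approach: drop the first element.
      def skipWithDrop(all: List[Array[String]]): List[Array[String]] = all.drop(1)

      def main(args: Array[String]): Unit = {
        for (row <- skipWithFlag(rows)) {
          println("In " + row(0) + " there were " + row(1) + " unruly passengers.")
        }
      }
    }
    ```

    Both methods return the same rows; drop(1) is probably the more idiomatic of the two once the data is a Scala collection.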

    Also worth mentioning is the JavaConversions import in the example. This enables implicit conversions between Java datatypes and Scala datatypes and makes working with Java libraries a lot easier. Without this import we couldn't use Scala's for loop syntactic sugar. In this case it's implicitly converting a java.util.List to a scala.collection.mutable.Buffer.
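
    For comparison, the same conversion can be spelled out by hand. The sketch below assumes scala.collection.JavaConverters, which lives in the standard library next to JavaConversions and adds an asScala method instead of converting silently. The sampleRows helper is a hypothetical stand-in for the list that reader.readAll returns, so the snippet runs without opencsv:

    ```scala
    import java.util.{ArrayList => JArrayList, List => JList}
    import scala.collection.JavaConverters._

    object ExplicitConversion {
      // Hypothetical stand-in for the java.util.List[Array[String]]
      // that reader.readAll returns.
      def sampleRows(): JList[Array[String]] = {
        val rows = new JArrayList[Array[String]]()
        rows.add(Array("1995", "146"))
        rows.add(Array("1996", "184"))
        rows
      }

      // asScala wraps the Java list as a scala.collection.mutable.Buffer,
      // which is what makes map and for comprehensions available.
      def years(rows: JList[Array[String]]): List[String] =
        rows.asScala.map(row => row(0)).toList

      def main(args: Array[String]): Unit = {
        for (row <- sampleRows().asScala) println(row.mkString(","))
      }
    }
    ```

    The explicit call is a little noisier, but it makes it obvious where a Java collection crosses over into Scala.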

    Another thing to be aware of is any cleaning of the raw field output that might need to be done. For example, CSV files often have fields with leading or trailing whitespace. A quick and easy way to take care of this is to call trim on each field: row(0).trim.
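
    As a small sketch of that cleanup, the following trims every field in a row before using it. The sample row and the clean helper are hypothetical and only there for illustration:

    ```scala
    object CleanFields {
      // Trim leading and trailing whitespace from every field in a row.
      def clean(row: Array[String]): Array[String] = row.map(_.trim)

      def main(args: Array[String]): Unit = {
        // Hypothetical row with stray whitespace around both fields.
        val raw = Array(" 1995", "146 ")
        val cleaned = clean(raw)
        // Calling toInt on the untrimmed "146 " would throw a NumberFormatException.
        val total = cleaned(1).toInt
        println("In " + cleaned(0) + " there were " + total + " unruly passengers.")
      }
    }
    ```

    Trimming the whole row up front means later code, such as a toInt conversion, doesn't have to remember to do it field by field.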

    This isn't the first time I've been pleasantly surprised working with a Java library in Scala, and I'm sure it won't be the last. Many thanks to the developers and maintainers of opencsv and to the creators of all of the open source libraries, frameworks, and tools in the Java ecosystem.

  • Getting to know Scala

    Over the past couple of weeks I’ve been spending some quality time with Scala. I haven’t really been outside of my Python shell (pun only slightly intended) since getting to know node.js several months back. I’m kicking myself for not picking Scala up sooner, since it has a ton of useful properties:

    • The power and speed of the JVM and access to the Java ecosystem without the verbosity
    • An interesting mix of Object-Oriented and Functional programming (which sounds weird but works)
    • Static typing without type pain through inferencing in common scenarios
    • A REPL for when you just want to check how something works
    • An implementation of the Actor model for message passing and Erlang-style concurrency

    Getting started

    The first thing I did was try to get a feel for Scala’s syntax. I started by skimming documentation and tutorials at scala-lang.org. I quickly learned that Programming Scala was available on the web so I started skimming that on a plane ride. It’s an excellent book and I need to snag a copy for my bookshelf.

    After getting to know the relatively concise and definitely expressive syntax of the language, I wanted to do something interesting with it. I had heard of a lot of folks using Netty for highly concurrent network services, so I thought I would try to do something with that. I started off tinkering with (and submitting a dependency patch to) naggati2, a toolkit for building protocols using Netty.

    After an hour or so I decided to shelve Naggati and get a better handle on the language and Netty itself. I browsed through several Scala projects using Netty and ended up doing a mechanistic (and probably not very idiomatic) port of a Java echo server. I put this up on github as scala-echo-server.

    Automation is key

    Because my little app has an external dependency, I really wanted to automate downloading that dependency and adding it to my libraries. At a quick glance, it looked like it was possible to use Maven with Scala, and there was even a Scala plugin and archetype for it. I found the right archetype by typing mvn archetype:generate | less, found the number for scala-archetype-simple, and re-ran mvn archetype:generate, entering the correct code and answering a couple of questions. Once that was done, I could put code in src/main/scala/com/postneo and run mvn compile to compile my code.

    It was about this time that I realized that most of the Scala projects I saw were using simple-build-tool instead of Maven to handle dependencies and build automation. I quickly installed it and easily configured my echo server to use it. From there my project was a quick sbt clean update compile run from being completely automated. While I’m sure that Maven is good, this feels like a great way to configure Scala projects.

    Something a little more complex

    After wrapping my head around the basics (though I did find myself back at the Scala syntax primer quite often), I decided to tackle something real but still relatively small in scope. I had implemented several archaic protocols while getting to know node.js, and I thought I’d pick one to learn Scala and Netty with. I settled on the Finger protocol as it existed in 1977 in RFC 742.

    The result of my work is an open source project called phalanges. I decided to use it as an opportunity to make use of several libraries including Configgy for configuration and logging and Ostrich for statistics collection. I also wrote tests using Specs and found that mocking behavior with mockito was a lot easier than I expected. Basic behavior coverage was particularly useful when I refactored the storage backend, laying the groundwork for pluggable backends and changing the underlying storage mechanism from a List to a HashMap.
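
    To illustrate the kind of refactoring described above, here is a rough sketch of a pluggable storage interface with a List-backed and a HashMap-backed implementation. None of these names come from phalanges itself; the trait, both classes, and the username and info fields are hypothetical:

    ```scala
    import scala.collection.mutable

    // Hypothetical storage interface; not taken from the phalanges source.
    trait UserStore {
      def put(username: String, info: String): Unit
      def get(username: String): Option[String]
    }

    // List-backed store: every lookup scans the entries in order.
    class ListStore extends UserStore {
      private var entries: List[(String, String)] = Nil
      def put(username: String, info: String): Unit = {
        // Replace any existing entry for this user before adding the new one.
        entries = (username, info) :: entries.filterNot(_._1 == username)
      }
      def get(username: String): Option[String] =
        entries.find(_._1 == username).map(_._2)
    }

    // HashMap-backed store: lookups go straight to the key.
    class MapStore extends UserStore {
      private val entries = mutable.HashMap[String, String]()
      def put(username: String, info: String): Unit = { entries(username) = info }
      def get(username: String): Option[String] = entries.get(username)
    }
    ```

    Because callers only see the trait, behavior tests written against it can be run unchanged against either backend, which is what makes a swap like this low-risk.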

    Wrapping up

    Scala’s type checking saved me from doing stupid things several times and I really appreciate the effort put into the compiler. The error messages and context that I get back from the compiler when I’ve done something wrong are better than those of any other statically typed language that I can remember.

    I’m glad that I took a closer look at Scala. I still have a lot to learn but it’s been a fun journey so far and it’s been great to get out of my comfort zone. I’m always looking to expand my toolbox and Scala looks like a solid contender for highly concurrent systems.

  • Installing PyLucene on OSX 10.5

    I was pleasantly surprised at my experience installing PyLucene this morning on my OSX 10.5 laptop. The installation instructions worked perfectly without a hiccup. This may not be impressive if you’ve never installed (or attempted to install) PyLucene before.

    I tried once a year or so back and was unsuccessful. The build process just never worked for me and I couldn’t find a binary build that fit my OS + Python version + Java version combination.

    Check out PyLucene:

    $ svn co http://svn.apache.org/repos/asf/lucene/pylucene/trunk pylucene
    

    Build JCC. I install Python packages in my home directory and if you do so too you can omit sudo before the last command, otherwise leave it in:

    $ cd pylucene/jcc
    $ python setup.py build
    $ sudo python setup.py install
    

    Now we need to edit PyLucene’s Makefile to be configured for OSX and Python 2.5. If you use a different setup than the one that ships with OSX 10.5, you’ll have to adjust these parameters to match your setup.

    Edit the Makefile:

    $ cd ..
    $ nano Makefile
    

    Uncomment the 5 lines below the comment # Mac OS X (Python 2.5, Java 1.5). If you have installed a different version of Python such as 2.6, there should be a combination that works for you. Here’s what I uncommented:

    # Mac OS X  (Python 2.5, Java 1.5)
    PREFIX_PYTHON=/usr
    ANT=ant
    PYTHON=$(PREFIX_PYTHON)/bin/python
    JCC=$(PYTHON) -m jcc --shared
    NUM_FILES=2
    

    Save the file, exit your editor, and build PyLucene:

    $ make
    

    If it doesn’t build properly check the settings in your Makefile.

    After a successful build, install it (again you can omit sudo if you install Python packages locally and not system-wide):

    $ sudo make install
    

    Now verify that it’s been installed:

    $ python
    Python 2.5.1 (r251:54863, Nov 11 2008, 17:46:48)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import lucene
    >>>
    

    If it imports without a problem you should have a working PyLucene library. Rejoice.

  • All I want to do is convert my schema!

    I’m working on a Django project in which I want to store GPS track information in GPX format. The best way to store that in Django is with an XMLField. An XMLField is basically just a TextField with validation via a RELAX NG Compact schema.

    There is a schema for GPX. Great! The schema is an XSD though, but that’s okay, it’s a schema for XML so it should be pretty easy to just convert that to RELAX NG compact, right?

    Wrong.

    I pulled out my handy dandy schema swiss army knife, Trang, but was shocked to find out that while it can handle RELAX NG (both verbose and compact), DTD, and an XML file as input and even XSD as an output, there was just no way that I was going to be able to coax it to read an XSD. Trang is one of those things (much like Jing) that I rely on pretty heavily that hasn’t been updated in years. That scares me a bit, but I keep on using ’em.

    With Trang out of the picture, I struck out with various google searches (which doesn’t happen very often). Eventually I landed on the conversion section of the RELAX NG website. The first thing that caught my eye was the Sun RELAX NG Converter. Hey, Sun’s got it all figured out. I clicked the link and was somewhat confused when I ended up at their main XML page. I scanned around and even searched the site but was unable to find any useful mention of their converter. A quick google search for sun “relax ng converter” yielded nothing but people talking about how cool it was and a bunch of confused people (just like me) wondering where they could get it.

    At this point I was grasping at straws so I pulled up The Internet Archive version of the extinct Sun RELAX NG Converter page. That tipped me off to the fact that I really needed to start tracking down rngconv.jar. A google search turned up several Xdoclet and Maven cvs repositories. I grabbed a copy of the jar but it wouldn’t work without something called Sun Multi-Schema XML Validator.

    That’s the phrase that pays, folks.

    A search for Sun “Multi-Schema XML Validator” brought me to the java.net project page and included a prominent link to nightly builds of the multi-schema validator as well as nightly builds of rngconv. These nightly builds are a few months old, but I’m not going to pick nits at this point.

    After downloading msv.zip and rngconv.zip and making sure all the jars were in the same directory, I had the tools I needed to convert the XSD in hand to RELAX NG Compact. First I converted the XSD to RELAX NG Verbose with the following command: java -jar rngconv.jar gpx.xsd > gpxverbose.rng. That yielded a RELAX NG (very) Verbose schema. Once I had that I could fall back to trusty Trang to do the rest: trang -I rng -O rnc gpxverbose.rng gpx.rnc. It errored out on any(lax:##other) so I removed that bit and tried again. After a lot more work than should have been required, I had my RELAX NG Compact schema for GPX.

    My experience in finding the right tools to convert XSD to RELAX NG was so absurd that I had to write it up, if only to remind myself where to look when I need to do this again in two years.

  • SNAP at Forum Nokia

    A few years ago (mid-2004 I think) I got really excited about SNAP (scalable network application package). I haven’t heard much about it since (not that I had been looking very hard), but I saw it pop up again today at Forum Nokia, and I think it’s worth a fresh look. There’s a flash site explaining the tech, but the really good stuff looks like it just hit Forum Nokia:

    I really wish that I had enough time to read all this stuff, since it’s absolutely fascinating to me. I’ll try to download a couple of these to my 770 and read them when I get a chance. It looks like some really sweet stuff has been done with SNAP while I haven’t been looking. In particular, Nitro Spin Racer reminds me of RC Pro-Am and I so want to play it. There are a couple of other games that look pretty compelling too.

  • Newcomers to the Bookshelf

    Several new books have landed on my bookshelf recently and I thought I’d take a minute to highlight them:

  • You Know Your Programming Language is Complicated When

    This is an exercise best done at a brick and mortar bookstore. First, find yourself the Java section. Then locate the 5th edition of Java in a Nutshell, newly revised for Java 1.5. Take a close look at it. Thick, isn’t it? Now pick it up. Note that it weighs 3.2 pounds. Now thumb through it, all 1264 pages. That’s a bunch of pages.

    With apologies to Jeff Foxworthy, your programming language just might be complicated when you have trouble telling the difference between its Nutshell book and a telephone book.

  • Sun Does XMPP

    Via Jabber News, Yahoo! Finance:

    SANTA CLARA, Calif., March 30 /PRNewswire-FirstCall/ — Sun Microsystems, Inc. (Nasdaq: SUNW – News), today announced the latest version of Sun Java(TM) System Instant Messaging, a key component of Sun Java Communications Suite. With this latest release, Sun is supporting the eXtensible Messaging and Presence Protocol (XMPP), the first protocol to be approved by the Internet Engineering Task Force (IETF) as an Internet standard for instant messaging and presence technologies. In addition, Sun Java System Instant Messaging includes new privacy controls, significant improvements in usability and new partnerships to enhance the offering.

    Rock on Sun! Excellent move! Unfortunately the Sun Java System Instant Messaging page is giving me a really ugly Tomcat 500 error. (Actually, if I had scrolled down to the very bottom rather than searching Google, I would have found a working link.) It’s also a bit weird that they’re running Tomcat now that you mention it. Don’t they sell software that does stuff like that?

    All dogfood issues aside, I’m always excited to see XMPP expand its base and make its way into a new product.

  • Just Browsing: Books that Caught my Eye

    As a break from classwork last night my wife and I headed to the local Borders to do a little book browsing. I didn’t pick anything up, but several titles caught my eye. Here are the books that I would have picked up if money were no issue and there were a few extra hours in each day:

    • Novell Certified Linux Engineer (Novell CLE) Study Guide: I almost went for a cert with the previous SUSE cert system. I also remembered that I’m a Java Certified Programmer and would do more Java certs if I had the time. I really wish that there were a J2ME cert book out there that I could study in my downtime.
    • Secure Architectures with OpenBSD: This looked like a meaty book with lots of information on hardening the already paranoid OpenBSD as well as ways to use it without making stupid mistakes.
    • Managing Security with Snort and IDS: There aren’t enough yellow O’Reilly books. Snort has intrigued me for some time and I’d love to read up on it someday.
    • Advanced Unix Programming: I’ve never been a really low-level guy, but I’ve had a newfound respect for plumbing since I’ve been shoving 0’s and 1’s around this semester. This looks like a great reference for low-ish level programming in a Unix (or Unix-like) environment.
    • Knoppix Hacks: I swear, if you leave two Hacks books alone for 20 minutes they’ll mate and have offspring. There really are a lot of things you can do with Knoppix.
    • Essential Mathematics for Games and Interactive Applications : A Programmer’s Guide: This one was showcased a little bit and gets down to the nitty gritty of stuff that you need to do in order to know your stuff. I’m always amazed at how much you need to know about whatever subject you’re coding for.
    • XML Hacks: What did I tell you? There’s another. A bunch of tricks with XML from cool but useless to wow.
    • Python Programming Patterns: I don’t think I’ve seen enterprise-grade patterns using Python before. This looks like a good book for those looking for an excuse to use Python in the workplace.
    • Moleskine by Kikkerland: Some great small notebooks and stuff. They could be great for jotting down notes before they can make their way to my wiki.

    It was great to get out and graze at the bookstore a bit. It has been awhile since I’ve done so. Of course I have a similar number of tech books already on the shelf that I haven’t had a chance to read, but I always want more.

    What books have you looked at lately? I was bummed not to find Mono: A Developer’s Notebook on the shelf, but considering that there were several there last time, I think that’s a good sign.

  • Research and Development

    The other day I was joking with a friend about Fortran Server Pages and how silly that would be. A quick google search didn’t reveal anything, although it did unearth some Fortran CGI from the FCC (with source code). While investigating further, I found myself arriving at the R&D sections of a few different companies. I thought I’d collect my findings for you:

    This isn’t an exhaustive list by any means, but there are a lot of amazing research projects, downloadable software, and amazing papers on the other end of those links. If you have some free time, you should look around a few of them.

  • /me is back.

    It’s been a long couple of months and I apologize for the hiatus. It’s a long story for another day, but let’s put it this way: I’m back! I’ve moved from Radio Userland to WordPress. I promise that I’ll share my (semi-painful and procrastination-ridden) migration process in due time.

    The .css that is currently driving the site is Dots by Alex King, which I’m currently tweaking. I’ve still got some random bits that I need to find and url rewrite to fit the new engine, but I’ve done my best to keep the old permalinks. If you find something that’s whacky, please drop me a line at matt at the domain ooiio.com. Thanks!

  • Birthday Book

    My wonderful wife gave me my birthday present early this year: Wireless Java: Developing with J2ME (the second edition of course). It’s on my shortlist of J2ME books that I wanted to have on hand. I’ve thumbed through it a few times, but it’s great to have it here on my desk. I’ve got a lot of reading ahead of me!

  • Rendezvous Javadoc

    Someone on the Rendezvous mailing list pointed out an excellent collection of references for Apple’s Rendezvous API. Of most interest to me is the Rendezvous Javadoc.

  • Eclipse on OSX: Quite Stunning

    Eclipse and Mac OS X: A Natural Combination is a page at Apple Developer Connection geared toward Mac developers that might not be aware of Eclipse.  It sure does look pretty under OS X, although not everything looks like a native widget.  The getting started directions at the bottom of the page should get first time Mac Java developers up and running quickly.

    Between XCode and Eclipse, Mac developers have some very sophisticated development tools available to them free of charge.

  • JXTA 2.3 is Out the Door

    Word from Gonzo Mofo is that JXTA J2SE 2.3 is out the door.  It looks like there are a lot of bugfixes, some deprecations, and some new features in this release.  The JXTA website should update with details soon, but for now check out the Gonzo Mofo link above.

  • J2ME Polish Looks Promising

    The Wireless Development Weblog pointed to an extremely interesting project today: J2ME Polish.  The screenshots look a lot more sexy than plain old MIDP on mobile phones.  I’ll have to read the docs in depth to grok J2ME Polish on a technical level, but on the surface it looks like a very clean and polished (sorry for the pun) project, and might be worth some serious attention by J2ME heads.

  • Jodd

    JSurfer earlier today noted that a new release of Jodd is out.  Jodd is a general purpose Java library.  If you’re thinking about reinventing the wheel, it’s probably already been done in Jodd.  Browse through the javadocs for an idea of what you can do with it.  Jodd is distributed under a license that looks like a modified BSD one, so it should be suitable for incorporation into projects with lots of different license types.

  • Java MessagePort Library

    Via freshmeat, the Java MessagePort Library implements a lot of transport methods:

    The Java MessagePort Library is a general abstraction for many different stream- or message-based APIs, including UDP, TCP, JMS, JXTA, BEEP, J2EE MessageBeans, SOAP, Mach IPC, SysV IPC, QNX4 SRR IPC, and shared memory. The available transport encodings include none, RMI, AltRMI, XML-RPC, SOAP, and JRML.

    It is released under the LGPL, whose definition as it pertains to Java code changes almost weekly.  I’ve glanced at the javadocs, and there sure are a lot of transport protocols and message types implemented.  If you’re thinking about reinventing the wheel and LGPL works for you, this looks like an excellent library to make use of.

  • Nokia’s Smart Move: Scalable Network Application Package

    From a Nokia press release (emphasis mine):

    Los Angeles, California. May 12, 2004 – Nokia today premiered the first multiplayer Java games based on its SNAP Mobile solution at this year’s Electronic Entertainment Expo (E3). Developed together with Sega Mobile, the SNAP Mobile demonstration features multiplayer gaming for Java games, in addition to key community features such as friends lists, presence, and instant messaging. The Sega Mobile game demos are the first example of how SNAP Mobile brings the technology utilized in the N-Gage Arena gaming community to mass market Java terminals.

    SNAP is going to blow the lid off of J2ME MIDP 2.0 development.  Not only are mobile game developers going to be able to easily create several different types of multiplayer networked games (ranking, freestyle, challenge, etc), but we’re going to be able to take advantage of all of the extra bits that come with the platform.

    Here’s another choice snippet (emphasis mine again):

    The first demonstrations of SNAP Mobile will be implemented in MIDP 2.0 on Series 60, and support for other platforms will follow. The SNAP Mobile client development kit is expected to be made available for Java game developers free of charge in the third quarter of 2004. The server components can be licensed by mobile operators and other interested service providers or they can opt for a hosted community service. With either option, service providers will be able to create, build and brand their mobile gaming communities and drive data revenues and customer loyalty.

    Hey Nokia: Smartest. Move. Evar!  It sounds like they’re going to release the libraries and API docs on Forum Nokia and let ideas flourish rather than try to keep them locked down.  It’s a perfect strategy too: a small mobile gaming startup creates a killer game.  It includes some multiplayer aspects that require the Nokia server components.  All of a sudden they’ve got to license the server side stuff directly from Nokia or make a deal with a carrier.  Everyone is free to create a killer game, but if you need the server side stuff, Nokia’s going to be making a buck one way or another.

    I look forward to taking a look at SNAP sometime in Q3.  I know that it has me excited, and I’m pretty sure that Nokia has the ear of a lot of J2ME developers.

    Update:

    Note above that they are planning to roll out SNAP first on Series 60 MIDP 2.0.  That means that they see the big picture.  If they were planning on keeping SNAP on the N-Gage they would have not mentioned which platform they would deploy on first.  What does this mean?  It means that when released, SNAP should work on pretty much any S60 that has MIDP 2.0.  That means that a 6600 or a 7610 should be able to run SNAP out of the box.

    But wait, does that mean that initial versions of SNAP will not work on the N-Gage or N-Gage QD?  Yep.  They’re MIDP 1.0, which lacks things like a Bluetooth API and decent socket support, and has a whole lot of limitations that are going to make backporting SNAP a pain in the butt, if not impossible if they want to keep it feature complete.  Of course the Bluetooth API is optional, but can be found in all Series 60 MIDP 2.0 devices to date.

    Does anyone else smell an N-Gage 2 featuring S60 v2.0 and MIDP 2.0 coming in the next few months?  I think I do.  Jim at All About N-Gage thinks it might be a possibility too.

  • KVMJab: Still Kicking!

    If you find or hear about an open source J2ME/MIDP library or app, it tends to either be extremely out of date (from 2001 and designed for early MIDP 1.0) or so bleeding edge that it doesn’t work on most devices.  I was pleasantly surprised to find KVMJab, a Jabber library for MIDP 1.0 (though it should work fine with MIDP 2.0), alive and kicking.  There is a new release of the source code just a few days old that is updated to work with Sun’s Wireless Toolkit v2.1.  It looks like much of the source has not been touched since late 2000 or early 2001, but if it works, it works.