A Very Simple CamelCase Parser in Python



While playing around with regular expressions in Python, I came up with the following very simple CamelCase parser.  I really like this verbose style of writing out a regex, with one piece of the pattern per line and a comment next to it.  It’s much more readable than the typical compacted regex that I am used to seeing.  Here is simplecamelcase.py:


# simplecamelcase.py - a really simplistic CamelCase parser
import re

# re.VERBOSE lets the pattern carry whitespace and comments
pattern = re.compile(r'''
    (       # begin group
    \b      # word boundary
    [A-Z]   # find an upper case letter
    (\S*?)  # consume non-whitespace, as little as possible
    [A-Z]   # find a second upper case letter
    (\S*?)  # consume more non-whitespace
    \b      # closing word boundary
    )       # end group; findall repeats the search as necessary
    ''', re.VERBOSE)

test_string = "This is a TestCase of a VerySimple CamelCaseParser."

def find_camel(s):
    # findall returns one tuple per match; item 0 is the outer group
    return [groups[0] for groups in pattern.findall(s)]

print(find_camel(test_string))
# Prints ['TestCase', 'VerySimple', 'CamelCaseParser']
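
For comparison, here is roughly what the same pattern looks like in the usual compact form.  This is just a sketch: the names compact and sample are mine, and I dropped the capture groups, so findall hands back plain strings instead of tuples.

```python
import re

# The same pattern squeezed onto one line, without capture groups.
compact = re.compile(r'\b[A-Z]\S*?[A-Z]\S*?\b')

sample = "This is a TestCase of a VerySimple CamelCaseParser."

# With no groups in the pattern, findall returns the full matches.
print(compact.findall(sample))
# Prints ['TestCase', 'VerySimple', 'CamelCaseParser']
```

It does the same job, but you have to decode it character by character to see why.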

I have found that Python is a pleasure to putz around with for pretty much everything, and regexes are no exception.  You can find more information in Kuchling’s Regular Expression HOWTO and in Chapter 3 of David Mertz’s Text Processing in Python.  Both are well worth reading.

The above code is extremely naive, so of course use it at your own risk.  It would be trivial to modify it to use re.sub instead of re.findall, which would give you a very naive wiki parser that turns each CamelCase word into a link.  That might be fun.
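
A minimal sketch of that re.sub idea might look like the following.  The names camel and link_camel and the /wiki/ link format are assumptions I made up for illustration, not part of any real wiki engine, and the matched text is not escaped, so this is just as naive as the parser above.

```python
import re

# Same CamelCase pattern as before, without capture groups.
camel = re.compile(r'''
    \b      # word boundary
    [A-Z]   # an upper case letter
    \S*?    # non-whitespace, as little as possible
    [A-Z]   # a second upper case letter
    \S*?    # more non-whitespace
    \b      # closing word boundary
    ''', re.VERBOSE)

def link_camel(text):
    # Hypothetical link format: /wiki/<Name>. No HTML escaping is done.
    def to_link(match):
        name = match.group(0)
        return '<a href="/wiki/{0}">{0}</a>'.format(name)
    return camel.sub(to_link, text)

print(link_camel("See the FrontPage for details."))
# Prints See the <a href="/wiki/FrontPage">FrontPage</a> for details.
```

Passing a function to sub keeps the link format in one place, so swapping in a different markup only means changing to_link.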