After learning the basics of KML, I’ve been looking for interesting data sources to plot in Google Earth. I’ve wanted to do something cool with NOAA’s XML weather feeds ever since I heard about them, so I thought I would download the 700 KB list of stations serving up XML and spit out some KML from that data as a “neat” first step.
I’ll probably still do that, but after parsing the data, I’m a bit disappointed. As always, there are huge gaps in the geolocation information. To get my hands on the data I turned to xmltramp, which is an awesome library for accessing simple XML documents in a Pythonic way. I then whipped up a few lines of Python to walk through the data:
    import xmltramp # http://www.aaronsw.com/2002/xmltramp/

    f=open('stations.xml', 'r')
    doc=xmltramp.parse(f.read())

    count = 0
    total = 0
    for station in doc['station':]:
        total = total + 1
        sid = str(station['station_id'])
        lat = str(station['latitude'])
        lon = str(station['longitude'])
        if (lat != 'NA') and (lon != 'NA'):
            print "Station ID: " + sid + \
                " (" + lat + "," + lon + ")"
            count = count + 1
    print str(count) + " out of " + str(total) + \
        " stations are geolocated."
Here’s the output of the above code:
    mcroydon@mobilematt:~/py/kmlist$ python kmlist.py
    Station ID: PAGM (63.46N,171.44W)
    [... snip ...]
    Station ID: KSHR (44.46.10N,106.58.08W)
    422 out of 1775 stations are geolocated.
Well, that’s a bummer. Only 422 out of 1775 stations, or just under 24%, are geolocated. While that’s still 422 more stations than I knew about previously, it’s a far cry from a majority of weather stations across the United States.
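If you don't have xmltramp handy, the same tally can be sketched with the standard library's xml.etree.ElementTree. This is only a rough equivalent, assuming Python 3 and the same element names the loop above relies on (station, station_id, latitude, longitude). The function name count_geolocated, the root element name and the two-station sample are made up for illustration and are not taken from the real feed.

```python
import xml.etree.ElementTree as ET


def count_geolocated(xml_text):
    """Return (geolocated, total) for a station list given as an XML string."""
    root = ET.fromstring(xml_text)
    located = 0
    total = 0
    # Look only at <station> elements directly under the root element.
    for station in root.findall('station'):
        total += 1
        # A missing or empty element is treated like the feed's 'NA' marker.
        lat = (station.findtext('latitude') or 'NA').strip()
        lon = (station.findtext('longitude') or 'NA').strip()
        if lat != 'NA' and lon != 'NA':
            located += 1
    return located, total


# Made-up sample in the assumed layout; the real file is far larger.
SAMPLE = """<stations>
  <station>
    <station_id>PAGM</station_id>
    <latitude>63.46N</latitude>
    <longitude>171.44W</longitude>
  </station>
  <station>
    <station_id>XXXX</station_id>
    <latitude>NA</latitude>
    <longitude>NA</longitude>
  </station>
</stations>"""

located, total = count_geolocated(SAMPLE)
print("%d out of %d stations are geolocated (%.1f%%)"
      % (located, total, 100.0 * located / total))
```

Run against the real file, you would pass in the contents of stations.xml instead of SAMPLE.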
Another thing you will notice is that some coordinates appear to be expressed in decimal degrees (63.46N) while others appear to use degrees, minutes, and seconds (44.46.10N).
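Before any of this can go into KML, both styles need to end up as signed decimal degrees. Here is a minimal sketch of one way to do that in Python 3, assuming the reading above is right: a value with at most one dot is decimal degrees, a value with two dots is degrees.minutes.seconds, and the trailing letter is the hemisphere. The helper name to_decimal_degrees is hypothetical, and the dot-counting rule is a guess based on the two samples in the output rather than anything NOAA documents.

```python
def to_decimal_degrees(coord):
    """Convert strings like '63.46N' or '44.46.10N' to signed decimal degrees.

    Assumption: two dots means degrees.minutes.seconds, otherwise the
    number is already in decimal degrees. South and west come back negative.
    """
    coord = coord.strip()
    if not coord:
        raise ValueError("empty coordinate")
    hemisphere = coord[-1].upper()
    if hemisphere not in ('N', 'S', 'E', 'W'):
        raise ValueError("missing hemisphere letter: %r" % coord)
    number = coord[:-1]
    parts = number.split('.')
    if len(parts) == 3:
        # Degrees.minutes.seconds, e.g. 44.46.10
        degrees, minutes, seconds = (int(p) for p in parts)
        value = degrees + minutes / 60.0 + seconds / 3600.0
    elif len(parts) in (1, 2):
        # Plain decimal degrees, e.g. 63.46
        value = float(number)
    else:
        raise ValueError("unrecognized coordinate format: %r" % coord)
    return -value if hemisphere in ('S', 'W') else value


print(to_decimal_degrees('63.46N'))                 # 63.46
print(to_decimal_degrees('171.44W'))                # -171.44
print(round(to_decimal_degrees('44.46.10N'), 4))    # 44.7694
print(round(to_decimal_degrees('106.58.08W'), 4))   # -106.9689
```

Raising ValueError on anything unexpected, including 'NA', keeps bad values from quietly turning into placemarks in the wrong spot.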
It’s gaps like these that can make working with “found” geolocation data frustrating.