From Broken HTML to XML Data (Wednesday, December 3, 2008)

Being a tech junkie, I have a small addiction to data feeds. One particular set of data that interests me is weather data. The main National Weather Service station near me is at Dulles Airport, which is close enough for forecasts, but not close enough for actual readings like temperature, humidity, and wind. Don't ask why I need that data. It's the addiction. Anyway, I was really close to buying a weather station and hooking it up to my computer. After all, many years ago in school, I managed a weather station for the geoscience lab. And then it occurred to me... the neighborhood schools all have weather stations, just like the one I installed, with their data feeds available online. At least, I hoped they were.

It turns out that the data is available. Unfortunately, it's an HTML page, not an XML data feed. And it's not even standard HTML, but horribly broken, non-compliant HTML that uses tables, line breaks, and bold elements for data layout. Nevertheless, using the DOM API in PHP, I was able to parse the page and convert it, on demand, to an XML data feed. The program actually collects all the data from the HTML into a programmatically accessible data structure; the XML page is just a handy way of displaying it.
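
To make the idea concrete, here is a minimal sketch of the same two-step approach: pull the values out of sloppy HTML into a plain data structure, then emit XML from it. It is not the original program. That one used PHP's DOM API, while this sketch swaps in Python's standard-library html.parser and xml.etree.ElementTree. The sample markup, the StationParser class, and the weather/reading element names are all assumptions made up for illustration, since the real station page isn't shown here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample: bold labels, <br> separators, unclosed <td>/<tr> tags.
# The real station page's markup is not reproduced in the post.
SAMPLE_HTML = """
<table border=1>
<tr><td><b>Temperature</b><br>54.3 F
<td><b>Humidity</b><br>61 %
<tr><td><b>Wind Speed</b><br>7 mph
</table>
"""


class StationParser(HTMLParser):
    """Collects <b>label</b> value pairs, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.readings = {}     # label -> value text
        self._label = None     # label still waiting for its value
        self._in_bold = False
        self._buf = []         # text seen since the last boundary

    def _text(self):
        # Join buffered text and collapse runs of whitespace.
        return " ".join("".join(self._buf).split())

    def _flush(self):
        # A cell boundary: store buffered text under the pending label.
        text = self._text()
        if self._label and text:
            self.readings[self._label] = text
        self._label = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self._flush()
            self._in_bold = True
        elif tag in ("td", "tr", "table"):
            self._flush()

    def handle_endtag(self, tag):
        if tag == "b" and self._in_bold:
            # Text inside <b>...</b> is the label for the next value.
            self._in_bold = False
            self._label = self._text()
            self._buf = []
        elif tag in ("td", "tr", "table"):
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)


def parse_station_page(html):
    """Return a dict of readings scraped from the page."""
    parser = StationParser()
    parser.feed(html)
    parser.close()
    return parser.readings


def to_xml(readings):
    """Serialize the readings dict as a small XML document."""
    root = ET.Element("weather")
    for label, value in readings.items():
        # Labels may contain spaces, so keep them in an attribute
        # rather than using them as element names.
        item = ET.SubElement(root, "reading", name=label)
        item.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(parse_station_page(SAMPLE_HTML)))
```

Keeping the readings in a dictionary first, and only then serializing, mirrors the point above: the XML is just one view of the collected data, and the same structure could feed anything else.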

