Friday, June 22, 2007

Mastering PHP's XML Parser

Probably not the most exciting subject for anyone else, but I've spent a lot of hours in the last couple of days writing a script to automatically read ATOM feeds into a webpage. And yes, I'm aware that such things already exist and can be found easily on SourceForge, but doing it myself got me that much deeper into working with PHP, and made sure that I understood both the ATOM schema and PHP's XML Parser.

The XML Parser needs several handler functions registered with it before it will do anything useful. You need one function to deal with start tags, one for end tags, and one for the character data between those tags. The only place you can deal with attributes is within the start tag function. It helps, then, to have some shared variables to store the information you want to pluck out of an XML document. This sounded like something suited to an object-oriented approach, so I started by building a class called ATOMParser. I made the start, end and character functions internal, and created two public functions: one to parse a document, the other to set how many results to return and which label to search for.
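For anyone curious what that looks like, here is a minimal sketch of the three-handler pattern. It is not my actual ATOMParser: the class name AtomParserSketch, its property names and the setLimit() method are made up for illustration, and it only collects each entry's title and first link (no label search). It sticks to plain var and function declarations with no visibility keywords.

```php
<?php
// Hypothetical sketch of a handler-based Atom reader; all names are illustrative.
class AtomParserSketch
{
    var $limit = 10;        // maximum number of entries to keep
    var $entries = array(); // collected entries: array('title' => ..., 'link' => ...)
    var $current = null;    // entry being built, or null when outside <entry>
    var $inTitle = false;   // true while inside an entry's <title>
    var $buffer = '';       // character data gathered for the current title

    function setLimit($limit)
    {
        $this->limit = $limit;
    }

    // Parses a complete XML string; returns the entries, or false on a parse error.
    function parse($xml)
    {
        $this->entries = array();
        $this->current = null;
        $this->inTitle = false;
        $parser = xml_parser_create();
        // Keep tag names as written instead of upper-casing them.
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler($parser, array($this, 'startTag'), array($this, 'endTag'));
        xml_set_character_data_handler($parser, array($this, 'charData'));
        $ok = xml_parse($parser, $xml, true);
        return $ok ? $this->entries : false;
    }

    // Start tags: the only handler that receives attributes.
    function startTag($parser, $name, $attrs)
    {
        if ($name == 'entry') {
            $this->current = array('title' => '', 'link' => '');
        } elseif ($this->current !== null && $name == 'title') {
            $this->inTitle = true;
            $this->buffer = '';
        } elseif ($this->current !== null && $name == 'link'
            && isset($attrs['href']) && $this->current['link'] == '') {
            $this->current['link'] = $attrs['href'];
        }
    }

    // End tags: close out the title or the whole entry.
    function endTag($parser, $name)
    {
        if ($name == 'title' && $this->inTitle) {
            $this->current['title'] = trim($this->buffer);
            $this->inTitle = false;
        } elseif ($name == 'entry' && $this->current !== null) {
            if (count($this->entries) < $this->limit) {
                $this->entries[] = $this->current;
            }
            $this->current = null;
        }
    }

    // Character data can arrive in several pieces, so append rather than assign.
    function charData($parser, $data)
    {
        if ($this->inTitle) {
            $this->buffer .= $data;
        }
    }
}
```

Calling setLimit(5) and then parse($xml) on a feed string would hand back at most five title/link pairs, ready to be echoed into a page.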

Here entered a wrinkle: on my server, where I did much of my testing, I run PHP 4.something, whereas the university server has PHP 5.something. PHP 5 has a whole bunch of reserved words like public, private and interface, whereas PHP 4 does not. To make things work on both ends, I had to go generic with my terms and leave the visibility keywords out. It all worked fine, but theoretically, someone could call the start or end functions from outside the ATOMParser object... for what little good it would do.

Another difference: on my server, I could simply open the XML straight away, whereas on the URI server it was necessary to stream the document in chunks, then open it. What does this mean? Well, the code that displayed every post in the blog on my server showed only the latest post (maybe the latest two, if I refreshed just right) on the URI server. Reading the document in chunks fixed that.
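A rough sketch of the chunked read, assuming a hypothetical helper named readInChunks() and an arbitrary 4096-byte chunk size (neither comes from my actual script). The likely culprit is that a single fread() on a network stream can return as soon as one packet arrives, so one read may hold only the top of the feed; looping until feof() collects the rest.

```php
<?php
// Hypothetical helper: reads a file or URL completely, one chunk at a time.
// Returns the full contents as a string, or false if the source cannot be read.
function readInChunks($source, $chunkSize = 4096)
{
    $handle = fopen($source, 'r');
    if (!$handle) {
        return false;
    }
    $contents = '';
    // Keep reading until the stream reports end-of-file, however small each read is.
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            fclose($handle);
            return false;
        }
        $contents .= $chunk;
    }
    fclose($handle);
    return $contents;
}
```

The returned string can then go to xml_parse() in one call with the final flag set to true, which is the "then open it" half of the fix.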

So, my end result is a pretty portable piece of PHP that I can use elsewhere on the GSLIS site. You can see its current implementation at www.uri.edu/artsci/lsc. More will follow, notably the joblist. I will also likely use this somewhere in my website for bringing in blog info (same blog package as GSLIS). For other XML documents, I can now create custom parsers to do my bidding. Very exciting.

Well, to me, at least.
