If you’re doing what I’m doing, and need to parse an RSS feed that has lots of fun little tags in other namespaces you want to slurp up along with all the normal things, here’s something you can do.
We’re going to use the example I’ve been working on, a) because it allows me to point out an interesting problem, b) because it allows me to brag about what I’m working on, and c) because at this point I’m too tired to think through the logic of making an example work.
I’m writing a command line based feedreader called bulletin in Ruby. bulletin uses NewsGator to sync online feeds. Here’s an enticing, exciting pre-release preview screenshot!

In any event, there’s lots of cool metadata in NewsGator’s RSS feeds. The one piece I was interested in was whether or not an item in a feed has been read by the user. It appears in the feed as this element:
<ng:read>True</ng:read>
Awesome. So how do we go about getting this item and parsing it like it ain’t no thang? By extending Ruby’s RSS parser, like so.
First, we extend the Item class for RSS feed items to add an extra attribute:
module RSS
  class Rss
    class Channel
      class Item
        install_text_element "ng:read", "http://newsgator.com/schema/extensions", '?', "read", :boolean, "ng:read"
      end
    end
  end
end
Here’s what this means: We want a new element, that looks like ng:read. It comes from this schema: http://newsgator.com/schema/extensions. We don’t know where it will show up in the parsing of an item (?). The name of the attribute we will access it with is read. It’s a :boolean type. If we write an RSS feed back out, it will appear as ng:read in that feed.
That is, I think that’s all true. This is a lot of experimenting and diving through source.
Next, we tell the parser to look for another element:
RSS::BaseListener.install_get_text_element "http://newsgator.com/schema/extensions", "read", "read="
This says: Install this element into the parser. It comes from this schema: http://newsgator.com/schema/extensions. Its accessor method is read. Its setter method is read=.
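Put together, the two steps above can be sketched as one small script. Treat it as a sketch under a few assumptions: the sample feed is made up for illustration (it is not real NewsGator output), it uses lowercase true rather than NewsGator’s True because how the :boolean type treats other capitalizations is not checked here, and NG_URI and SAMPLE_FEED are names invented for this example. Validation is switched off to keep things minimal.

```ruby
require "rss"

# NewsGator's extension namespace, as given in the post.
NG_URI = "http://newsgator.com/schema/extensions"

# Step 1: give feed items a `read` accessor typed as :boolean.
module RSS
  class Rss
    class Channel
      class Item
        install_text_element "ng:read", NG_URI, "?", "read", :boolean, "ng:read"
      end
    end
  end
end

# Step 2: tell the parser to route <ng:read> text to Item#read=.
RSS::BaseListener.install_get_text_element NG_URI, "read", "read="

# Hypothetical sample feed, invented for this sketch.
SAMPLE_FEED = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0" xmlns:ng="http://newsgator.com/schema/extensions">
    <channel>
      <title>Example</title>
      <link>http://example.com/</link>
      <description>Sample channel</description>
      <item>
        <title>First post</title>
        <link>http://example.com/1</link>
        <description>Hello</description>
        <ng:read>true</ng:read>
      </item>
    </channel>
  </rss>
XML

# Second argument false: parse without validating the feed.
feed = RSS::Parser.parse(SAMPLE_FEED, false)
first_item = feed.items.first

# Print whatever the new accessor picked up for the first item.
puts first_item.read.inspect
```

On Ruby 3.0 and later, rss ships as a bundled gem rather than a default one, so it may need an entry in your Gemfile.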
And then you’re good! Well, except for one thing.
The name of this particular element, less its namespace, is read. The Listener needs to know what to call its accessor and setter methods. That means some reflection magic is being done behind the curtains. Yes! So now you have to be extra careful with this particular Item, because now its original read method has been overwritten. All three places above where read is passed as a parameter have to use the same name. I haven’t gotten it to work any other way.
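To make the "reflection magic" a little less mysterious, here is a toy model of name-based dispatch. None of this is code from Ruby’s RSS library: ToyItem, ToyListener, register and deliver are all hypothetical names, and the real listener may well work differently. The sketch only shows the general pattern of looking up a setter name by namespace and element name, then calling it by name.

```ruby
# Toy stand-ins; none of these names come from Ruby's RSS library.
class ToyItem
  attr_accessor :read
end

class ToyListener
  def initialize
    # Maps [namespace URI, local element name] to a setter name.
    @setters = {}
  end

  def register(uri, local_name, setter)
    @setters[[uri, local_name]] = setter
  end

  # Hands an element's text to the item, if a matching setter is
  # registered and the item actually defines it. Returns true when
  # the text was delivered, false when it was dropped.
  def deliver(item, uri, local_name, text)
    setter = @setters[[uri, local_name]]
    return false unless setter && item.respond_to?(setter)

    item.public_send(setter, text)
    true
  end
end

NG = "http://newsgator.com/schema/extensions"

listener = ToyListener.new
listener.register(NG, "read", "read=")

item = ToyItem.new
delivered = listener.deliver(item, NG, "read", "True")

# A setter name the item does not define is quietly ignored.
listener.register(NG, "starred", "starred=")
dropped = listener.deliver(item, NG, "starred", "True")
```

In this toy version, a mismatch between the registered setter name and the methods the item defines fails silently, which is one plausible reason mismatched names are so hard to debug.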
The implications:
- I haven’t found a way to give an element getter and setter methods named anything other than its element name without the namespace.
- Printing the item back out with to_s doesn’t appear to bring the new element with it, although from the looks of it my method above doesn’t provide for that no matter what the element is named.
I’d love to talk to someone who knows the internals a bit more — or at least someone who could help me write some documentation for Ruby’s RSS parser. This is a pretty important thing and it would be awesomely useful.
In the meantime, have fun with your newfound knowledge! We now have an Item#read method that gives us true or false, depending on what was parsed.
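If you want to cross-check what Item#read ought to return for a given feed, one option is to pull the element straight out of the XML with REXML, bypassing the RSS parser entirely. This is a sketch, not part of the approach above: read_flags and NG_NAMESPACE are hypothetical names, the feed is invented, and the case-insensitive comparison against "true" is an assumption about how NewsGator’s True/False values should be interpreted. The sample deliberately uses a different prefix (gator) to show that matching goes by namespace URI, not by prefix.

```ruby
require "rexml/document"

NG_NAMESPACE = "http://newsgator.com/schema/extensions"

# Returns a hash of item title => true/false/nil, where nil means the
# item carried no read element in the NewsGator namespace.
def read_flags(xml)
  doc = REXML::Document.new(xml)
  flags = {}
  REXML::XPath.match(doc, "//item").each do |item|
    title_node = item.elements["title"]
    title = title_node ? title_node.text : nil
    # Match on local name plus namespace URI, never on the prefix.
    node = item.elements.to_a.find do |child|
      child.name == "read" && child.namespace == NG_NAMESPACE
    end
    flags[title] = node ? node.text.to_s.strip.casecmp?("true") : nil
  end
  flags
end

# Hypothetical sample feed, invented for this sketch.
sample = <<~XML
  <rss version="2.0" xmlns:gator="http://newsgator.com/schema/extensions">
    <channel>
      <title>Example</title>
      <item>
        <title>First post</title>
        <gator:read>True</gator:read>
      </item>
      <item>
        <title>Second post</title>
        <gator:read>False</gator:read>
      </item>
      <item>
        <title>Third post</title>
      </item>
    </channel>
  </rss>
XML

flags = read_flags(sample)
```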
Let me know if you make any progress in figuring this beast out.
5 Comments
Awesome, I was looking for something like this recently…
One question though… why do module RSS; class Rss; class Channel; class Item … end;end;end;end versus class RSS::Rss::Channel::Item … end ?
No particular reason :-) Sometimes when I don’t trust the code I’m writing I expand all the little shortcuts to make sure the problem isn’t something trivial.
I find the Ruby RSS parser a little crazy to work with if you’re using multiple feeds from different sources. I ended up moving towards Feed Normalizer: http://feed-normalizer.rubyforge.org/
Do you really need to include the namespace in the install_text_element call? I’ve not used the library in particular but that’s pretty bad “XML” practice in general; the identifier used for the namespace can change arbitrarily and still have the same semantics.
I realise it won’t, but it’s probably something to watch out for. :)
Calum -
I have absolutely no idea whether or not removing the namespace from that call continues to allow this method to work. In other words, try it and see!
Also, great gravatar! :-)
One Trackback
[...] no doubt remember my fabulous, in-development command-line NewsGator client, bulletin. I’ve extracted the NewsGator-specific code to a gem called WonderCroc. The README provides a [...]