Bulletin Progress!

Code Projects | Ardekantur @ 8:04 pm

I’m sure you’re all thrilled to know that I’ve made a little bit of progress on Bulletin, my command-line RSS reader. This is all still building-block stuff, but Bulletin now properly parses and displays your NewsGator folder hierarchy, like mine below:

A screenshot of Bulletin showing indented NewsGator folders.

Hooray! As usual, you can keep track of Bulletin’s progress at its GitHub repository.

Extending Ruby’s RSS Parser

If you’re doing what I’m doing, and need to parse an RSS feed that has lots of fun little tags in other namespaces you want to slurp up along with all the normal things, here’s something you can do.

We’re going to use the example I’ve been working on, a) because it allows me to point out an interesting problem, b) because it allows me to brag about what I’m working on, and c) because at this point I’m too tired to think through the logic of making an example work.

I’m writing a command-line feed reader in Ruby called bulletin. bulletin uses NewsGator to sync online feeds. Here’s an enticing, exciting pre-release preview screenshot!

a tiny screenshot of bulletin, the Ruby RSS Feed Reader for Linux

In any event, there’s lots of cool metadata in NewsGator’s RSS feeds. The one piece I was interested in was whether or not an item in a feed has been read by the user. It appears in the feed as this element:

<ng:read>True</ng:read>

Awesome. So how do we go about getting this item and parsing it like it ain’t no thang? By extending Ruby’s RSS parser, like so.

First, we extend the Item class for RSS feed items to add an extra attribute:

module RSS; class Rss; class Channel; class Item
  install_text_element "ng:read", "http://newsgator.com/schema/extensions", '?', "read", :boolean, "ng:read"
end; end; end; end

Here’s what this means: We want a new element that looks like ng:read. It comes from this schema: http://newsgator.com/schema/extensions. The '?' looks like an occurrence marker, as far as I can tell from the source: the element may show up in an item zero or one times. The name of the attribute we will access it with is read. It’s a :boolean type. If we write an RSS feed back out, it should appear as ng:read in that feed.

That is, I think that’s all true. This is a lot of experimenting and diving through source.

Next, we tell the parser to look for another element:

RSS::BaseListener.install_get_text_element "http://newsgator.com/schema/extensions", "read", "read="

This says: Install this element into the parser. It comes from this schema: http://newsgator.com/schema/extensions. Its accessor method is read. Its setter method is read=.

And then you’re good! Well, except for one thing.

The name of this particular element, less its namespace, is read. The Listener needs to know what to call its accessor and setter methods. That means some reflection magic is being done behind the curtains. Yes! So now you have to be extra careful with this particular Item, because now its original read method has been overwritten. All three parameters above that contain read have to match. I haven’t gotten it to work any other way.
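To make the overwriting concrete, here is a tiny sketch in plain Ruby. FakeItem is a made-up stand-in, not the real RSS::Rss::Channel::Item, and I am only assuming the parser builds its accessors from the element name in roughly this reflective way:

```ruby
# FakeItem is a hypothetical stand-in for a feed item class; it is NOT
# the real RSS::Rss::Channel::Item.
class FakeItem
  # Pretend this is a method the class already had before we extended it.
  def read
    :original_read_method
  end
end

before = FakeItem.new.read

# Assumption: a "define accessors by name" helper ends up doing something
# like this, turning the element name into a getter/setter pair.
element_name = "read"
FakeItem.class_eval do
  attr_accessor element_name.to_sym
end

item = FakeItem.new
item.read = true   # calls the generated setter
after = item.read  # the generated getter has replaced the original method

puts "before: #{before.inspect}, after: #{after.inspect}"
```

Once the accessor is generated, the old read is simply gone for every instance of the class, which is why the name collision matters.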

The implications:

  • I haven’t found a way to give an element getter and setter methods whose names differ from its element name without the namespace.
  • Printing the item back out with to_s doesn’t appear to bring the new element with it, although from the looks of it my method above doesn’t provide for that no matter what the element is named.

I’d love to talk to someone who knows the internals a bit more — or at least someone who could help me write some documentation for Ruby’s RSS parser. This is a pretty important thing and it would be awesomely useful.

In the meantime, have fun with your newfound knowledge! We now have an Item#read method that gives us true or false, depending on what was parsed.
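If you want to double-check what the flag ought to be for each item, here is a rough sketch that reads the same element directly with REXML instead of the RSS parser. This is a different technique from the extension above, not a replacement for it. The namespace URI is the one from this post; the sample feed, NG_URI, SAMPLE_FEED and read_flags are all made up for illustration, and I am assuming the value is always the text True or False:

```ruby
require "rexml/document"

# Namespace URI taken from the post; everything else in this feed is made up.
NG_URI = "http://newsgator.com/schema/extensions"

SAMPLE_FEED = <<~XML
  <?xml version="1.0"?>
  <rss version="2.0" xmlns:ng="#{NG_URI}">
    <channel>
      <title>Example feed</title>
      <link>http://example.com/</link>
      <description>Made-up feed for illustration</description>
      <item>
        <title>Already read</title>
        <ng:read>True</ng:read>
      </item>
      <item>
        <title>Still unread</title>
        <ng:read>False</ng:read>
      </item>
    </channel>
  </rss>
XML

# Returns a hash of item title => true/false, based on each item's ng:read text.
# Items without an ng:read element are treated as unread (an assumption).
def read_flags(xml)
  doc = REXML::Document.new(xml)
  flags = {}
  REXML::XPath.each(doc, "//item") do |item|
    title = item.elements["title"].text
    flag = REXML::XPath.first(item, "ng:read", { "ng" => NG_URI })
    flags[title] = !flag.nil? && flag.text.to_s.strip.casecmp("true").zero?
  end
  flags
end

read_flags(SAMPLE_FEED).each { |title, read| puts "#{title}: #{read}" }
```

Comparing this against Item#read on a real feed is a cheap way to confirm the extended parser is picking up the right values.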

Let me know if you make any progress in figuring this beast out.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2008 Ardekantur