Normalizing Syndicated Feed Content

Mark Pilgrim's article at xml.com provides XPaths to the core data in 
RSS x.x and Atom 0.3:

http://www.xml.com/pub/a/2004/04/07/dive.html

---
My comments: great stuff, but...

As Mark says in the piece:
"RSS 1.0 has an additional RDF-based model for rich content not listed 
above, described in the mod_content specification 
<http://purl.org/rss/1.0/modules/content/#syntax>. It is beyond my 
ability to describe it in XPath, if indeed it is possible at all."

It is possible to normalise RSS x.x and Atom *to* RSS 1.0 (RDF) and 
retain the rich content/extension metadata. This could be done with the 
XPaths from the article, possibly in an XSLT stylesheet.
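
As a rough illustration of that idea (in Python rather than XSLT), here is a 
minimal sketch that pulls title/link/description out of RSS 2.0-style or 
Atom 0.3 items and re-emits them as RSS 1.0 items. The paths in ITEM_PATHS 
and FIELD_PATHS, and the function names, are my own assumptions for the 
example, not copied from the article. It only builds the item elements and 
skips the channel and rdf:Seq parts that a complete RSS 1.0 document needs.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1 = "http://purl.org/rss/1.0/"
NS = {"atom": "http://purl.org/atom/ns#"}  # Atom 0.3 namespace

# Candidate paths, tried in order. Illustrative only, not exhaustive.
ITEM_PATHS = ["channel/item", "atom:entry"]
FIELD_PATHS = {
    "title": ["title", "atom:title"],
    "link": ["link", "atom:link[@rel='alternate']"],
    "description": ["description", "atom:summary"],
}

def first_value(node, paths):
    """Return the first non-empty text (or href attribute) found via paths."""
    for path in paths:
        found = node.find(path, NS)
        if found is not None:
            value = (found.text or "").strip() or found.get("href", "")
            if value:
                return value
    return None

def to_rss1_items(feed_xml):
    """Map RSS 2.0-style or Atom 0.3 items onto RSS 1.0 item elements."""
    root = ET.fromstring(feed_xml)
    rdf = ET.Element(f"{{{RDF}}}RDF")
    for item_path in ITEM_PATHS:
        for src in root.findall(item_path, NS):
            item = ET.SubElement(rdf, f"{{{RSS1}}}item")
            link = first_value(src, FIELD_PATHS["link"])
            if link:
                # RSS 1.0 identifies each item by rdf:about
                item.set(f"{{{RDF}}}about", link)
            for field, paths in FIELD_PATHS.items():
                value = first_value(src, paths)
                if value:
                    ET.SubElement(item, f"{{{RSS1}}}{field}").text = value
    return rdf
```

Any namespaced extension elements on the source items could be copied 
across in the same loop, which is where the RDF target starts to pay off.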

Here are stylesheets that (I think) only support RSS x.x so far:

http://purl.org/net/syndication/subscribe/feed-rss1.0.xsl
(Morten Frederiksen)
http://w3future.com/weblog/2002/09/09.xml
(Sjoerd Visscher)

Note that you can do the same trick to convert OPML to OCS:
http://purl.org/net/syndication/subscribe/list-ocs0.5.xsl
(Morten Frederiksen)

It is possible to add extensions to RSS 2.0, though the spec doesn't 
describe any systematic way of doing so (beyond 'use XML namespaces'). See 
my own bit at xml.com: http://www.xml.com/pub/a/2003/07/23/extendingrss.html
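
To make the 'use XML namespaces' point concrete, here is a small sketch 
that adds a namespaced element to an RSS 2.0 item. The choice of Dublin 
Core's dc:creator, the add_creator helper and the sample feed are all 
assumptions for illustration; any namespaced vocabulary would go in the 
same way.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)  # serialise with the conventional dc: prefix

def add_creator(item, name):
    """Attach a dc:creator extension element to an RSS 2.0 item."""
    creator = ET.SubElement(item, f"{{{DC}}}creator")
    creator.text = name
    return creator

# Hypothetical minimal feed, just enough to show the mechanics
rss = ET.fromstring(
    "<rss version='2.0'><channel><title>Example</title>"
    "<item><title>Post</title></item></channel></rss>"
)
add_creator(rss.find("channel/item"), "Example Author")
serialized = ET.tostring(rss, encoding="unicode")
```

The core RSS 2.0 elements stay un-namespaced, so a reader that doesn't 
know about the extension can simply skip over it.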

Of course this all assumes that you have something in your app capable 
of dealing with rich content/metadata extension. If you use an RDF API 
in your system, you get much of that support out of the box, and can 
parse the (enriched) RSS 1.0 directly, and for that matter OCS and FOAF 
as well.

I believe that Atom should be soundly extensible, and I don't think I'm 
in a minority on that.

Cheers,
Danny.

-- 
----
Raw
http://dannyayers.com

Received on Thursday, 8 April 2004 04:12:01 UTC