Fixing RSS 1.0 content

The ugliness of escaped markup in XML content has been a known problem
around RSS for a while. Atom (RFC 4287 [1]) instead allows you to use
full (namespace-qualified) XHTML content.
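
For anyone who hasn't looked at it lately, this is roughly what the
escaped form looks like in RSS 1.0 with the content module. It's only
a sketch: the example.org URI and the text are made up, and the
channel element is left out for brevity.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:content="http://purl.org/rss/1.0/modules/content/">

   <!-- hypothetical item; the markup is carried as an escaped string -->
   <item rdf:about="http://example.org/2006/02/01/post">
      <title>Example post</title>
      <link>http://example.org/2006/02/01/post</link>
      <content:encoded>&lt;p&gt;Some &lt;em&gt;escaped&lt;/em&gt; markup.&lt;/p&gt;</content:encoded>
   </item>
</rdf:RDF>

To an XML or RDF parser the value of content:encoded here is just a
string; the markup only reappears if the consumer unescapes it and
parses it a second time.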

Norm Walsh has declared [2] he's going to kill off his RSS feeds, leaving
only Atom, which doesn't seem an unreasonable course of action under
the circumstances. However, unlike RSS 1.0, Atom isn't RDF/XML, so
there's a hunk of baby going with this bathwater.

As Dan Connolly suggests [3], there should be a way of fixing RSS/RDF
content through the use of rdf:parseType="Literal". I believe this is
the approach taken with RSS 1.1, but that specification has (so far)
failed to get significant adoption, so a tweak of RSS 1.0 would seem
preferable. I can't see a perfect solution, but here are some options:

One way would be to add this attribute to the content:encoded element
(as used in RSS) and keep the content as XHTML. However, the very
definition of the property is that the markup is escaped, so this
seems a little perverse. It may make pragmatic sense, though: the
addition of the attribute is unlikely to cause too many problems with
existing aggregators/newsreaders, which are liberal and usually ignore
unknown elements/attributes.
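
A minimal sketch of this first option, assuming a made-up item at
example.org (channel omitted for brevity). The only changes from the
usual escaped form are the rdf:parseType attribute and the unescaped
XHTML children:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:content="http://purl.org/rss/1.0/modules/content/">

   <!-- hypothetical item; content:encoded now holds an XML literal -->
   <item rdf:about="http://example.org/2006/02/01/post">
      <title>Example post</title>
      <link>http://example.org/2006/02/01/post</link>
      <content:encoded rdf:parseType="Literal">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Some <em>unescaped</em> markup.</p>
         </div>
      </content:encoded>
   </item>
</rdf:RDF>

An RDF parser should read the value as an XML literal rather than a
plain string. How a given newsreader copes with element children where
it expects escaped text is something I'd want to test rather than
assume.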

Another possibility would be to define a new property, something like
content:literal, to use in place of (or in addition to)
content:encoded, and again use rdf:parseType="Literal".
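
A sketch of how that might look with both properties side by side.
Note that content:literal is hypothetical: it isn't defined by the
content module, and I've put it in the module's namespace here purely
for illustration. The example.org item is made up too, and the channel
is omitted.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:content="http://purl.org/rss/1.0/modules/content/">

   <item rdf:about="http://example.org/2006/02/01/post">
      <title>Example post</title>
      <link>http://example.org/2006/02/01/post</link>
      <!-- escaped form, kept for existing readers -->
      <content:encoded>&lt;p&gt;Some markup.&lt;/p&gt;</content:encoded>
      <!-- hypothetical new property carrying the same content as XHTML -->
      <content:literal rdf:parseType="Literal">
         <div xmlns="http://www.w3.org/1999/xhtml">
            <p>Some markup.</p>
         </div>
      </content:literal>
   </item>
</rdf:RDF>

Carrying both leaves the definition of content:encoded alone, at the
cost of duplicating the content in every item.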

One final approach that comes to mind is to take advantage of the
(in-progress) mapping of Atom to RDF/OWL [4]. Unfortunately the
content element is one place at which Atom syntax can't trivially be
read as RDF/XML. For example, this is what it looks like in one of
Norm's entries:
<content type="xhtml" xml:base="http://norman.walsh.name/2006/02/01/rssrip">
    <div xmlns="http://www.w3.org/1999/xhtml">
    ...
    </div>
</content>

Here's how it looks with the current Atom/OWL mapping, as an entry in
a feed (cut down for clarity):

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:awol="http://www.w3.org/2005/10/23/Atom#"
>

   <item rdf:about="http://norman.walsh.name/2006/02/01/docbook50b3">
      <title>DocBook V5.0b3</title>
      <awol:content
            xml:base="http://norman.walsh.name/2006/02/01/docbook50b3"
            rdf:parseType="Resource">
         <awol:type>xhtml</awol:type>
         <rdf:value rdf:parseType="Literal">
            <div xmlns="http://www.w3.org/1999/xhtml">
               <p id="p1">This release includes the changes <a
                  href="http://docbook.org/minutes/2006-01-18.txt"
                  shape="rect">agreed</a> at the 18 Jan 2006 meeting.</p>
            </div>
         </rdf:value>
      </awol:content>
   </item>
</rdf:RDF>

I've used the awol prefix to avoid confusion with the Atom namespace
proper (http://www.w3.org/2005/Atom), which is a similar string
without the trailing #.

(There's an unfinished atom2rdfxml.xsl with other bits and pieces at [5].)

Cheers,
Danny.

[1] http://www.ietf.org/rfc/rfc4287
[2] http://norman.walsh.name/2006/02/01/rssrip
[3] http://dig.csail.mit.edu/breadcrumbs/node/78
[4] http://atomowl.org
[5] http://pragmatron.org/trac/browser/pragmatron/atom-owl/


--

http://dannyayers.com

Received on Friday, 3 February 2006 10:02:13 UTC