- From: Toby A Inkster <tai@g5n.co.uk>
- Date: Tue, 13 Jan 2009 10:54:00 +0000
- To: RDFa <public-rdf-in-xhtml-tf@w3.org>, public-rdfa@w3.org
I've recently implemented support for this in Swignition <http://buzzword.org.uk/swignition/> and thought I'd share my technique.

First I use Raptor <http://librdf.org/raptor/> to parse the feed. This results in a graph (which we'll call "G") including a number of resources with rdf:type <http://purl.org/rss/1.0/item>. I loop through these resources, and for each resource (which we'll call "R"):

1. If R does not have a content:encoded predicate, ignore it and go on to the next resource. Note that the full URI for content:encoded is <http://purl.org/rss/1.0/modules/content/encoded>, but some versions of Raptor erroneously use <http://web.resource.org/rss/1.0/modules/content/encoded>, so you should check both. (I have different versions of librdf on my laptop and desktop, so come across this sort of thing all the time!)

2. Concatenate "<html>" then the content:encoded literal (hopefully there will be only one) then "</html>". Pass this through a tag soup HTML to valid XHTML conversion routine.

3. Parse the XHTML as RDFa with a base URI equal to R's URI. This results in a graph "H".

4. Merge the triples from graph H into graph G taking care not to confuse similarly-named blank nodes. (i.e. if G contains a node _:Foo and H also contains a node _:Foo, then these should not be treated as the same node in the merged graph.)

In the end, all the data are belong to G.

Open question: should XML namespaces used in the Feed be "inherited" as CURIE prefixes within the XHTML parsed in the step labelled "3"? I can see arguments either way. Overall, I feel that they should not.

-- 
Toby A Inkster
<mailto:mail@tobyinkster.co.uk>
<http://tobyinkster.co.uk>
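
For illustration only, the loop described in steps 1 to 4 might be sketched in Python roughly as follows. This is not Swignition's code. It assumes a graph is simply a set of (subject, predicate, object) string tuples in which blank nodes are strings beginning with "_:", and it takes the tag-soup cleaner and the RDFa parser as caller-supplied functions (tidy and parse_rdfa are hypothetical names, as are process_feed and merge_graphs).

```python
from itertools import count

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RSS_ITEM = "http://purl.org/rss/1.0/item"
# Step 1: the correct predicate URI, plus the variant some Raptor versions emit.
CONTENT_ENCODED = (
    "http://purl.org/rss/1.0/modules/content/encoded",
    "http://web.resource.org/rss/1.0/modules/content/encoded",
)


def is_bnode(term):
    # Assumption of this sketch: only blank nodes start with "_:".
    return term.startswith("_:")


def merge_graphs(g, h):
    """Step 4: return G plus H, giving every blank node in H a fresh label."""
    used = {t for triple in g for t in triple if is_bnode(t)}
    fresh = ("_:h%d" % i for i in count())
    renamed = {}

    def rename(term):
        if not is_bnode(term):
            return term
        if term not in renamed:
            label = next(fresh)
            while label in used:  # skip labels already present in G
                label = next(fresh)
            used.add(label)
            renamed[term] = label
        return renamed[term]

    return set(g) | {tuple(rename(t) for t in triple) for triple in h}


def process_feed(g, parse_rdfa, tidy):
    """Merge RDFa found in each item's content:encoded literal into G."""
    result = set(g)
    items = [s for (s, p, o) in g if p == RDF_TYPE and o == RSS_ITEM]
    for r in items:
        bodies = [o for (s, p, o) in g if s == r and p in CONTENT_ENCODED]
        if not bodies:
            continue  # step 1: no content:encoded, move on
        # Step 2: wrap the first literal and clean it up into XHTML.
        xhtml = tidy("<html>" + bodies[0] + "</html>")
        # Step 3: parse as RDFa, using R's URI as the base URI.
        h = parse_rdfa(xhtml, base=r)
        result = merge_graphs(result, h)
    return result
```

Because merge_graphs relabels every blank node coming from H, a _:Foo produced by the RDFa parser can never be mistaken for a _:Foo that was already in G, nor for one from another item's content.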
Received on Tuesday, 13 January 2009 10:54:51 UTC