- From: Dave Beckett <dave@dajobe.org>
- Date: Tue, 06 Feb 2007 23:42:25 -0800
- To: public-grddl-comments@w3.org
- Message-ID: <45C982E1.3080607@dajobe.org>
Hi GRDDLers

Raptor ( http://librdf.org/raptor/ ) has had simple GRDDL support for some time, but it never did recursion following profile or namespace URIs, to look for profiles and triples indirectly. In version 1.4.14 I finally added those features, along with some features for managing web URI retrieval. Thus Raptor is closer - I won't call it complete - to implementing GRDDL as specified.

Raptor implements (mostly)
  http://www.w3.org/TR/2006/WD-grddl-20061024/
  W3C Working Draft 24 October 2006
(I ignored any WG documents in progress)

Using the GRDDL WD sections as a guide to remind me of comments:

1. Introduction

2. Adding GRDDL to well-formed XML

Raptor has implemented xmlns:data-view and data-view:transformation on the root element for some time.

The current WD mentions having non-XML results here, such as Turtle, with issue-output-formats. This was tricky to deal with in XSLT and in the tests because the XSLT environment does not always return an output mime type, so I had to make Raptor do more guessing of the content in order to determine which parser to use, defaulting to RDF/XML if the information doesn't indicate otherwise.

3. GRDDL for XML Namespaces Documents

This is newly implemented by me in Raptor 1.4.14. The two methods are:

1) " if an information resource ?D has an XML representation whose root element has a namespace name ?NS then any GRDDL result of the resource identified by ?NS is a GRDDL result of ?D"

OK. But is it not really saying that any GRDDL result of the resource identified by ?NS is *included* in the GRDDL result of ?D, i.e. in a set-of-triples inclusion sense?

2) " if an information resource ?D has an XML representation whose root element has a namespace name ?NSDOC** and ?D has a GRDDL result that includes, for any ?TX, the RDF triple { ?NSDOC <http://www.w3.org/2003/g/data-view#namespaceTransformation> ?TX } then ?TX is also a transformation of ?D"

So 2) builds on 1) since "?D has a GRDDL result" is what 1) defines.
Does that not imply 2) needs to be done after 1)?

There is also some new terminology that's introduced:
 - GRDDL result of the resource identified by ?NS
 - a resource identified by ?NS ... is a GRDDL result of ?D
 - [a resource?] ?TX is .. a transformation of [a resource] ?D

"GRDDL result" is special enough to deserve defining, and "a transformation of" is something that could be expanded a bit.

I hard-coded Raptor not to traverse the following commonly seen namespace URIs, which have no GRDDL right now, so retrieving them is wasted effort:
  http://www.w3.org/1999/xhtml
  http://www.w3.org/1999/02/22-rdf-syntax-ns#
  http://www.w3.org/2001/XMLSchema

It might be questionable whether I should have included the RDF namespace, but I know right now it has no GRDDL. issue-mt-ns mentions this. Is it legitimate to exclude some namespaces forever?

4. The GRDDL profile for XHTML

This was implemented by earlier Raptor versions.

It might be worth repeating that the <head profile="..."> is a space-separated list of URIs, and that to look for the GRDDL profile you need to find it in that list.

5. GRDDL for HTML Profiles

One issue I found causing problems was whether to traverse the GRDDL profile URI itself, http://www.w3.org/2003/g/data-view

In the end I had to exclude it because the GRDDL profile document http://www.w3.org/2003/g/data-view contains an erdf profile, which refers back to the GRDDL profile. So when you follow the natural evaluation order, you end up in a loop, or if, like me, you check for URLs already visited, the process terminates without having generated any triples at all.

i.e. http://www.w3.org/2003/g/data-view contains:
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view http://purl.org/NET/erdf/profile">
  ...

and http://purl.org/NET/erdf/profile => http://research.talis.com/2005/erdf/profile contains:
  <head profile="http://www.w3.org/2003/g/data-view">

So for me, GRDDLing through the GRDDL profile URI does not work.
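To make the cycle concrete, here is a minimal Python sketch of the two points above: treating the profile attribute as a space-separated list, and skipping URIs already visited. It is not Raptor's code (Raptor is written in C), and the names profile_uris, has_grddl_profile, walk_profiles and the PROFILE_ATTRS table are hypothetical. The table is an assumption standing in for retrieving each document and reading its <head profile>; it only shows that the traversal stops, not what triples come out.

```python
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"
ERDF_PROFILE = "http://purl.org/NET/erdf/profile"

# Hypothetical stand-in for web retrieval: maps a profile URI to the
# value of the <head profile="..."> attribute found in that document.
PROFILE_ATTRS = {
    GRDDL_PROFILE: GRDDL_PROFILE + " " + ERDF_PROFILE,
    ERDF_PROFILE: GRDDL_PROFILE,
}


def profile_uris(profile_attr):
    """Split a head profile attribute into its whitespace-separated URIs."""
    return profile_attr.split()


def has_grddl_profile(profile_attr):
    """True only if the GRDDL profile URI is one of the listed URIs.

    A whole-token comparison is used; a substring test would also match
    longer URIs that merely start with the GRDDL profile URI.
    """
    return GRDDL_PROFILE in profile_uris(profile_attr)


def walk_profiles(uri, visited=None):
    """Follow profile URIs depth-first, skipping any URI already visited.

    Returns the URIs in the order they were first visited.
    """
    if visited is None:
        visited = []
    if uri in visited:
        return visited  # already seen: stop here instead of looping
    visited.append(uri)
    for next_uri in profile_uris(PROFILE_ATTRS.get(uri, "")):
        walk_profiles(next_uri, visited)
    return visited


if __name__ == "__main__":
    for visited_uri in walk_profiles(GRDDL_PROFILE):
        print(visited_uri)
```

Without the visited check, walk_profiles would recurse between the two profile documents until it hit the recursion limit.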
Calling it directly - as the first URI - does work:

  $ rapper -i grddl -c http://www.w3.org/2003/g/data-view
  rapper: Parsing URI http://www.w3.org/2003/g/data-view
  rapper: Parsing returned 40 triples

6. GRDDL Transformations

Raptor will do XSLT1 only for the foreseeable future since it depends on libxslt.

7. Security Considerations

This section should go beyond just XSLT issues and discuss
 - how GRDDL can cause the retrieval of many URIs
 - consideration of the rate of retrieval
 - what to do when you see the same URI twice
 - maybe suggest caching documents

8. The GRDDL Vocabulary

9. References

Order of Operation

Apart from getting the recursion mechanism implemented, it was tricky to figure out what *order* to do some operations in. The order I use is:
 1) root element namespace
 2) head profile
 3) other in-document URIs (rel=transform, data-view: ...)

Namespace/Profile transformation Triples

Do the triples mentioned in the formal descriptions get included in the "GRDDL result of ?x" being calculated? In Raptor they do.

Base URIs

Several of the XSLT sheets used in the tests assume that there is an XSLT parameter called 'base' or 'Base' set to the base URI of the document. Otherwise the tests fail. These are the ones with the assumption:
  http://www.w3.org/2000/07/uri43/uri.xsl
  http://www.w3.org/2000/08/w3c-synd/home2rss.xsl

I saw #base-param is still under discussion in
  http://lists.w3.org/Archives/Public/public-grddl-wg/2007Jan/0059.html
but I really don't understand that proposal. Several of the examples assume something to do with base param and/or base URIs in the sheets above and others.

Well known Transforms

I've also got some hard-coded XPaths in Raptor to find microformats in XHTML just by recognising the CSS class names and then using a "well-known" transformation.
I have disabled them now and will probably remove them from the code entirely.

DC: doesn't work, namespaces are wrong in the XSLT
  XPath: /html:html/html:head/html:link[@href="http://purl.org/dc/elements/1.1/"]
  XSLT: http://www.w3.org/2000/06/dc-extract/dc-extract.xsl

eRDF
  XPath: /html:html/html:head[contains(@profile,"http://purl.org/NET/erdf/profile")]
  XSLT: http://purl.org/NET/erdf/extract-rdf.xsl

hCalendar
  XPath: //*[@class="vevent"]
  XSLT: http://www.w3.org/2002/12/cal/glean-hcal.xsl

GRDDL Tests

http://www.w3.org/2001/sw/grddl-wg/

I was using http://www.w3.org/2001/sw/grddl-wg/td/ taken from the web rather than mercurial (despite the directory names below).

It was a bit of a fuss to get the test setup working, as some parts use the swap python, and testft uses rdflib python and 4suite (I didn't bother with installing that).

  $ PYTHONPATH=$HOME/lib/python2.4/site-packages python testft.py --run 'rapper -i grddl -q -o rdfxml' testlist1.rdf > raptor_earl.rdf
  rapper: Error - URI file:///home/dajobe/dev/rdf/grddl/homer.w3.org:8123/atom-grddl.xml:2 - XML parser error - Document is empty
  rapper: Failed to parse URI file:///home/dajobe/dev/rdf/grddl/homer.w3.org%3A8123/atom-grddl.xml grddl content
  * file:///home/dajobe/dev/rdf/grddl/homer.w3.org%3A8123/testlist1.rdf#atomttl1 failed
  * file:///home/dajobe/dev/rdf/grddl/homer.w3.org%3A8123/testlist1.rdf#base-param failed
  $

raptor_earl.rdf is attached

Test failures

1) atomttl1

I haven't figured this one out:

  $ rapper -i grddl -q -o rdfxml atom-grddl.xml
  rapper: Error - URI file:///home/dajobe/dev/rdf/grddl/homer.w3.org:8123/atom-grddl.xml:2 - XML parser error - Document is empty
  <?xml version="1.0" encoding="utf-8"?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  </rdf:RDF>

2) base-param

  $ rapper -i grddl -q -o rdfxml baseURI.html
  <?xml version="1.0" encoding="utf-8"?>
  <rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="">
      <dc:title>Input for Base Param Test Case</dc:title>
    </rdf:Description>
  </rdf:RDF>

The test suite expects 2 triples; I return 1. The expected result for the test adds:
  <ex:StyleSheet rdf:about="baseURI.xsl"/>
but I don't see where that comes from.

I tried the testlist2.rdf set but they all fail except for nmg-strawman#

That's all folks

Cheers

Dave
Attachments
- application/rdf+xml attachment: raptor_earl.rdf
Received on Wednesday, 7 February 2007 07:42:45 UTC