- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Mon, 9 Jun 2008 17:23:25 +0000
- To: Manu Sporny <msporny@digitalbazaar.com>
- CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>, Dave Beckett <dave@dajobe.org>
The input from PyRDFa is RDF/XML with parseType="Literal". That gets XML Exclusive Canonicalization applied (required of RDF/XML parsers; parseType="Literal" does not exist in N3/Turtle/N-Triples, only datatype XMLLiteral, and parsers for those serializations do not canonicalize).

http://www.w3.org/TR/rdf-syntax-grammar/#section-grammar-productions
7.2.17 - bullet point 2.
http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-XML-literals

This means two changes are made to what is sent over the wire:

1/ Unused namespaces are removed (e.g. on <strong ..>, there was an xmlns:svg)
2/ <svg:rect/> is replaced by <svg:rect></svg:rect>

You can see what ends up in the store by asking:

SELECT ?o { ?s ?p ?o }

which, run on PyRDFa's output, is:

http://www.sparql.org/sparql?query=SELECT+%3Fo+%7B+%3Fs+%3Fp+%3Fo%7D&default-graph-uri=http%3A%2F%2Fwww.w3.org%2F2007%2F08%2FpyRdfa%2Fextract%3Furi%3Dhttp%3A%2F%2Fwww.w3.org%2F2006%2F07%2FSWD%2FRDFa%2Ftestsuite%2Fxhtml1-testcases%2F0100.xhtml&stylesheet=%2Fxml-to-html.xsl

I applied the same canonicalization to the object in the query:

ASK WHERE {
<http://www.example.org> <http://example.org/rdf/example> "Some text here in <strong xmlns=\"http://www.w3.org/1999/xhtml\">bold</strong> and an svg rectangle: <svg:svg xmlns:svg=\"http://www.w3.org/2000/svg\"><svg:rect svg:height=\"100\" svg:width=\"200\"></svg:rect></svg:svg>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral> .
}

which is the query and datasource:

http://www.sparql.org/sparql?query=ASK+WHERE+%7B%0D%0A%3Chttp%3A%2F%2Fwww.example.org%3E+%3Chttp%3A%2F%2Fexample.org%2Frdf%2Fexample%3E+%22Some+text+here+in+%3Cstrong+xmlns%3D%5C%22http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml%5C%22%3Ebold%3C%2Fstrong%3E+and+an+svg+rectangle%3A+%3Csvg%3Asvg+xmlns%3Asvg%3D%5C%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%5C%22%3E%3Csvg%3Arect+svg%3Aheight%3D%5C%22100%5C%22+svg%3Awidth%3D%5C%22200%5C%22%3E%3C%2Fsvg%3Arect%3E%3C%2Fsvg%3Asvg%3E%22%5E%5E%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23XMLLiteral%3E%0D%0A+.%0D%0A%7D%0D%0A&default-graph-uri=http%3A%2F%2Fwww.w3.org%2F2007%2F08%2FpyRdfa%2Fextract%3Furi%3Dhttp%3A%2F%2Fwww.w3.org%2F2006%2F07%2FSWD%2FRDFa%2Ftestsuite%2Fxhtml1-testcases%2F0100.xhtml&stylesheet=%2Fxml-to-html.xsl

and I get "true".

SPARQL does not require canonicalization. It is possible to get non-canonicalized XML literals into the data by using datatype XMLLiteral, even in RDF/XML.

Andy

> -----Original Message-----
> From: Manu Sporny [mailto:msporny@digitalbazaar.com]
> Sent: 9 June 2008 15:32
> To: Seaborne, Andy
> Cc: RDFa mailing list; Dave Beckett
> Subject: Re: Issue with Jena/sparql.org and XML Literals?
>
> Seaborne, Andy wrote:
> > You have to be careful with line endings as well - \n vs \n\r etc.
> > The SPARQL parser does not canonicalize XMLLiterals in the query.
>
> There are no \n or \n\r in either the input RDF or the SPARQL, so that
> shouldn't be an issue. Thanks for mentioning that, however.
>
> > 2008Jun/0027.html ==>
> > [[
> > If you look at librdfa's output for TC100:
> >
> > http://rdfa.digitalbazaar.com/librdfa/rdfa2rdf.py?uri=http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0100.xhtml
> >
> > and PyRDFa's output for TC100:
> >
> > http://www.w3.org/2007/08/pyRdfa/extract?uri=http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0100.xhtml
> > ]]
> >
> > If I understand these correctly, these are different.
>
> Yes, the XML Literals that are generated are different and the SPARQL
> tests two "valid" XML Literals. The first test in the SPARQL will match
> librdfa's output, the second should match PyRDFa's output. It is the
> second SPARQL test (the last part of the UNION) that is failing for some
> unknown reason.
>
> > Running CURL on the first I get data with multiple namespaces on each
> > element, and I don't on the second.
>
> On the second one (PyRDFa's output), you should get two namespaces, the
> standard XHTML one and the standard SVG one. This is the expected
> behavior and I believe the SPARQL is set up to test exactly that.
>
> > N-Triples files for each attached (rdfparse run on the CURL results of
> > each link). You will see they both have XMLLiterals but are different
> > sizes.
>
> Yup, that is expected. The SPARQL test has two variations that are
> valid... the second variation should be passing, but it doesn't.
>
> -- manu
>
> --
> Manu Sporny
> President/CEO - Digital Bazaar, Inc.
> blog: Dynamic Spectrum Auctions and Digital Marketplaces
> http://blog.digitalbazaar.com/2008/04/24/dynamic-spectrum-auctions/
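
The two canonicalization effects described in this message (unused namespace declarations dropped, <svg:rect/> expanded to a start/end tag pair) can be reproduced with a small Python sketch. This is only an illustration under stated assumptions: it needs Python 3.8+, and it uses xml.etree.ElementTree.canonicalize, which implements C14N 2.0 rather than the Exclusive XML Canonicalization 1.0 that RDF/XML calls for, so it is not what Jena or PyRDFa actually runs. On these two points it should behave the same way. The xmlns:ex declaration and the sample fragment are made up for the example.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical input fragment (not taken from the test case):
# xmlns:ex is declared but never used, and <svg:rect/> is an
# empty-element tag.
raw = (
    '<svg:svg xmlns:svg="http://www.w3.org/2000/svg" '
    'xmlns:ex="http://example.org/unused">'
    '<svg:rect svg:height="100" svg:width="200"/>'
    '</svg:svg>'
)

# canonicalize() returns the canonical form as a string when no
# output file is given. Note: this is C14N 2.0, used here as a
# stand-in for Exclusive C14N.
canonical = canonicalize(raw)
print(canonical)
```

The printed string is the kind of lexical form that would have to be pasted into the object position of an ASK query for it to match, since the SPARQL parser compares the XMLLiteral as written and does not canonicalize it.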
Received on Monday, 9 June 2008 17:24:30 UTC