RE: Issue with Jena/sparql.org and XML Literals?

The input from PyRDFa is RDF/XML with parseType="Literal". RDF/XML parsers are required to apply Exclusive XML Canonicalization to that content. (parseType="Literal" does not exist in N3/Turtle/N-Triples, only
datatype XMLLiteral, and parsers for those serializations do not canonicalize.)

http://www.w3.org/TR/rdf-syntax-grammar/#section-grammar-productions

7.2.17 - bullet point 2.

http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-XML-literals


This means what ends up in the store differs from what is sent over the wire in two ways:

1/ Unused namespaces are removed (e.g. on <strong ..>, there was an xmlns:svg)
2/ <svg:rect/> is replaced by <svg:rect></svg:rect>
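
The two changes above can be reproduced roughly with Python's standard library. This is only a sketch under assumptions: xml.etree.ElementTree.canonicalize (Python 3.8+) implements C14N 2.0, not the Exclusive XML Canonicalization that RDF/XML calls for, and the sample markup and the "unused" prefix are made up for illustration. On this small input it should show the same two effects.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical input, loosely modelled on the test case: one namespace
# declaration that is never used, and one self-closing element.
raw = (
    '<svg:svg xmlns:svg="http://www.w3.org/2000/svg" '
    'xmlns:unused="http://example.org/unused">'
    '<svg:rect svg:width="200" svg:height="100"/>'
    '</svg:svg>'
)

# C14N 2.0 via the standard library (an approximation of Exclusive C14N).
canonical = canonicalize(raw)
print(canonical)
```

Comparing the printed string with the input shows why a literal copied straight from the page source may not match the literal held in the store.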

You can see what ends up in the store by asking:

SELECT ?o { ?s ?p ?o }

which, run against PyRDFa's output, is:

http://www.sparql.org/sparql?query=SELECT+%3Fo+%7B+%3Fs+%3Fp+%3Fo%7D&default-graph-uri=http%3A%2F%2Fwww.w3.org%2F2007%2F08%2FpyRdfa%2Fextract%3Furi%3Dhttp%3A%2F%2Fwww.w3.org%2F2006%2F07%2FSWD%2FRDFa%2Ftestsuite%2Fxhtml1-testcases%2F0100.xhtml&stylesheet=%2Fxml-to-html.xsl
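
For anyone who wants to rebuild such a link rather than paste it, here is a minimal sketch. The parameter names (query, default-graph-uri, stylesheet) are taken from the URL above; the function and constant names are just illustrative.

```python
from urllib.parse import urlencode

ENDPOINT = "http://www.sparql.org/sparql"
# PyRDFa's extraction of test case 0100, used as the default graph.
DATA_SOURCE = (
    "http://www.w3.org/2007/08/pyRdfa/extract?uri="
    "http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0100.xhtml"
)


def build_query_url(query: str) -> str:
    """Return a sparql.org GET URL that runs `query` over DATA_SOURCE."""
    params = {
        "query": query,
        "default-graph-uri": DATA_SOURCE,
        "stylesheet": "/xml-to-html.xsl",
    }
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return ENDPOINT + "?" + urlencode(params)


url = build_query_url("SELECT ?o { ?s ?p ?o}")
print(url)
```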



I applied the same canonicalization to the object literal in the query:

ASK WHERE {
<http://www.example.org> <http://example.org/rdf/example> "Some text here in <strong xmlns=\"http://www.w3.org/1999/xhtml\">bold</strong> and an svg rectangle: <svg:svg xmlns:svg=\"http://www.w3.org/2000/svg\"><svg:rect svg:height=\"100\" svg:width=\"200\"></svg:rect></svg:svg>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
 .
}
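
Getting the quoting right by hand is error-prone, so here is a small sketch of how the canonical XML could be turned into the typed literal used above. The helper name xml_literal is hypothetical, and the escaping covers only backslashes, double quotes and line breaks; the sample fragment is a shortened stand-in for the full literal.

```python
RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def xml_literal(lexical_form: str) -> str:
    """Render a string as a SPARQL typed literal of datatype XMLLiteral."""
    escaped = (
        lexical_form.replace("\\", "\\\\")  # backslashes first
        .replace('"', '\\"')                # then the string delimiter
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"^^<{RDF_XMLLITERAL}>'


# Shortened stand-in for the canonicalized literal in the query above.
canonical = '<strong xmlns="http://www.w3.org/1999/xhtml">bold</strong>'

query = (
    "ASK WHERE {\n"
    "<http://www.example.org> <http://example.org/rdf/example> "
    f"{xml_literal(canonical)} .\n"
    "}"
)
print(query)
```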

which, as query plus data source, is:

http://www.sparql.org/sparql?query=ASK+WHERE+%7B%0D%0A%3Chttp%3A%2F%2Fwww.example.org%3E+%3Chttp%3A%2F%2Fexample.org%2Frdf%2Fexample%3E+%22Some+text+here+in+%3Cstrong+xmlns%3D%5C%22http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml%5C%22%3Ebold%3C%2Fstrong%3E+and+an+svg+rectangle%3A+%3Csvg%3Asvg+xmlns%3Asvg%3D%5C%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%5C%22%3E%3Csvg%3Arect+svg%3Aheight%3D%5C%22100%5C%22+svg%3Awidth%3D%5C%22200%5C%22%3E%3C%2Fsvg%3Arect%3E%3C%2Fsvg%3Asvg%3E%22%5E%5E%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23XMLLiteral%3E%0D%0A+.%0D%0A%7D%0D%0A&default-graph-uri=http%3A%2F%2Fwww.w3.org%2F2007%2F08%2FpyRdfa%2Fextract%3Furi%3Dhttp%3A%2F%2Fwww.w3.org%2F2006%2F07%2FSWD%2FRDFa%2Ftestsuite%2Fxhtml1-testcases%2F0100.xhtml&stylesheet=%2Fxml-to-html.xsl


and I get "true".


SPARQL does not require canonicalization.  It is possible to get non-canonicalized XML literals into the data by using datatype XMLLiteral even in RDF/XML.

        Andy

> -----Original Message-----
> From: Manu Sporny [mailto:msporny@digitalbazaar.com]
> Sent: 9 June 2008 15:32
> To: Seaborne, Andy
> Cc: RDFa mailing list; Dave Beckett
> Subject: Re: Issue with Jena/sparql.org and XML Literals?
>
> Seaborne, Andy wrote:
> > You have to be careful with line endings as well - \n vs \n\r etc.
> > The SPARQL parser does not canonicalize XMLLiterals in the query.
>
> There are no \n or \n\r in either the input RDF or the SPARQL, so that
> shouldn't be an issue. Thanks for mentioning that, however.
>
> > 2008Jun/0027.html ==>
> > [[
> > If you look at librdfa's output for TC100:
> >
> > http://rdfa.digitalbazaar.com/librdfa/rdfa2rdf.py?uri=http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0100.xhtml
> >
> > and PyRDFa's output for TC100:
> >
> > http://www.w3.org/2007/08/pyRdfa/extract?uri=http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0100.xhtml
> > ]]
> >
> > If I understand these correctly, these are different.
>
> Yes, the XML Literals that are generated are different and the SPARQL
> tests two "valid" XML Literals. The first test in the SPARQL will match
> librdfa's output, the second should match PyRDFa's output. It is the
> second SPARQL test (the last part of the UNION) that is failing for some
> unknown reason.
>
> > Running CURL on the first I get data with multiple namespaces on each
> > element, and I don't on the second.
>
> On the second one (PyRDFa's output), you should get two namespaces, the
> standard XHTML one and the standard SVG one. This is the expected
> behavior and I believe the SPARQL is set up to test exactly that.
>
> > N-Triples files for each attached (rdfparse run on the CURL results of
> > each link). You will see they both have XMLLiterals but are different
> > sizes.
>
> Yup, that is expected. The SPARQL test has two variations that are
> valid... the second variation should be passing, but it doesn't.
>
> -- manu
>
> --
> Manu Sporny
> President/CEO - Digital Bazaar, Inc.
> blog: Dynamic Spectrum Auctions and Digital Marketplaces
> http://blog.digitalbazaar.com/2008/04/24/dynamic-spectrum-auctions/

Received on Monday, 9 June 2008 17:24:30 UTC