
Re: Issue with Jena/sparql.org and XML Literals?

From: Ivan Herman <ivan@w3.org>
Date: Tue, 10 Jun 2008 09:39:21 +0200
Message-ID: <484E2FA9.9060808@w3.org>
To: "Seaborne, Andy" <andy.seaborne@hp.com>, Dave Beckett <dave@dajobe.org>
CC: Manu Sporny <msporny@digitalbazaar.com>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>


Seaborne, Andy wrote:
> The input from PyRDFa is RDF/XML with parseType="Literal". That gets Exclusive XML Canonicalization applied (required of RDF/XML parsers; parseType="Literal" does not exist in N3/Turtle/N-Triples, only
> datatype XMLLiteral, and parsers for those serializations do not canonicalize).
> 
> http://www.w3.org/TR/rdf-syntax-grammar/#section-grammar-productions
> 7.2.17 - bullet point 2.
> 
> http://www.w3.org/TR/rdf-syntax-grammar/#section-Syntax-XML-literals
> 

Yes. But, I must admit, it surprises me that if I use an explicit 
XMLLiteral datatype, the same does not apply. _I see_ in the RDF/XML 
parsing rules that this route is not mentioned, i.e., you are right in 
your reading of the spec, but I wonder whether this is not a bug. 
parseType="Literal" ought to be merely an abbreviation for the explicit 
datatype setting...

Dave, as editor of this document, what do you think? If there is a bug 
here, it would be worth recording it formally, so that it could be 
reopened if ever we touch RDF/XML again...

However: this should _not_ be the job of the RDFa group and should not 
influence RDFa...

Ivan


> This means two changes are made to what is sent over the wire:
> 
> 1/ Unused namespaces are removed (e.g. on <strong ..>, there was an xmlns:svg)
> 2/ <svg:rect/> is replaced by <svg:rect></svg:rect>
> 
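
The two changes listed above can be reproduced with a minimal sketch. Assumptions: Python's standard-library canonicalize() implements C14N 2.0, not the Exclusive XML Canonicalization that RDF/XML parsers apply, so it is only a stand-in that happens to show the same two effects; the input fragments are approximations of the test case 0100 markup, not copies of it.

```python
from xml.etree.ElementTree import canonicalize  # available from Python 3.8

# Assumed input: <strong> carries an xmlns:svg declaration it never uses.
strong_in = (
    '<strong xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:svg="http://www.w3.org/2000/svg">bold</strong>'
)

# Assumed input: an empty element written in the short <svg:rect .../> form.
svg_in = (
    '<svg:svg xmlns:svg="http://www.w3.org/2000/svg">'
    '<svg:rect svg:width="200" svg:height="100"/></svg:svg>'
)

# With no output file given, canonicalize() returns the result as a string.
strong_out = canonicalize(strong_in)
svg_out = canonicalize(svg_in)

print(strong_out)  # change 1/: the unused xmlns:svg declaration is gone
print(svg_out)     # change 2/: the empty element has explicit start and end tags
```
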
> You can see what ends up in the store by asking:
> 
> SELECT ?o { ?s ?p ?o }
> 
> which, run on PyRDFa's output, is:
> 
> http://www.sparql.org/sparql?query=SELECT+%3Fo+%7B+%3Fs+%3Fp+%3Fo%7D&default-graph-uri=http%3A%2F%2Fwww.w3.org%2F2007%2F08%2FpyRdfa%2Fextract%3Furi%3Dhttp%3A%2F%2Fwww.w3.org%2F2006%2F07%2FSWD%2FRDFa%2Ftestsuite%2Fxhtml1-testcases%2F0100.xhtml&stylesheet=%2Fxml-to-html.xsl
> 
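
For readers who want to rebuild such a request by hand, here is a small sketch of how the URL above is put together. The endpoint and the three parameter names (query, default-graph-uri, stylesheet) are taken from the URL itself; the variable names are illustrative only.

```python
from urllib.parse import urlencode

ENDPOINT = "http://www.sparql.org/sparql"

# PyRDFa's extraction service, pointed at test case 0100; used as the default graph.
graph = (
    "http://www.w3.org/2007/08/pyRdfa/extract?uri="
    "http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0100.xhtml"
)

params = {
    "query": "SELECT ?o { ?s ?p ?o}",
    "default-graph-uri": graph,
    "stylesheet": "/xml-to-html.xsl",
}

# urlencode() percent-encodes reserved characters and writes spaces as '+'.
url = ENDPOINT + "?" + urlencode(params)
print(url)
```
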
> 
> I applied the same canonicalization to the query (canonicalization of the object in the query):
> 
> ASK WHERE {
> <http://www.example.org> <http://example.org/rdf/example> "Some text here in <strong xmlns=\"http://www.w3.org/1999/xhtml\">bold</strong> and an svg rectangle: <svg:svg xmlns:svg=\"http://www.w3.org/2000/svg\"><svg:rect svg:height=\"100\" svg:width=\"200\"></svg:rect></svg:svg>"^^<http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral>
>  .
> }
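
Since the SPARQL parser does not canonicalize, the literal in the query has to carry the canonical form character for character, with its double quotes escaped. A rough sketch of building such a query follows; sparql_typed_literal is a hypothetical helper written for this illustration, not part of any SPARQL library.

```python
XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def sparql_typed_literal(lexical: str, datatype: str) -> str:
    """Hypothetical helper: wrap a lexical form as "..."^^<datatype>."""
    escaped = (
        lexical.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"^^<{datatype}>'


# Already-canonicalized lexical form, copied from the ASK query above.
canonical = (
    'Some text here in <strong xmlns="http://www.w3.org/1999/xhtml">bold</strong>'
    ' and an svg rectangle: <svg:svg xmlns:svg="http://www.w3.org/2000/svg">'
    '<svg:rect svg:height="100" svg:width="200"></svg:rect></svg:svg>'
)

query = "ASK WHERE {\n<%s> <%s> %s\n .\n}" % (
    "http://www.example.org",
    "http://example.org/rdf/example",
    sparql_typed_literal(canonical, XML_LITERAL),
)
print(query)
```
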
> 
> which is the query and datasource:
> 
> http://www.sparql.org/sparql?query=ASK+WHERE+%7B%0D%0A%3Chttp%3A%2F%2Fwww.example.org%3E+%3Chttp%3A%2F%2Fexample.org%2Frdf%2Fexample%3E+%22Some+text+here+in+%3Cstrong+xmlns%3D%5C%22http%3A%2F%2Fwww.w3.org%2F1999%2Fxhtml%5C%22%3Ebold%3C%2Fstrong%3E+and+an+svg+rectangle%3A+%3Csvg%3Asvg+xmlns%3Asvg%3D%5C%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%5C%22%3E%3Csvg%3Arect+svg%3Aheight%3D%5C%22100%5C%22+svg%3Awidth%3D%5C%22200%5C%22%3E%3C%2Fsvg%3Arect%3E%3C%2Fsvg%3Asvg%3E%22%5E%5E%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23XMLLiteral%3E%0D%0A+.%0D%0A%7D%0D%0A&default-graph-uri=http%3A%2F%2Fwww.w3.org%2F2007%2F08%2FpyRdfa%2Fextract%3Furi%3Dhttp%3A%2F%2Fwww.w3.org%2F2006%2F07%2FSWD%2FRDFa%2Ftestsuite%2Fxhtml1-testcases%2F0100.xhtml&stylesheet=%2Fxml-to-html.xsl
> 
> and I get "true"
> 
> 
> SPARQL does not require canonicalization.  It is possible to get non-canonicalized XML literals into the data by using datatype XMLLiteral even in RDF/XML.
> 
>         Andy
> 
>> -----Original Message-----
>> From: Manu Sporny [mailto:msporny@digitalbazaar.com]
>> Sent: 9 June 2008 15:32
>> To: Seaborne, Andy
>> Cc: RDFa mailing list; Dave Beckett
>> Subject: Re: Issue with Jena/sparql.org and XML Literals?
>>
>> Seaborne, Andy wrote:
>>> You have to be careful with line endings as well - \n vs \n\r etc.
>>> The SPARQL parser does not canonicalize XMLLiterals in the query.
>> There are no \n or \n\r in either the input RDF or the SPARQL, so that
>> shouldn't be an issue. Thanks for mentioning that, however.
>>
>>> 2008Jun/0027.html ==>
>>> [[
>>> If you look at librdfa's output for TC100:
>>>
>> http://rdfa.digitalbazaar.com/librdfa/rdfa2rdf.py?uri=http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0100.xhtml
>>> and PyRDFa's output for TC100:
>>>
>> http://www.w3.org/2007/08/pyRdfa/extract?uri=http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0100.xhtml
>>> ]]
>>>
>>> If I understand these correctly, these are different.
>> Yes, the XML Literals that are generated are different and the SPARQL
>> tests two "valid" XML Literals. The first test in the SPARQL will match
>> librdfa's output, the second should match PyRDFa's output. It is the
>> second SPARQL test (the last part of the UNION) that is failing for some
>> unknown reason.
>>
>>> Running CURL on the first I get data with multiple namespaces on each
>>> element, and I don't on the second.
>> On the second one (PyRDFa's output), you should get two namespaces, the
>> standard XHTML one and the standard SVG one. This is the expected
>> behavior and I believe the SPARQL is set up to test exactly that.
>>
>>> N-Triples files for each attached (rdfparse run on the CURL results of
>>> each link). You will see they both have XMLLiterals but are different
>>> sizes.
>>
>> Yup, that is expected. The SPARQL test has two variations that are
>> valid... the second variation should be passing, but it doesn't.
>>
>> -- manu
>>
>> --
>> Manu Sporny
>> President/CEO - Digital Bazaar, Inc.
>> blog: Dynamic Spectrum Auctions and Digital Marketplaces
>> http://blog.digitalbazaar.com/2008/04/24/dynamic-spectrum-auctions/

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf


Received on Tuesday, 10 June 2008 07:39:49 GMT
