- From: Ivan Herman <ivan@w3.org>
- Date: Wed, 16 Sep 2009 11:14:41 +0200
- To: Philip Taylor <pjt47@cam.ac.uk>
- CC: Manu Sporny <msporny@digitalbazaar.com>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
- Message-ID: <4AB0AC81.8010107@w3.org>
[I removed the HTML WG from this thread, simply because, I believe, this is not of immediate concern to them at this point...]

Philip,

sorry for the long delay but, as I promised I would do, I took up this discussion with the SPARQL folks, too, just to check. The bottom line is that you are right. Neither N3/Turtle nor SPARQL includes any automatic canonicalization of XML Literals (in contrast to RDF/XML), nor will the new version of SPARQL do it. The only difference in the future may be that the new version of SPARQL might describe inference regimes, in particular ones that take datatype entailments into account, and that version _might_ go further than the current SPARQL. But that is for the future, with lots of maybes.

But, as I think we said in the previous mails, this does not in fact affect the RDFa spec proper. We only say that proper RDF triples should be produced by an RDFa processor, i.e., a proper RDF XMLLiteral should be generated. When and how canonicalization occurs is not something the RDFa spec has to describe; if, for example, the RDFa processor simply generated RDF/XML, then the issue is irrelevant for the processor itself.

I also looked at the RDFa spec to see if the examples in the text are o.k., and, luckily, I did not see any issues there. I may have missed one, though...

Where there _is_ an issue to discuss, as you pointed out in your first mail, is the test suite. Indeed, the current test cases for XML Literals, described as SPARQL queries, are usually defined in the form of UNION-s, i.e., they follow a pattern like

[[[
ASK WHERE {
  <> dc:creator "Albert Einstein" .
  { <> dc:title 'E = mc<sup xmlns="http://www.w3.org/1999/xhtml" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:bla="http://www.w3.org/1999/02/22-rdf-syntax-ns#">2</sup>: The Most Urgent Problem of Our Time'^^rdf:XMLLiteral }
  UNION
  { <> dc:title 'E = mc<sup xmlns="http://www.w3.org/1999/xhtml" xmlns:dc="http://purl.org/dc/elements/1.1/">2</sup>: The Most Urgent Problem of Our Time'^^rdf:XMLLiteral }
}
]]]

and, strictly speaking, this is indeed not kosher, because the first branch of the UNION is _not_ the proper version of the literal: as we said, SPARQL requires the XML Literal to be in canonicalized form already.

I am not absolutely sure what to do, however. The current test suite is pragmatic, insofar as many implementations (i.e., the underlying XML package) would indeed produce the first version of the XML Literal. Maybe these tests should be flagged somehow to make it clear that there is an issue there, and that really, really conforming processors should produce the second version only...

Cheers

Ivan

Philip Taylor wrote:
> Ivan Herman wrote:
>> Sigh. This is indeed a slightly muddy area where the RDF concept
>> document should be written differently. But, well, this is not something
>> either of these two working groups can do...
>>
>> I think the issue is that the RDF concept spec describes the abstract
>> concepts for abstract RDF graphs, and not a serialization thereof. [...]
>
> As I understand it, rdf-concepts explicitly describes the lexical space
> of XMLLiterals, i.e. the set of Unicode strings which values of type
> XMLLiteral must be a member of.
>
> I'm happy to agree that serialisations like RDF/XML and RDFa specify
> their own transformations/mappings from the input document onto that
> abstract RDF lexical space, and there's no need for the input document
> to care about C14N at all - the input can be anything, and the mapping
> can be arbitrarily complicated, as long as the resultant triples contain
> values from the appropriate lexical space.
>
> But serialisations of RDF like N3/Turtle/N-Triples represent XMLLiterals
> as typed strings. I'm making the (hopefully reasonable) assumption that
> those strings correspond directly (after appropriate charset decoding)
> to the lexical space defined by rdf-concepts - there is no non-trivial
> mapping there. (In particular, no automatic canonicalisation occurs.)
>
> (If that assumption is wrong, and there is a non-trivial mapping between
> N3/Turtle/N-Triples serialised strings and the XMLLiteral lexical space,
> then I can't find any definition of that mapping at all, which is a
> bigger problem (unless I'm just missing it).)
>
> The RDFa spec examples and test cases represent triples using
> Turtle/N-Triples as the serialisation format, so their strings map
> directly onto the restricted lexical space, so I believe those
> particular cases need to use canonicalised form for their serialisations
> of XMLLiteral strings.
>
> The RDFa spec also refers to abstract triples (as the result of
> processing a document), at which point there is no serialisation
> involved at all, and so a value of type XMLLiteral must be a member of
> the lexical space of XMLLiteral, i.e. must be a canonical-form string.
>
> So I think I agree with everything you are saying (that RDF/XML and RDFa
> don't require c14n of their input) and I think that's all good, but I
> don't think that's addressing the problems I see (which are with the
> abstract triple output of RDFa, and with specific examples of
> Turtle/N-Triples serialised triples).
>
>> (On a practical level, all RDF environments and serializations I know
>> about behave similarly: they would take any (valid) XML as XML Literal,
>> and the C14N comes into the picture when two XML literals are checked,
>> eg, for equality.)
>
> (If equality is always checked in terms of C14N-equivalence, why does
> http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.sparql
> say that the output must equal either one of two strings that are
> C14N-equivalent? If it's equal to one, it would also be equal to the
> other. So I presume at least some implementations just do simple string
> equality, instead of dealing with C14N when checking equality, and the
> C14N should be dealt with at an earlier point (when generating the
> triples) to avoid making equality comparisons hopelessly inefficient.)
>
>> Ivan
>

--
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
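Philip's remark about 0011.sparql can be made concrete with the two UNION branches from the ASK query quoted in this message. What follows is a minimal Python sketch, not taken from any RDFa implementation, and it assumes Python 3.8+.

It relies on the standard library's `xml.etree.ElementTree.canonicalize`. That function implements Canonical XML 2.0, which stands in here for the Exclusive XML Canonicalization that rdf-concepts uses to define the XMLLiteral lexical space. The two algorithms are not identical in general.

An XMLLiteral is XML *content*, which may have text outside any element. The sketch therefore embeds the fragment in a hypothetical `wrapper` element before canonicalizing. The names `canonical_xml_literal` and `XMLLiteral` are illustrative only.

```python
"""Compare XMLLiteral strings by canonical form rather than raw text.

Sketch only: Canonical XML 2.0 (Python stdlib) stands in for the
Exclusive XML Canonicalization named by rdf-concepts.
"""
from dataclasses import dataclass
from xml.etree.ElementTree import canonicalize

# Hypothetical dummy root: XMLLiteral values are XML content, which may
# contain text outside any element, so they are parsed inside a wrapper.
_START, _END = "<wrapper>", "</wrapper>"


def canonical_xml_literal(fragment: str) -> str:
    """Return the C14N 2.0 form of an XML content fragment."""
    c14n = canonicalize(_START + fragment + _END, with_comments=True)
    if not (c14n.startswith(_START) and c14n.endswith(_END)):
        raise ValueError("unexpected canonical form of the wrapper element")
    # Strip the dummy start and end tags again.
    return c14n[len(_START):-len(_END)]


@dataclass(frozen=True)
class XMLLiteral:
    """Hypothetical triple-object type: canonicalizes once, when the triple
    is generated, so later equality checks are plain string comparisons."""
    lexical: str

    @classmethod
    def from_fragment(cls, fragment: str) -> "XMLLiteral":
        return cls(canonical_xml_literal(fragment))


# The two UNION branches from the ASK query in Ivan's message.
FIRST = ('E = mc<sup xmlns="http://www.w3.org/1999/xhtml" '
         'xmlns:dc="http://purl.org/dc/elements/1.1/" '
         'xmlns:bla="http://www.w3.org/1999/02/22-rdf-syntax-ns#">2</sup>'
         ': The Most Urgent Problem of Our Time')
SECOND = ('E = mc<sup xmlns="http://www.w3.org/1999/xhtml" '
          'xmlns:dc="http://purl.org/dc/elements/1.1/">2</sup>'
          ': The Most Urgent Problem of Our Time')

if __name__ == "__main__":
    print(FIRST == SECOND)  # plain string equality: False
    print(XMLLiteral.from_fragment(FIRST) == XMLLiteral.from_fragment(SECOND))  # True
```

Because `XMLLiteral` canonicalizes in `from_fragment`, comparing two instances is an ordinary string comparison on `lexical`. This is the trade-off Philip describes: pay the C14N cost once, when generating triples, rather than on every equality check.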
Received on Wednesday, 16 September 2009 09:15:21 UTC
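Philip argues that Turtle/N-Triples strings map directly onto the XMLLiteral lexical space. If so, a serializer must canonicalize the fragment *before* escaping it into a typed string. What follows is a minimal Python sketch of that order of operations, under these assumptions:

- N-Triples escaping follows RDF 1.1 rules: UTF-8 output, with only backslash, double quote, line feed and carriage return escaped.
- The subject and predicate are already valid absolute IRIs.
- The stdlib Canonical XML 2.0 again stands in for Exclusive XML Canonicalization.

The function names and `http://example.org/doc` are hypothetical.

```python
"""Serialize an XMLLiteral triple as N-Triples, canonicalizing first (sketch)."""
from xml.etree.ElementTree import canonicalize

RDF_XML_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"
_START, _END = "<wrapper>", "</wrapper>"  # hypothetical dummy root element


def canonical_fragment(fragment: str) -> str:
    """Canonicalize XML content by embedding it in a dummy element (C14N 2.0)."""
    c14n = canonicalize(_START + fragment + _END, with_comments=True)
    return c14n[len(_START):-len(_END)]


def escape_ntriples_string(s: str) -> str:
    """Escape a lexical form for an N-Triples string literal.

    Backslash is escaped first so the later escapes are not doubled.
    """
    return (s.replace("\\", "\\\\")
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\r", "\\r"))


def xml_literal_ntriple(subject: str, predicate: str, fragment: str) -> str:
    """Emit one N-Triples line whose object is a canonical XMLLiteral."""
    lexical = canonical_fragment(fragment)  # canonicalize *before* escaping
    return (f'<{subject}> <{predicate}> '
            f'"{escape_ntriples_string(lexical)}"^^<{RDF_XML_LITERAL}> .')


if __name__ == "__main__":
    # Single-quoted attribute in the input: C14N rewrites it with double quotes.
    print(xml_literal_ntriple(
        "http://example.org/doc",
        "http://purl.org/dc/elements/1.1/title",
        "E = mc<sup xmlns='http://www.w3.org/1999/xhtml'>2</sup>",
    ))
```

The ordering is the point of the sketch. Escaping a non-canonical string would still give a well-formed N-Triples line, but on Philip's reading its typed string would fall outside the XMLLiteral lexical space.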