Re: XMLLiterals and c14n from Ivan Herman on 2009-09-16 (public-rdf-in-xhtml-tf@w3.org from September 2009)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 16 Sep 2009 11:14:41 +0200
To: Philip Taylor <pjt47@cam.ac.uk>
CC: Manu Sporny <msporny@digitalbazaar.com>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <4AB0AC81.8010107@w3.org>
[I removed the HTML WG from this thread, simply because I believe this
is not of immediate concern to them at this point...]

Philip,

sorry for the long delay but, as I promised I would do, I took up this
discussion with the SPARQL folks, too, just to check.

The bottom line is that you are right. Neither N3/Turtle nor SPARQL
includes any automatic canonicalization of XML Literals (in contrast to
RDF/XML), nor will the new version of SPARQL do it. The only difference
in the future may be that the new version of SPARQL might describe
inference regimes, in particular ones that take datatype entailments
into account, and that version _might_ go further than the current
SPARQL. But that is for the future, with lots of maybes.
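
To make the consequence concrete, here is a minimal sketch (not taken
from any of the specs) of the difference between plain string
comparison, which is all that is left when no canonicalization is
performed, and comparison after canonicalization. It assumes Python 3.8
or later and uses the standard library's canonicalize() function, which
implements C14N 2.0, as a stand-in for the exclusive canonicalization
that RDF Concepts refers to. The sample strings and the two helper
names are hypothetical.

```python
from xml.etree.ElementTree import canonicalize

# Two serializations of the same element; only the attribute order differs.
# (Hypothetical sample data, not taken from the RDFa test suite.)
first = '<sup xmlns="http://www.w3.org/1999/xhtml" id="y" class="x">2</sup>'
second = '<sup xmlns="http://www.w3.org/1999/xhtml" class="x" id="y">2</sup>'


def same_as_strings(left: str, right: str) -> bool:
    """Comparison without any canonicalization: plain string equality."""
    return left == right


def same_after_c14n(left: str, right: str) -> bool:
    """Comparison after canonicalizing both sides (C14N 2.0 stand-in)."""
    return canonicalize(left) == canonicalize(right)


if __name__ == "__main__":
    print("string equality:", same_as_strings(first, second))
    print("equality after canonicalization:", same_after_c14n(first, second))
```

With plain string comparison the two literals are different terms, even
though they describe the same XML.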

But, as I think we said in the previous mails, this in fact does not
affect the RDFa spec proper. We only say that proper RDF triples should
be produced by an RDFa processor, ie, a proper RDF XMLLiteral should be
generated. When and how canonicalization occurs is not something the
RDFa spec has to describe; if, for example, the RDFa processor simply
generated RDF/XML, then the issue is irrelevant for the processor itself.

I also looked at the RDFa spec to see if the examples in the text are
o.k., and luckily I did not see any issues there. I may have missed one,
though...

Where there _is_ an issue to discuss is, as you pointed out in your
first mail, in the test suite. Indeed, the current test cases for XML
Literals, described as SPARQL queries, are usually defined in the form
of UNIONs, ie, they follow a pattern like

[[[
ASK WHERE {
<> dc:creator "Albert Einstein" .
{
<> dc:title 'E = mc<sup xmlns=\"http://www.w3.org/1999/xhtml\"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:bla="http://www.w3.org/1999/02/22-rdf-syntax-ns#">2</sup>: The
Most Urgent Problem of Our Time'^^rdf:XMLLiteral
}
UNION
{
<> dc:title 'E = mc<sup xmlns=\"http://www.w3.org/1999/xhtml\"
xmlns:dc="http://purl.org/dc/elements/1.1/">2</sup>: The Most Urgent
Problem of Our Time'^^rdf:XMLLiteral
}
}
]]]

and, strictly speaking, this is indeed not kosher: the first branch of
the UNION is _not_ the proper version of the literal because, as we
said, SPARQL does no canonicalization of its own, so the XML Literal
has to be in canonical form already.

I am not absolutely sure what to do, however. The current test suite is
pragmatic, insofar as many implementations (ie, the underlying XML
package) would indeed produce the first version of the XML Literal.
Maybe these tests should be flagged somehow to make it clear that there
is an issue there and that strictly conforming processors should
produce the second version only...
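
One possible way to find the candidates for such a flag, as a rough
sketch only: canonicalize each expected literal and check whether the
result is identical to the string given in the test. The code assumes
Python 3.8 or later; its canonicalize() implements C14N 2.0, which is
not necessarily what a conforming processor would use, so the verdict
is a hint rather than a ruling. The wrapper element and the function
names are hypothetical, and wrapping drops any namespace context the
fragment would inherit from the source document.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical element name, used only to turn mixed content
# (text plus elements) into a parseable XML document.
WRAPPER = "wrapper"


def canonical_fragment(fragment: str) -> str:
    """Canonicalize an XML fragment by wrapping it in a temporary root."""
    wrapped = f"<{WRAPPER}>{fragment}</{WRAPPER}>"
    result = canonicalize(wrapped)
    # Strip "<wrapper>" from the front and "</wrapper>" from the back.
    return result[len(WRAPPER) + 2 : -(len(WRAPPER) + 3)]


def is_canonical(fragment: str) -> bool:
    """True if the fragment is unchanged by canonicalization."""
    return canonical_fragment(fragment) == fragment


if __name__ == "__main__":
    # Hypothetical literals, not the ones from the test suite.
    candidates = [
        'E = mc<sup class="a" id="b">2</sup>: title',
        "E = mc<sup id='b' class='a'>2</sup>: title",
    ]
    for literal in candidates:
        status = "canonical" if is_canonical(literal) else "flag for review"
        print(status, "|", literal)
```

A branch reported as "flag for review" would then be one that a plain
string comparison accepts but a strict reading would not.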

Cheers

Ivan

Philip Taylor wrote:
> Ivan Herman wrote:
>> Sigh. This is indeed a slightly muddy area where the RDF concept
>> document should be written differently. But, well, this is not something
>> either of these two working groups can do...
>>
>> I think the issue is that the RDF concept spec describes the abstract
>> concepts for abstract RDF graphs, and not a serialization thereof.  [...]
> 
> As I understand it, rdf-concepts explicitly describes the lexical space
> of XMLLiterals, i.e. the set of Unicode strings which values of type
> XMLLiteral must be a member of.
> 
> I'm happy to agree that serialisations like RDF/XML and RDFa specify
> their own transformations/mappings from the input document onto that
> abstract RDF lexical space, and there's no need for the input document
> to care about C14N at all - the input can be anything, and the mapping
> can be arbitrarily complicated, as long as the resultant triples contain
> values from the appropriate lexical space.
> 
> But serialisations of RDF like N3/Turtle/N-Triples represent XMLLiterals
> as typed strings. I'm making the (hopefully reasonable) assumption that
> those strings correspond directly (after appropriate charset decoding)
> to the lexical space defined by rdf-concepts - there is no non-trivial
> mapping there. (In particular, no automatic canonicalisation occurs.)
> 
> (If that assumption is wrong, and there is a non-trivial mapping between
> N3/Turtle/N-Triples serialised strings and the XMLLiteral lexical space,
> then I can't find any definition of that mapping at all, which is a
> bigger problem (unless I'm just missing it).)
> 
> The RDFa spec examples and test cases represent triples using
> Turtle/N-Triples as the serialisation format, so their strings map
> directly onto the restricted lexical space, so I believe those
> particular cases need to use canonicalised form for their serialisations
> of XMLLiteral strings.
> 
> The RDFa spec also refers to abstract triples (as the result of
> processing a document), at which point there is no serialisation
> involved at all, and so a value of type XMLLiteral must be a member of
> the lexical space of XMLLiteral, i.e. must be a canonical-form string.
> 
> So I think I agree with everything you are saying (that RDF/XML and RDFa
> don't require c14n of their input) and I think that's all good, but I
> don't think that's addressing the problems I see (which are with the
> abstract triple output of RDFa, and with specific examples of
> Turtle/N-Triples serialised triples).
> 
>> (On a practical level, all RDF environments and serializations I know
>> about behave similarly: they would take any (valid) XML as XML Literal,
>> and the C14N comes into the picture when two XML literals are checked,
>> eg, for equality.)
> 
> (If equality is always checked in terms of C14N-equivalence, why does
> http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.sparql
> say that the output must equal either one of two strings that are
> C14N-equivalent? If it's equal to one, it would also be equal to the
> other. So I presume at least some implementations just do simple string
> equality, instead of dealing with C14N when checking equality, and the
> C14N should be dealt with at an earlier point (when generating the
> triples) to avoid making equality comparisons hopelessly inefficient.)
> 
>> Ivan
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Wednesday, 16 September 2009 09:15:21 UTC