Re: XMLLiterals and c14n from Philip Taylor on 2009-09-07 (public-rdf-in-xhtml-tf@w3.org from September 2009)

From: Philip Taylor <pjt47@cam.ac.uk>
Date: Mon, 07 Sep 2009 18:28:56 +0100
To: Ivan Herman <ivan@w3.org>
CC: Manu Sporny <msporny@digitalbazaar.com>, HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <4AA542D8.9020705@cam.ac.uk>

Ivan Herman wrote:
> Sigh. This is indeed a slightly muddy area where the RDF concept
> document should be written differently. But, well, this is not something
> either of these two working groups can do...
> 
> I think the issue is that the RDF concept spec describes the abstract
> concepts for abstract RDF graphs, and not a serialization thereof.  [...]

As I understand it, rdf-concepts explicitly describes the lexical space 
of XMLLiterals, i.e. the set of Unicode strings to which the lexical 
form of any value of type XMLLiteral must belong.

I'm happy to agree that serialisations like RDF/XML and RDFa specify 
their own transformations/mappings from the input document onto that 
abstract RDF lexical space, and there's no need for the input document 
to care about C14N at all - the input can be anything, and the mapping 
can be arbitrarily complicated, as long as the resultant triples contain 
values from the appropriate lexical space.

But serialisations of RDF like N3/Turtle/N-Triples represent XMLLiterals 
as typed strings. I'm making the (hopefully reasonable) assumption that 
those strings correspond directly (after appropriate charset decoding) 
to the lexical space defined by rdf-concepts - there is no non-trivial 
mapping there. (In particular, no automatic canonicalisation occurs.)

(If that assumption is wrong, and there is a non-trivial mapping between 
N3/Turtle/N-Triples serialised strings and the XMLLiteral lexical space, 
then I can't find any definition of that mapping at all, which is a 
bigger problem (unless I'm just missing it).)
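
To illustrate the direct-mapping assumption, here is a minimal Python 
sketch (not taken from any spec). The two literal strings are made-up 
examples. It uses the standard library's 
xml.etree.ElementTree.canonicalize, which implements C14N 2.0 rather 
than the exclusive canonicalisation that rdf-concepts refers to, so 
treat it as an approximation only.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XMLLiteral lexical forms as they might appear in an
# N-Triples file. They describe the same XML but are different strings.
literal_a = '<b a="1" c="2"></b>'
literal_b = "<b c='2' a='1'/>"

# Under the "no non-trivial mapping" assumption these are simply two
# different Unicode strings.
print(literal_a == literal_b)  # False

# They only coincide once something canonicalises them explicitly.
print(ET.canonicalize(literal_a) == ET.canonicalize(literal_b))  # True
```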

The RDFa spec examples and test cases represent triples using 
Turtle/N-Triples as the serialisation format. Their strings therefore 
map directly onto the restricted lexical space, so I believe those 
particular cases need to use the canonicalised form for their 
serialisations of XMLLiteral strings.

The RDFa spec also refers to abstract triples (as the result of 
processing a document), at which point there is no serialisation 
involved at all, and so a value of type XMLLiteral must be a member of 
the lexical space of XMLLiteral, i.e. must be a canonical-form string.
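
A rough way to test whether a given string is already in canonical form 
is to canonicalise it and compare it with the input. The sketch below 
assumes the string is a single well-formed element (real XMLLiterals can 
also be fragments with text or several top-level nodes, which this 
parser would reject), and it again uses C14N 2.0 from the Python 
standard library as a stand-in for exclusive C14N. The name 
is_canonical is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_canonical(lexical_form: str) -> bool:
    """Approximate membership test for the XMLLiteral lexical space.

    Assumptions: the input is one well-formed element, and C14N 2.0
    is an acceptable stand-in for exclusive C14N.
    """
    return ET.canonicalize(lexical_form) == lexical_form

print(is_canonical('<b a="1" c="2"></b>'))  # True
print(is_canonical("<b c='2' a='1'/>"))     # False
```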

So I think I agree with everything you are saying (that RDF/XML and RDFa 
don't require c14n of their input) and I think that's all good, but I 
don't think that's addressing the problems I see (which are with the 
abstract triple output of RDFa, and with specific examples of 
Turtle/N-Triples serialised triples).

> (On a practical level, all RDF environments and serializations I know
> about behave similarly: they would take any (valid) XML as XML Literal,
> and the C14N comes into the picture when two XML literals are checked,
> eg, for equality.)

(If equality is always checked in terms of C14N-equivalence, why does 
http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.sparql 
say that the output must equal either one of two strings that are 
C14N-equivalent? If it's equal to one, it would also be equal to the 
other. So I presume at least some implementations just do simple string 
equality instead of dealing with C14N when checking equality. In that 
case the C14N should be dealt with at an earlier point (when generating 
the triples), to avoid making equality comparisons hopelessly 
inefficient.)
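
As a sketch of canonicalising at generation time, the Python below 
normalises the XML once when a triple is created, so that later 
comparisons are plain string equality. The function name and the 
example.org URIs are hypothetical, and the standard library's C14N 2.0 
is assumed to be a good enough stand-in for exclusive C14N.

```python
import xml.etree.ElementTree as ET

def make_xml_literal_triple(subject: str, predicate: str, xml_text: str):
    """Canonicalise the object once, when the triple is generated."""
    return (subject, predicate, ET.canonicalize(xml_text))

# Hypothetical subject and predicate URIs, for illustration only.
s, p = "http://example.org/doc", "http://example.org/title"
t1 = make_xml_literal_triple(s, p, '<b a="1" c="2"></b>')
t2 = make_xml_literal_triple(s, p, "<b c='2' a='1'/>")

# Later comparisons are ordinary tuple/string equality, and a set
# de-duplicates the triples without re-parsing any XML.
print(t1 == t2)       # True
print(len({t1, t2}))  # 1
```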

> Ivan

-- 
Philip Taylor
pjt47@cam.ac.uk

Received on Monday, 7 September 2009 17:29:34 UTC