Re: XMLLiterals and c14n from Ivan Herman on 2009-09-08 (public-html@w3.org from September 2009)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 08 Sep 2009 09:12:36 +0200
To: Philip Taylor <pjt47@cam.ac.uk>
CC: Manu Sporny <msporny@digitalbazaar.com>, HTMLWG WG <public-html@w3.org>, RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <4AA603E4.2060200@w3.org>
Hi Philip,

I think the fundamental issue is that turtle (and SPARQL) does not seem
to make a clear statement on this, the same way as RDF/XML does (that I
quoted in my mail). There is no formal spec for Turtle yet, but there is
one for SPARQL and, actually, there is a work for a SPARQL update right
now. What I will do is to raise this question in the SPARQL WG, and see
what replies I get and that can possibly lead to an update of the
formulation in the newer SPARQL standard.

In the meantime in might indeed be better, as a matter of precaution, to
get the examples in the RDFa documents tight...

Ivan

Philip Taylor wrote:
> Ivan Herman wrote:
>> Sigh. This is indeed a slightly muddy area where the RDF concept
>> document should be written differently. But, well, this is not something
>> either of these two working groups can do...
>>
>> I think the issue is that the RDF concept spec describes the abstract
>> concepts for abstract RDF graphs, and not a serialization thereof.  [...]
> 
> As I understand it, rdf-concepts explicitly describes the lexical space
> of XMLLiterals, i.e. the set of Unicode strings which values of type
> XMLLiteral must be a member of.
> 
> I'm happy to agree that serialisations like RDF/XML and RDFa specify
> their own transformations/mappings from the input document onto that
> abstract RDF lexical space, and there's no need for the input document
> to care about C14N at all - the input can be anything, and the mapping
> can be arbitrarily complicated, as long as the resultant triples contain
> values from the appropriate lexical space.
> 
> But serialisations of RDF like N3/Turtle/N-Triples represent XMLLiterals
> as typed strings. I'm making the (hopefully reasonable) assumption that
> those strings correspond directly (after appropriate charset decoding)
> to the lexical space defined by rdf-concepts - there is no non-trivial
> mapping there. (In particular, no automatic canonicalisation occurs.)
> 
> (If that assumption is wrong, and there is a non-trivial mapping between
> N3/Turtle/N-Triples serialised strings and the XMLLiteral lexical space,
> then I can't find any definition of that mapping at all, which is a
> bigger problem (unless I'm just missing it).)
> 
> The RDFa spec examples and test cases represent triples using
> Turtle/N-Triples as the serialisation format, so their strings map
> directly onto the restricted lexical space, so I believe those
> particular cases need to use canonicalised form for their serialisations
> of XMLLiteral strings.
> 
> The RDFa spec also refers to abstract triples (as the result of
> processing a document), at which point there is no serialisation
> involved at all, and so a value of type XMLLiteral must be a member of
> the lexical space of XMLLiteral, i.e. must be a canonical-form string.
> 
> So I think I agree with everything you are saying (that RDF/XML and RDFa
> don't require c14n of their input) and I think that's all good, but I
> don't think that's addressing the problems I see (which are with the
> abstract triple output of RDFa, and with specific examples of
> Turtle/N-Triples serialised triples).
> 
>> (On a practical level, all RDF environments and serializations I know
>> about behave similarly: they would take any (valid) XML as XML Literal,
>> and the C14N comes into the picture when two XML literals are checked,
>> eg, for equality.)
> 
> (If equality is always checked in terms of C14N-equivalence, why does
> http://www.w3.org/2006/07/SWD/RDFa/testsuite/xhtml1-testcases/0011.sparql
> say that the output must equal either one of two strings that are
> C14N-equivalent? If it's equal to one, it would also be equal to the
> other. So I presume at least some implementations just do simple string
> equality, instead of dealing with C14N when checking equality, and the
> C14N should be dealt with at an earlier point (when generating the
> triples) to avoid making equality comparisons hopelessly inefficient.)
> 
>> Ivan
> 

-- 

Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
PGP Key: http://www.ivan-herman.net/pgpkey.html
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Tuesday, 8 September 2009 07:13:48 UTC