
Re: Test case regarding XML Literals and octets

From: Benja Fallenstein <b.fallenstein@gmx.de>
Date: Thu, 31 Jul 2003 23:20:27 +0200
Message-ID: <3F29881B.8060108@gmx.de>
To: Graham Klyne <GK-lists@ninebynine.org>
CC: Martin Duerst <duerst@w3.org>, pat hayes <phayes@ihmc.us>, "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org, w3c-rdf-core-wg@w3.org


Hi Graham,

Graham Klyne wrote:
> as far as I can tell, you're contradicting the XML canonicalization spec.
> 
> Is canonical XML a sequence of octets or something else?
> 
> The XML canonicalization spec, I understand, says it's a sequence of 
> octets.

I can see what you're saying. The XML c14n spec says that

     The term exclusive canonical XML refers to XML that is in
     exclusive canonical form.

<http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/#def-exclusive-canonical-XML>

which is referred to by

     The lexical-to-value mapping [of XMLLiterals] maps a string to the
     corresponding exclusive Canonical XML (with comments, with empty
     InclusiveNamespaces PrefixList ).

<http://www.w3.org/2001/sw/RDFCore/TR/WD-rdf-concepts-20030117/#section-XMLLiteral>

I think "XML in exclusive canonical form" can indeed only be taken as 
octets; an abstract XML infoset certainly cannot be in canonical form.

I believe that it is a bad idea to treat XML literals like this, though. 
Exclusive Canonical XML is a *serialization* of an abstract concept, and 
IMO the specs say this very clearly:

     It is normal for XML documents and subdocuments which are equivalent
     for the purposes of many applications to differ in their physical
     representation. For example, they may differ in their entity
     structure, attribute ordering, and character encoding. The goal of
     this specification is to establish a method for serializing the
     XPath node-set representation of an XML document or subset [...].

     -- http://www.w3.org/TR/xml-exc-c14n/#sec-Intro

So exclusive canonical XML is a *serialization* for *a representation* 
of *an XML document*. I think it makes little sense to specify a way to 
denote serializations; that's like specifying that

     "254"^^foo:integer

is a literal denoting the string of Unicode characters "FE", which is 
the hexadecimal serialization of the integer 254; and that therefore, the 
literal has the same denotation as

     "FE"^^xsd:string

You *can* do it, but it doesn't make a lot of sense. (And it certainly 
is surprising given that the data type is called 'foo:integer.')
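
A minimal Python sketch of the analogy above. The function names are 
hypothetical, and foo:integer is the made-up datatype from the example; 
nothing here comes from the RDF or c14n specs.

```python
# Two candidate lexical-to-value mappings for the made-up foo:integer datatype.

def integer_value(lexical: str) -> int:
    """The unsurprising mapping: the literal denotes the abstract integer."""
    return int(lexical, 10)

def integer_as_hex_string(lexical: str) -> str:
    """The odd mapping: the literal denotes one particular serialization
    (uppercase hexadecimal digits) of that integer."""
    return format(int(lexical, 10), "X")

def string_value(lexical: str) -> str:
    """xsd:string-like mapping: the literal denotes its own characters."""
    return lexical

# Under the odd mapping, "254"^^foo:integer and "FE"^^xsd:string coincide.
same_denotation = integer_as_hex_string("254") == string_value("FE")
print(same_denotation)
```

Both mappings are well defined; the second one just makes the value space 
a set of strings rather than a set of numbers.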

I agree with Martin that it makes sense for the spec to say that XML 
documents are an abstract set with equivalence defined by exclusive 
c14n. If you don't like the abstract set approach, you could also say 
that XML literals denote XPath node-sets; that would be in keeping 
with the c14n spec.
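
A rough sketch of the "abstract set with equivalence defined by c14n" 
reading, in Python. Note the assumptions: the class name XMLValue is 
hypothetical, and the standard library's canonicalize() implements C14N 
2.0, not the Exclusive C14N 1.0 discussed here, so it is only a stand-in 
that happens to behave alike on namespace-free input such as this.

```python
from xml.etree.ElementTree import canonicalize

class XMLValue:
    """Hypothetical value-space member: an XML fragment whose identity is
    decided by its canonical form, not by how it happened to be written."""

    def __init__(self, text: str) -> None:
        # The canonical serialization is used only as an equality key.
        self._key = canonicalize(text)

    def __eq__(self, other: object) -> bool:
        return isinstance(other, XMLValue) and self._key == other._key

    def __hash__(self) -> int:
        return hash(self._key)

# Same element, different attribute order, quoting, and empty-tag syntax.
a = XMLValue('<a y="2" x="1"/>')
b = XMLValue("<a x='1'   y='2'></a>")
print(a == b)
```

Here the canonical octets never leak out as "the value"; they only decide 
which lexical forms count as the same value.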

(I also agree that it would be good if XML literals without any markup 
were equivalent to the corresponding plain literals/XSD strings, but 
that's beside the point.)

I understand why it makes sense for the *lexical space* of XML literals 
to be Exclusive Canonical XML, but I don't understand why the *value 
space* should be.

Cheers,
- Benja
Received on Thursday, 31 July 2003 17:22:17 GMT
