Re: Test case regarding XML Literals and octets from Martin Duerst on 2003-07-31 (www-rdf-comments@w3.org from July to September 2003)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 31 Jul 2003 17:57:15 -0400
To: Graham Klyne <GK-lists@ninebynine.org>, pat hayes <phayes@ihmc.us>
Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org, w3c-rdf-core-wg@w3.org, reagle@w3.org
Message-Id: <4.2.0.58.J.20030731170445.069a2960@localhost>

At 21:42 03/07/31 +0100, Graham Klyne wrote:
>Martin,
>
>as far as I can tell, you're contradicting the XML canonicalization spec.

No. I'm just saying that it is a bad idea to use XML canonicalization,
which was developed for purposes such as parser testing, digital
signatures, and encryption, to come up with a proposal for what
an XML literal denotes.

>Is canonical XML a sequence of octets or something else?

Canonical XML is a sequence of octets. (Exclusive) canonical
XML is a good tool to answer questions about the equivalence
of XML fragments. Canonical XML, in any kind of definition
currently available 'off-the-shelf', is not a good tool to
express what XML Literals denote.

>The XML canonicalization spec, I understand, says it's a sequence of octets.
>
>Maybe, you want to say it's a sequence of octets that is to be interpreted 
>in specific way, in which case it's not *just* a sequence of octets, but 
>must also carry some distinguishing datum that indicates that this special 
>processing is required.

It's not necessarily a requirement. But it's the most usual
and appropriate thing to do with an XML Literal. On the other
hand, it's a totally arbitrary thing to do with an octet sequence.
So yes, the expectations for processing are different.

>Specifically, if I have the values denoted by:
>
>    <eg:bar rdf:parseType="Literal"><br/></eg:bar>
>
>and
>
>    <eg:bar rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary"
>        >3C62722F3E</eg:bar>
>
>what is it that tells me the first is to be treated as markup, but not the 
>second?

The first is markup. The second is a sequence of binary octets.
And the two are not equivalent according to RDF. Because the
canonicalization of <br/> is <br></br>, the octet sequence for
<br/> in hexBinary is 3C62723E3C2F62723E.

<br/>, <br></br>, and 3C62723E3C2F62723E (with the appropriate
syntactic decorations) entail each other. They don't entail
3C62722F3E.
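
For readers who want to check the octet sequences above, here is a
minimal sketch in Python. It is an illustration only, not part of any
RDF test case. The helper name hex_binary is hypothetical, and the
sketch assumes that the standard library's canonicalize() function
(which implements C14N 2.0, available from Python 3.8) produces the
same result as the canonicalization discussed here for this trivial
fragment.

```python
import xml.etree.ElementTree as ET


def hex_binary(text: str) -> str:
    """Return the uppercase hex form of the UTF-8 octets of text.

    Hypothetical helper for illustration; it mirrors the lexical form
    used by xsd:hexBinary for an octet sequence.
    """
    return text.encode("utf-8").hex().upper()


# Assumption: C14N 2.0 as implemented by the standard library agrees
# with the canonical form discussed in this thread for "<br/>".
canonical = ET.canonicalize("<br/>")

print(canonical)               # <br></br>
print(hex_binary(canonical))   # 3C62723E3C2F62723E
print(hex_binary("<br/>"))     # 3C62722F3E
```

The last two lines show the distinction at issue: the octets of the
canonical form and the octets of the original empty-element tag are
different sequences.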

There may be some odd cases where 3C62722F3E will be interpreted as
XML. The RDF spec would not support that, but it would not prohibit
that. However, the RDF spec (if we agree on your interpretation and
make my test case positive) says that 3C62723E3C2F62723E is the same
as the XML Literal(s) <br></br> or <br/>. This strikes me as very
odd, to say the least.

Regards,    Martin.

Received on Thursday, 31 July 2003 18:01:45 UTC