Re: Test case regarding XML Literals and octets from Graham Klyne on 2003-07-31 (www-rdf-comments@w3.org from July to September 2003)

From: Graham Klyne <GK-lists@ninebynine.org>
Date: Thu, 31 Jul 2003 23:36:33 +0100
To: Martin Duerst <duerst@w3.org>, pat hayes <phayes@ihmc.us>
Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org, w3c-rdf-core-wg@w3.org, reagle@w3.org
Message-Id: <5.1.0.14.2.20030731233431.00b7b158@127.0.0.1>

At 17:57 31/07/03 -0400, Martin Duerst wrote:
>>Specifically, if I have the values denoted by:
>>
>>    <eg:bar rdf:parseType="Literal"><br/></eg:bar>
>>
>>and
>>
>>    <eg:bar rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary"
>>        >3C62722F3E</eg:bar>
>>
>>what is it that tells me the first is to be treated as markup, but not 
>>the second?
>
>The first is markup. The second is a sequence of binary octets.
>And the two are not equivalent according to RDF. Because the
>canonicalization of <br/> is <br></br>, the octet sequence for
><br/> in hexBinary is 3C62723E3C2F62723E.
>
><br/>, <br></br>, and 3C62723E3C2F62723E (with the appropriate
>syntactic decorations) entail each other. They don't entail
>3C62722F3E.

Oh... phui... I should have spotted that.  The trouble is, it's a complete 
distraction from the point I was trying to make.

So I should have said:
Specifically, if I have the values denoted by:

    <eg:bar rdf:parseType="Literal"><br></br></eg:bar>

and

    <eg:bar rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary"
        >3C62723E3C2F62723E</eg:bar>

what is it that tells me the first is to be treated as markup, but not the 
second?
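
The octet values in these two examples can be checked mechanically.  A 
minimal Python sketch, assuming UTF-8 encoding and using the standard 
library's C14N 2.0 canonicalizer as a stand-in for the canonicalization RDF 
specifies (for an element this trivial the result should be the same); the 
helper name hex_octets is illustrative only:

```python
from xml.etree.ElementTree import canonicalize  # Python 3.8+


def hex_octets(text: str) -> str:
    """Return the UTF-8 octets of text as upper-case hex (hexBinary style)."""
    return text.encode("utf-8").hex().upper()


empty_tag = "<br/>"
# Canonicalization expands the empty-element tag into a start/end pair.
canonical = canonicalize(empty_tag)

print(canonical)
print(hex_octets(empty_tag))   # octets of the non-canonical form
print(hex_octets(canonical))   # octets of the canonical form
```

The two hex strings differ, which is why the corrected example above uses 
the longer one.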

(A point that my mistake illustrates is how difficult it is to get this all 
right... it's very late in the day to be asked to consider alternative 
designs.)

You responded to my original question:
"The first is markup. The second is a sequence of binary octets."
but how can I know that, based solely on their octet sequence 
denotations?  You seem to be claiming that we can somehow magically tell 
the difference.
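
The question can be made concrete with a toy model.  This is only a sketch 
under stated assumptions, not the RDF model theory: TypedValue is a 
hypothetical record, and the datatype strings are just labels.  It shows 
that if both literals denote nothing but an octet sequence, the values 
compare equal, and only a tag carried alongside the octets separates them:

```python
from typing import NamedTuple


class TypedValue(NamedTuple):
    """Hypothetical pairing of a datatype label with the octets denoted."""
    datatype: str
    octets: bytes


xml_lit = TypedValue("rdf:XMLLiteral", b"<br></br>")
hex_lit = TypedValue("xsd:hexBinary", bytes.fromhex("3C62723E3C2F62723E"))

# Looking only at the denoted octets, the two cannot be told apart.
same_octets = xml_lit.octets == hex_lit.octets

# They differ only if the datatype label is kept as part of the value.
same_tagged = xml_lit == hex_lit

print(same_octets, same_tagged)
```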

In the case of strings and XML, I've been trying to point out that our 
present design (in which plain literals are self-denoting strings) doesn't 
provide enough information to distinguish between characters used as 
characters and characters used for markup (however 
represented).  Nobody has yet suggested how this might be achieved.

(I shall be unable to continue regular participation through August, but I 
don't think I've really any more to add.  Maybe someone else will be 
able to see a way to overcome the "insurmountable" problem I see here.)

#g

---------------------------------
Graham Klyne  <GK@NineByNine.net>
Nine by Nine
http://www.ninebynine.net/

Received on Thursday, 31 July 2003 19:03:52 UTC