Re: Test case regarding XML Literals and octets from Martin Duerst on 2003-08-03 (www-rdf-comments@w3.org from July to September 2003)

From: Martin Duerst <duerst@w3.org>
Date: Sun, 03 Aug 2003 14:12:30 -0400
To: Graham Klyne <GK-lists@ninebynine.org>, pat hayes <phayes@ihmc.us>
Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org, w3c-rdfcore-wg@w3.org, reagle@w3.org
Message-Id: <4.2.0.58.J.20030801144833.0631f4e0@localhost>

[if the issues addressed in this mail have been changed for the
better in the meantime, please ignore this mail]

At 23:36 03/07/31 +0100, Graham Klyne wrote:

>At 17:57 31/07/03 -0400, Martin Duerst wrote:

>>The first is markup. The second is a sequence of binary octets.
>>And the two are not equivalent according to RDF. Because the
>>canonicalization of <br/> is <br></br>, the octet sequence for
>><br/> in hexBinary is 3C62723E3C2F62723E.
>>
>><br/>, <br></br>, and 3C62723E3C2F62723E (with the appropriate
>>syntactic decorations) entail each other. They don't entail
>>3C62722F3E.
>
>Oh... phui... I should have spotted that, the trouble is it's a complete 
>distraction from the point I was trying to make.

And you distracted me quite a bit, too. For a few minutes, I was
fearing that canonical XML would suddenly introduce denotational
equality between 3C62723E3C2F62723E and 3C62722F3E, because both
of their XML equivalents would be denotationally equivalent!
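
To make the two octet sequences concrete, here is a minimal Python
sketch. It assumes UTF-8 as the character encoding, and the helper
name hex_octets is made up for this illustration; it is not taken
from any spec.

```python
def hex_octets(markup: str) -> str:
    """Return the uppercase hex form of the UTF-8 octets of `markup`."""
    return markup.encode("utf-8").hex().upper()


canonical = "<br></br>"  # canonical form of <br/>, as stated above
short = "<br/>"          # the non-canonical spelling

print(hex_octets(canonical))  # 3C62723E3C2F62723E
print(hex_octets(short))      # 3C62722F3E
```

The two strings differ as octets, which is why only the first hex
sequence matches the canonicalized literal.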

>So I should have said:
>Specifically, if I have the values denoted by:
>
>    <eg:bar rdf:parseType="Literal"><br></br></eg:bar>
>
>and
>
>    <eg:bar rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary"
>        >3C62723E3C2F62723E</eg:bar>
>
>what is it that tells me the first is to be treated as markup, but not the 
>second?
>
>(A point that my mistake illustrates is how difficult it is to get this
>all right... it's very late in the day to be asked to consider alternative 
>designs.)

I guess your mistake may also illustrate that it would be better
to keep two things separate if we are not sure about their
relationship. This seems to be much safer than introducing
unwanted consequences with some unclear equalities.

>You responded to my original question:
>"The first is markup. The second is a sequence of binary octets."
>but how can I know that, based solely on their octet sequence 
>denotations?  You seem to be claiming that we can somehow magically tell 
>the difference.

Machines cannot deduce that based on their current denotations.
But that's exactly why I think that the denotation for XML Literals
should be changed.
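
A rough Python sketch of the difference, under stated assumptions:
the class XmlValue is hypothetical and only stands in for "a
denotation that is XML rather than bare octets". It does not
reproduce any definition from the RDF drafts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XmlValue:
    """Hypothetical value type: XML markup, kept distinct from raw octets."""
    canonical_octets: bytes


# Value of the hexBinary literal: a plain octet sequence.
octets = bytes.fromhex("3C62723E3C2F62723E")

# If an XML literal denotes bare octets, the two values coincide.
xml_as_octets = "<br></br>".encode("utf-8")
print(xml_as_octets == octets)  # True

# If an XML literal denotes a distinct kind of value, they do not.
xml_as_markup = XmlValue("<br></br>".encode("utf-8"))
print(xml_as_markup == octets)  # False
```

With the first choice nothing in the value says "this is markup";
with the second, the kind of value carries that information.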

>In the case of strings and XML, I've been trying to point out that our
>present design (in which plain literals are self-denoting strings) doesn't
>provide enough information to distinguish between characters used as
>characters and characters used for markup (however
>represented).  Nobody has yet suggested how this might be achieved.

The RDF spec says that
     <eg:prop rdf:datatype="&xsd;integer">10</eg:prop>
denotes an integer, but that
     <eg:prop>10</eg:prop>
denotes the simple string "10". The RDF spec tells you that the
first should be treated as an integer. You are free to treat
the second one as an integer if that makes sense to you, but
RDF doesn't help you. You are free to treat the second one
as anything else that makes sense to you and is based on
the fact that it is a string that may represent something
else.
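
As a small illustration, here is a Python sketch of that rule. The
function literal_value is hypothetical and handles only xsd:integer;
note also that Python's int() accepts some forms (such as surrounding
whitespace) that are outside the xsd:integer lexical space, so this
is an approximation and not a validator.

```python
from typing import Optional, Union

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def literal_value(lexical: str, datatype: Optional[str] = None) -> Union[int, str]:
    """Map a literal to its value: typed as integer, or the plain string itself."""
    if datatype is None:
        return lexical       # plain literal: denotes the string itself
    if datatype == XSD_INTEGER:
        return int(lexical)  # typed literal: the datatype fixes the value
    raise ValueError(f"datatype not handled in this sketch: {datatype}")


print(repr(literal_value("10", XSD_INTEGER)))  # 10
print(repr(literal_value("10")))               # '10'
```

Any further reading of the plain string "10" is up to the
application, not something this mapping provides.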

The fact that we have representational layering (octets that
represent characters that represent integers and other stuff),
and that we have datatypes for things in different layers of
this hierarchy, may be confusing. But it is just an artefact
of the fact that the perfect and final semantics have not been
invented (or decided upon) yet, and that therefore in some cases,
we have to stop at an intermediate representation.
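
The layering can be walked through in a few lines of Python. This is
only a sketch: the choice of UTF-8 and of the integer 10 as the
example are assumptions made for the illustration.

```python
# Layer 1: octets (what a hexBinary value denotes).
octets = bytes.fromhex("3130")

# Layer 2: characters represented by those octets (assuming UTF-8).
characters = octets.decode("utf-8")

# Layer 3: the integer represented by those characters.
number = int(characters)

print(octets)      # b'10'
print(characters)  # 10
print(number)      # 10
```

A datatype can stop at any of these layers; which one it stops at
determines what kind of value the literal denotes.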

The fact that we sometimes have to stop at an intermediate
representation (e.g. hexBinary for a public key, or string
for some element or attribute content such as CSS style rules
or SVG paths) shows that we have some more work ahead of us,
but it shouldn't be taken as an excuse to map things we already
have moved to a somewhat higher and more structured form
(such as XML) to something lower. If we wanted, we could
take all the XML Schema Datatypes and for all of them define
how they denote octets, but that would clearly be a big step
backwards rather than moving forwards.

Regards,   Martin.

Received on Sunday, 3 August 2003 16:11:57 UTC