- From: Martin Duerst <duerst@w3.org>
- Date: Sun, 03 Aug 2003 14:12:30 -0400
- To: Graham Klyne <GK-lists@ninebynine.org>, pat hayes <phayes@ihmc.us>
- Cc: "Peter F. Patel-Schneider" <pfps@research.bell-labs.com>, www-rdf-comments@w3.org, w3c-i18n-ig@w3.org, msm@w3.org, w3c-rdfcore-wg@w3.org, reagle@w3.org
[if the issues addressed in this mail have been changed to the better in the meantime, please ignore this mail] At 23:36 03/07/31 +0100, Graham Klyne wrote: >At 17:57 31/07/03 -0400, Martin Duerst wrote: >>The first is markup. The second is a sequence of binary octets. >>And the two are not equivalent according to RDF. Because the >>canonicalization of <br/> is <br></br>, the octet sequence for >><br/> in hexBinary is 3C62723E3C2F62723E. >> >><br/>, <br></br>, and 3C62723E3C2F62723E (with the appropriate >>syntactic decorations) entail each other. The don't entail >>3C62722F3E. > >Oh... phui... I should have spotted that, the trouble is it's a complete >distraction from the point I was trying to make. And you distracted me quite a bit, too. For a few minutes, I was fearing that canonical XML would suddenly introduce denotational equality between 3C62723E3C2F62723E and 3C62722F3E, because both of their XML equivalents would be denotationally equivalent ! >So I should have said: >Specifically, if I have the values denoted by: > > <eg:bar rdf:parseType="Literal"><br></br></eg:bar> > >and > > <eg:bar rdf:datatype="http://www.w3.org/2001/XMLSchema#hexBinary" > >3C62723E3C2F62723E</eg:bar> > >what is it that tells me the first is to be treated as markup, but not the >second? > >(A point that my mistake illustrates is now difficult it is to get this >all right... it's very late in the day to be asked to consider alternative >designs.) I guess your mistake may also illustrate that it would be better to keep two things separate if we are not sure about their relationship. This seems to be much safer than introducing unwanted consequences with some unclear equalities. >You responded to my original question: >"The first is markup. The second is a sequence of binary octets." >but how can I know that, based solely on their octet sequence >denotations? You seem to be claiming that we can somehow magically tell >the difference. Machines cannot deduce that based on their current denotations. But that's exactly why I think that the denotation for XML Literals should be changed. >In the case if strings and XML, I've been trying to point out that our >present design (in which plain literals are self-denoting strings) doesn't >provide enough information to distinguish between characters used as >characters and and characters used for markup (however >represented). Nobody has yet suggested how the this might be achieved. The RDF spec says that <eg:prop rdf:dataType="&xsd;integer">10</eg:prop> denotes an integer, but that <eg:prop>10</eg:prop> denotes the simple string "10". The RDF spec tells you that the first should be treated as an integer. You are free to treat the second one as an integer if that makes sense to you, but RDF doesn't help you. You are free to treat the second one as anything else that makes sense to you and is based on the fact that it is a string that may represent something else. The fact that we have representational layering (octets that represent characters that represent integers and other stuff), and that we have datatypes for things in different layers of this hierarchy, may be confusing. But it is just an artefact of the fact that the perfect and final semantics have not been invented (or decided upon) yet, and that therefore in some cases, we have to stop at an intermediate representation. The fact that we sometimes have to stop at an intermediate representation (e.g. hexBinary for a public key, or string for some element or attribute content such as CSS style rules or SVG paths) shows that we have some more work ahead of us, but it shouldn't be taken as an excuse to map things we already have moved to a somewhat higher and more structured form (such as XML) to something lower. If we wanted, we could take all the XML Schema Datatypes and for all of them define how they denote octets, but that would clearly be big step backwards rather than moving forwards. Regards, Martin.
Received on Sunday, 3 August 2003 16:11:57 UTC