- From: Brian McBride <bwm@hplb.hpl.hp.com>
- Date: Thu, 17 Jul 2003 22:50:48 +0100
- To: RDFCore Working Group <w3c-rdfcore-wg@w3.org>
- CC: Martin Duerst <duerst@w3.org>, w3c-i18n-ig@w3.org
and Martin's response

Brian

-------- Original Message --------
Subject: Re: Ameliorating no change on XML Literal design
Date: Thu, 17 Jul 2003 15:08:43 -0400
From: Martin Duerst <duerst@w3.org>
To: Brian McBride <bwm@hplb.hpl.hp.com>, RDF Core <w3c-rdf-core@w3.org>
CC: w3c-i18n-ig@w3.org

At 17:30 03/07/17 +0100, Brian McBride wrote:

>Martin further suggested that we consider changing the canonicalization
>algorithm to omit the conversion to UTF-8. I pointed out that this has
>the benefit of avoiding false equals between similar plain and xml
>literals, but I agreed to raise it anyway.

Some more notes on what Brian and I talked about. I can't guarantee that everything makes sense, so please feel free to comment.

Brian said that in the current system, the lexical form of an XML literal is a (non-canonicalized) string of characters, and the thing it denotes is the UTF-8-encoded canonicalized version of that string.

This is 180 degrees opposite to what happens in internationalization, which (in contrast to xml:lang) is quite extensively explained in the Character Model. The physical/electronic/whatever lower-level representation is in terms of octets or other code units, and the higher-level (not necessarily highest-level, of course) representation is in terms of characters.

The point that Brian mentions above is a valid one: we would not like to have equality between a string of characters representing XML markup and a string of characters that by chance looks like markup introduced via a back door.

Brian explained to me that the denotation does not explicitly carry the datatypes. But still, it seems to me that the denotation "integer 11" and the denotation "string '11'" should be different currently.
Then it would be easy to solve this particular problem (and to hopefully bring quite a bit more clarity into the distinction between plain strings and strings with markup) by saying that an XML literal denotes the XML fragment that is represented by the string of characters resulting from the exclusive canonicalization (without the step of UTF-8 encoding) of [the relevant input]. I.e. an XML literal denotes an XML fragment the same way an integer denotes an integer.

Regards,
Martin.
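
The three options discussed above can be sketched roughly in Python. This is only an illustration under stated assumptions, not anything defined by the RDF drafts: the names `plain_value`, `xml_value_octets`, `xml_value_chars`, `xml_value_fragment` and the `XMLFragment` type are hypothetical, and the input string is assumed to be already canonicalized (no canonicalization is performed here). Python's `str` versus `bytes` distinction stands in for characters versus octets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XMLFragment:
    """Hypothetical value type: what an XML literal would denote under the proposal."""
    chars: str  # canonical character string representing the fragment


def plain_value(lexical: str) -> str:
    # A plain literal denotes its own character string.
    return lexical


def xml_value_octets(canonical: str) -> bytes:
    # Current design as described: the value is the UTF-8 octets of the canonical form.
    return canonical.encode("utf-8")


def xml_value_chars(canonical: str) -> str:
    # Dropping the UTF-8 step and nothing else: the value is a bare character string.
    return canonical


def xml_value_fragment(canonical: str) -> XMLFragment:
    # Proposal: the value is an XML fragment, a different kind of thing from a string.
    return XMLFragment(canonical)


# Assumed to be in canonical form already; chosen only for illustration.
text = "<b>caf\u00e9</b>"

print(plain_value(text) == xml_value_octets(text))    # False: characters vs octets
print(plain_value(text) == xml_value_chars(text))     # True: the unwanted back-door equality
print(plain_value(text) == xml_value_fragment(text))  # False: string vs fragment
print(11 == "11")                                     # False: integer 11 vs string '11'
```

The third comparison is the point of the proposal: once the value is a fragment rather than a string, the UTF-8 step is no longer needed to keep plain and XML literals apart.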
Received on Thursday, 17 July 2003 17:51:34 UTC