- From: Graham Klyne <gk@ninebynine.org>
- Date: Fri, 18 Jul 2003 10:45:29 +0100
- To: Martin Duerst <duerst@w3.org>
- Cc: w3c-i18n-ig@w3.org, Brian McBride <bwm@hplb.hpl.hp.com>, RDFCore Working Group <w3c-rdfcore-wg@w3.org>
Martin, what you're saying here about XML doesn't seem to fit with what you were saying earlier about text with markup. Particularly:

[[
I.e. an XML literal denotes an XML fragment the same way an integer denotes an integer.
]]

Here, you seem to say that an XML literal is *not* text with markup, but your original comments seemed to be focused on the disparities between treatment of XML literals and plain text.

If we're to have a constructive discussion, I think we need to remain clear about the requirements we're trying to address.

#g
--

At 22:50 17/07/03 +0100, Brian McBride wrote:
>and Martin's response
>
>Brian
>
>-------- Original Message --------
>Subject: Re: Ameliorating no change on XML Literal design
>Date: Thu, 17 Jul 2003 15:08:43 -0400
>From: Martin Duerst <duerst@w3.org>
>To: Brian McBride <bwm@hplb.hpl.hp.com>, RDF Core <w3c-rdf-core@w3.org>
>CC: w3c-i18n-ig@w3.org
>
>At 17:30 03/07/17 +0100, Brian McBride wrote:
>
>>Martin further suggested that we consider changing the canonicalization
>>algorithm to omit the conversion to UTF-8. I pointed out that this has
>>the benefit of avoiding false equals between similar plain and xml
>>literals, but I agreed to raise it anyway.
>
>Some more notes on what Brian and I talked about. Not guaranteed
>that everything makes sense, please feel free to comment.
>
>Brian said that in the current system, the lexical form of an XML literal
>is a (non-canonicalized) string of characters, and the thing it denotes
>is the UTF-8-encoded canonicalized version of that string.
>
>This is 180 degrees against what happens in internationalization,
>and in contrast to xml:lang, is quite extensively explained in the
>Character Model. The physical/electronic/whatever lower-level
>representation is in terms of octets or other code units, and
>the higher level (not necessarily highest level, of course)
>representation is in terms of characters.
>
>The point that Brian mentions above is a valid one, we would not
>like to have equality between a string of characters representing
>XML markup and a string of characters that by chance looks like
>markup to be introduced via a back door. Brian explained to me
>that the denotation does not explicitly carry the datatypes.
>But still, it seems to me that the denotation "integer 11" and
>the denotation "string '11'" should be different currently.
>Then it would be easy to solve this particular problem (and to
>hopefully bring quite a bit more clarity into the distinction
>between plain strings and strings with markup) by saying that
>an XML literal denotes the XML fragment that is represented by
>the string of characters resulting from the exclusive canonicalization
>(without the step of UTF-8 encoding) of [the relevant input].
>
>I.e. an XML literal denotes an XML fragment the same way an
>integer denotes an integer.
>
>
>Regards,    Martin.
>
>
>

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B  A2E9 A131 01B9 1C7A DBCA CB5E
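
For readers following the thread, the two designs being compared in the quoted message can be sketched roughly as below. This is only an illustration, not text from any RDF specification: the class and function names are made up for the example, and Python's `xml.etree.ElementTree.canonicalize` (which implements C14N 2.0) is used as a stand-in for exclusive canonicalization.

```python
from dataclasses import dataclass
from xml.etree.ElementTree import canonicalize  # C14N 2.0; stand-in only


@dataclass(frozen=True)
class PlainValue:
    """Hypothetical denotation of a plain literal: a character string."""
    text: str


@dataclass(frozen=True)
class XmlFragmentValue:
    """Hypothetical denotation of an XML literal: a typed XML fragment."""
    canonical_text: str


def denote_plain(lexical: str) -> PlainValue:
    return PlainValue(lexical)


def denote_xml_as_octets(lexical: str) -> bytes:
    # The design described as current: canonicalize, then encode as UTF-8.
    # The value is a sequence of octets, so it cannot equal a character string.
    return canonicalize(lexical).encode("utf-8")


def denote_xml_as_fragment(lexical: str) -> XmlFragmentValue:
    # The suggested design: canonicalize but stay at the character level,
    # and let the value's type keep it apart from plain strings.
    return XmlFragmentValue(canonicalize(lexical))


lexical_a = "<b class='x'>é</b>"
lexical_b = '<b class="x">é</b>'
```

In this sketch both lexical forms map to the same fragment value, while a plain literal holding the same characters stays distinct because it has a different type, much as "integer 11" and "string '11'" are distinct.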
Received on Friday, 18 July 2003 06:45:48 UTC