
Re: [Fwd: Re: Ameliorating no change on XML Literal design]

From: Graham Klyne <gk@ninebynine.org>
Date: Fri, 18 Jul 2003 10:45:29 +0100
To: Martin Duerst <duerst@w3.org>
Cc: w3c-i18n-ig@w3.org, Brian McBride <bwm@hplb.hpl.hp.com>, RDFCore Working Group <w3c-rdfcore-wg@w3.org>


What you're saying here about XML doesn't seem to fit with what you were 
saying earlier about text with markup.  Particularly:

>I.e. an XML literal denotes an XML fragment the same way an
>integer denotes an integer.

Here, you seem to say that an XML literal is *not* text with markup, but 
your original comments seemed to be focused on the disparities between 
treatment of XML literals and plain text.  If we're to have a constructive 
discussion, I think we need to remain clear about the requirements we're 
trying to address.


At 22:50 17/07/03 +0100, Brian McBride wrote:

>and Martin's response
>-------- Original Message --------
>Subject: Re: Ameliorating no change on XML Literal design
>Date: Thu, 17 Jul 2003 15:08:43 -0400
>From: Martin Duerst <duerst@w3.org>
>To: Brian McBride <bwm@hplb.hpl.hp.com>, RDF Core <w3c-rdf-core@w3.org>
>CC: w3c-i18n-ig@w3.org
>At 17:30 03/07/17 +0100, Brian McBride wrote:
>>Martin further suggested that we consider changing the canonicalization 
>>algorithm to omit the conversion to UTF-8.  I pointed out that this has 
>>the benefit of avoiding false equals between similar plain and xml 
>>literals, but I agreed to raise it anyway.
>Some more notes on what Brian and I talked about. Not guaranteed
>that everything makes sense, please feel free to comment.
>Brian said that in the current system, the lexical form of an XML literal
>is a (non-canonicalized) string of characters, and the thing it denotes
>is the UTF-8-encoded canonicalized version of that string.
>This is 180 degrees against what happens in internationalization,
>which, in contrast to xml:lang, is quite extensively explained in the
>Character Model. The physical/electronic/whatever lower-level
>representation is in terms of octets or other code units, and
>the higher level (not necessarily highest level, of course)
>representation is in terms of characters.
>The point that Brian mentions above is a valid one: we would not
>like to have equality between a string of characters representing
>XML markup and a string of characters that by chance looks like
>markup to be introduced via a back door. Brian explained to me
>that the denotation does not explicitly carry the datatypes.
>But still, it seems to me that the denotation "integer 11" and
>the denotation "string '11'" should be different currently.
>Then it would be easy to solve this particular problem (and to
>hopefully bring quite a bit more clarity into the distinction
>between plain strings and strings with markup) by saying that
>an XML literal denotes the XML fragment that is represented by
>the string of characters resulting from the exclusive canonicalization
>(without the step of UTF-8 encoding) of [the relevant input].
>I.e. an XML literal denotes an XML fragment the same way an
>integer denotes an integer.
>Regards,    Martin.
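
For concreteness, the two readings of an XML literal's denotation discussed above can be sketched in Python. This is only an illustration under stated assumptions, not the RDF design itself: the names XmlFragment, value_as_octets and value_as_fragment are hypothetical, and the standard library's canonicalize() implements C14N 2.0 rather than the exclusive canonicalization named above, so it stands in for that step only loosely.

```python
from dataclasses import dataclass
from xml.etree.ElementTree import canonicalize  # C14N 2.0, Python 3.8+


@dataclass(frozen=True)
class XmlFragment:
    """Hypothetical value type: an XML fragment, held as canonical characters."""
    chars: str


def value_as_octets(lexical: str) -> bytes:
    # Reading described by Brian: canonicalize, then encode as UTF-8 octets.
    return canonicalize(lexical).encode("utf-8")


def value_as_fragment(lexical: str) -> XmlFragment:
    # Reading suggested by Martin: canonicalize, skip the UTF-8 step,
    # and denote a fragment rather than a bare character string.
    return XmlFragment(canonicalize(lexical))


plain = '<b class="x">hi</b>'      # plain literal: characters that look like markup
xml_lex = "<b  class='x'>hi</b>"    # XML literal in a non-canonical lexical form

# The canonical characters of the XML literal coincide with the plain string,
# which is where an accidental equality could creep in.
same_chars = canonicalize(xml_lex) == plain

# Both readings keep the XML literal's value distinct from the plain string.
octets_differ = value_as_octets(xml_lex) != plain
fragment_differs = value_as_fragment(xml_lex) != plain

# The analogy from the message: integer 11 and string '11' are different values.
int_vs_string = 11 != "11"
```

In this sketch the octet reading avoids the false equality only because octets and characters are different kinds of value, while the fragment reading avoids it by giving the value its own type, which is closer to how an integer literal denotes an integer.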

Graham Klyne
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Friday, 18 July 2003 06:45:48 UTC
