
Re: [Fwd: Re: Ameliorating no change on XML Literal design]

From: Graham Klyne <gk@ninebynine.org>
Date: Fri, 18 Jul 2003 10:45:29 +0100
Message-Id: <5.1.0.14.2.20030718104116.02f63758@127.0.0.1>
To: Martin Duerst <duerst@w3.org>
Cc: w3c-i18n-ig@w3.org, Brian McBride <bwm@hplb.hpl.hp.com>, RDFCore Working Group <w3c-rdfcore-wg@w3.org>

Martin,

what you're saying here about XML doesn't seem to fit with what you were 
saying earlier about text with markup.  Particularly:

[[
I.e. an XML literal denotes an XML fragment the same way an
integer denotes an integer.
]]

Here, you seem to say that an XML literal is *not* text with markup, but 
your original comments seemed to be focused on the disparities between 
treatment of XML literals and plain text.  If we're to have a constructive 
discussion, I think we need to remain clear about the requirements we're 
trying to address.

#g
--

At 22:50 17/07/03 +0100, Brian McBride wrote:

>and Martin's response
>
>Brian
>
>-------- Original Message --------
>Subject: Re: Ameliorating no change on XML Literal design
>Date: Thu, 17 Jul 2003 15:08:43 -0400
>From: Martin Duerst <duerst@w3.org>
>To: Brian McBride <bwm@hplb.hpl.hp.com>, RDF Core <w3c-rdf-core@w3.org>
>CC: w3c-i18n-ig@w3.org
>
>At 17:30 03/07/17 +0100, Brian McBride wrote:
>
>>Martin further suggested that we consider changing the canonicalization 
>>algorithm to omit the conversion to UTF-8.  I pointed out that the current
>>UTF-8 step has the benefit of avoiding false equalities between similar
>>plain and XML literals, but I agreed to raise it anyway.
>
>Some more notes on what Brian and I talked about. I can't guarantee
>that everything makes sense; please feel free to comment.
>
>Brian said that in the current system, the lexical form of an XML literal
>is a (non-canonicalized) string of characters, and the thing it denotes
>is the UTF-8-encoded canonicalized version of that string.
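To make the octets-versus-characters point concrete, here is a minimal Python sketch of the current design as Brian describes it: the lexical form is canonicalized and then UTF-8 encoded, so the value is a sequence of octets. It is an illustration under assumptions, not the RDF Core specification: the standard library's C14N 2.0 serializer (`xml.etree.ElementTree.canonicalize`, Python 3.8+) stands in for exclusive XML canonicalization, and `canonicalize_fragment`, `denotation_current` and the `<w>` wrapper element are hypothetical names.

```python
import xml.etree.ElementTree as ET


def canonicalize_fragment(lexical_form: str) -> str:
    """Canonicalize an XML fragment and return it as a character string.

    Assumption: Python's built-in C14N 2.0 serializer stands in for the
    exclusive XML canonicalization named in the RDF drafts; the two
    algorithms differ in detail (e.g. namespace handling).
    """
    # Wrap in a hypothetical <w> element so text-only fragments and
    # fragments with several top-level elements still parse.
    wrapped = "<w>" + lexical_form + "</w>"
    canonical = ET.canonicalize(wrapped)
    # C14N always writes explicit start and end tags, so slice them off.
    return canonical[len("<w>"):-len("</w>")]


def denotation_current(lexical_form: str) -> bytes:
    """Current design as described: UTF-8 octets of the canonical form."""
    return canonicalize_fragment(lexical_form).encode("utf-8")


xml_value = denotation_current('<b   x="1"/>caf\u00e9')
plain_value = '<b x="1"></b>caf\u00e9'
print(xml_value)
# Octets never compare equal to a character string in Python 3.
print(xml_value == plain_value)  # False
```

Because the value is octets, it can never equal a plain literal's character string, which is the protection against false equality mentioned above; the cost is that the value sits at the encoding level rather than the character level.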
>
>This is 180 degrees opposite to what happens in internationalization,
>and, unlike xml:lang, it is explained quite extensively in the
>Character Model. The physical/electronic/whatever lower-level
>representation is in terms of octets or other code units, and
>the higher-level (not necessarily highest-level, of course)
>representation is in terms of characters.
>
>The point that Brian mentions above is a valid one: we would not
>want equality between a string of characters representing XML
>markup and a string of characters that by chance looks like markup
>to be introduced via a back door. Brian explained to me that the
>denotation does not explicitly carry the datatypes. But still, it
>seems to me that the denotations "integer 11" and "string '11'"
>should already be different in the current system.
>Then it would be easy to solve this particular problem (and to
>hopefully bring quite a bit more clarity into the distinction
>between plain strings and strings with markup) by saying that
>an XML literal denotes the XML fragment that is represented by
>the string of characters resulting from the exclusive canonicalization
>(without the step of UTF-8 encoding) of [the relevant input].
>
>I.e. an XML literal denotes an XML fragment the same way an
>integer denotes an integer.
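A minimal Python sketch of the proposal, for illustration only: the canonical form is kept as characters (no UTF-8 step), and the result is wrapped in a distinct type so that it never equals a plain string, just as the integer 11 never equals the string "11". Assumptions: the standard library's C14N 2.0 serializer (`xml.etree.ElementTree.canonicalize`, Python 3.8+) stands in for exclusive XML canonicalization, and `XMLFragment`, `denotation_proposed`, `canonicalize_fragment` and the `<w>` wrapper element are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class XMLFragment:
    """Hypothetical value type: an XML fragment held as characters, not octets."""
    canonical: str


def canonicalize_fragment(lexical_form: str) -> str:
    # Assumption: C14N 2.0 from the standard library stands in for
    # exclusive XML canonicalization; <w> is a hypothetical wrapper so
    # that fragments parse, and C14N always emits "<w>" and "</w>".
    canonical = ET.canonicalize("<w>" + lexical_form + "</w>")
    return canonical[len("<w>"):-len("</w>")]


def denotation_proposed(lexical_form: str) -> XMLFragment:
    """Canonicalize, skip UTF-8 encoding, and tag the result as a fragment."""
    return XMLFragment(canonicalize_fragment(lexical_form))


a = denotation_proposed('<b   x="1"/>')
b = denotation_proposed('<b x="1"></b>')
print(a == b)                # True: same canonical characters
print(a == '<b x="1"></b>')  # False: an XML fragment is not a plain string
print(11 == "11")            # False: likewise for integer vs string
```

The separation comes from the value's type rather than its encoding: the dataclass's equality only matches another `XMLFragment`, so the comparison with a plain string falls back to identity and is false.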
>
>
>Regards,    Martin.
>
>
>

-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E
Received on Friday, 18 July 2003 06:45:48 EDT
