Re: I18N issue needs consideration from Gavin Nicol on 1997-06-16 (w3c-sgml-wg@w3.org from June 1997)

From: Gavin Nicol <gtn@eps.inso.com>
Date: Mon, 16 Jun 1997 07:02:35 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <199706161102.HAA27545@nathaniel.eps.inso.com>

>Gavin Nicol writes:
>> 
>> And as you know, I suggested we just have the DOM say "string",
>> and nothing more. This is perfectly reasonable, I believe.
>> 
>
>Gavin,
>
>This is the second time I've seen a message where you've made a statement like
>this.  I'm curious about what your thinking is.  It would seem to me that to
>meet the stated goals of the DOM, i.e. to have consistent, portable scripts
>manipulating documents, and in particular text and maybe even
>attribute values, you would have to be a little more concrete than that.
>
>For example, if a script is iterating or counting the characters in a text
>object that was retrieved from the DOM, doesn't the result depend on the
>encoding of the characters in the text object as presented by the DOM (which
>may be different from their representation internally)?  If the DOM doesn't
>specify a particular encoding, doesn't it open the way for one
>implementation to say that it uses UTF-8 encoding for text content returned
>from the DOM, and another say that it uses Unicode code points, and a third
>DOM implementation to have its strings composed of 31 bit characters?  Won't
>the scripts executing on the different implementations have radically
>different behavior?
>
>Can you help me to understand why you don't think this is a problem, i.e. how
>to finesse away this concern?
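
The concern in the quoted question can be made concrete. This sketch, in Python, counts one piece of text three ways. It assumes that UTF-8 bytes, UTF-16 code units, and code points are reasonable stand-ins for the three hypothetical implementations; the helper name `count_units` and the sample text are also illustrative assumptions.

```python
# Sketch: the same text yields different "character" counts depending on
# which units an implementation exposes. The mapping of units to the three
# hypothetical DOM implementations is an assumption for illustration.

def count_units(text: str) -> dict:
    """Count the same text as UTF-8 bytes, UTF-16 code units, and code points."""
    return {
        "utf8_bytes": len(text.encode("utf-8")),
        # utf-16-le omits the byte-order mark; each code unit is 2 bytes
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "code_points": len(text),
    }

# "café 𝄞": é takes 2 bytes in UTF-8; U+1D11E lies outside the BMP,
# so it takes 4 bytes in UTF-8 and a surrogate pair in UTF-16.
sample = "caf\u00e9 \U0001D11E"
print(count_units(sample))
# → {'utf8_bytes': 10, 'utf16_units': 7, 'code_points': 6}
```

Under these assumptions, a script that counts "characters" would see 10, 7, or 6 for the same text, depending on which representation the implementation exposes.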

I'll start by turning the question around, and asking what "iterating
or counting the characters" means to you, and why you would want to do
it.

The concept of a string to me is "an ordered sequence of
characters". Given that you do have an ordered sequence of characters
(a string), iteration over the string should return each character
component in turn. I would argue that each value returned should
itself be a string, such that iterating over that string returns a
string identical to the string being iterated over.
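
A minimal sketch of this model follows. It assumes Python's built-in `str` as a stand-in for a DOM string, and the helper names `characters` and `is_fixed_point` are hypothetical. Python has no separate character type, so iterating over a string yields strings of length one, and iterating over one of those yields a string equal to itself. Python takes the code point as its unit of "character", which is itself one of the choices the question above raises.

```python
# Sketch of the "string of strings" model using Python's str as a stand-in:
# iteration yields length-one strings, never a distinct character type.

def characters(s: str) -> list[str]:
    """Return the components of s, in order, each as a string."""
    return [c for c in s]

def is_fixed_point(c: str) -> bool:
    """True if iterating over c yields exactly one string equal to c."""
    return characters(c) == [c]

text = "na\u00efve"  # "naïve"
parts = characters(text)
print(parts)                                  # ['n', 'a', 'ï', 'v', 'e']
print(all(is_fixed_point(p) for p in parts))  # True
print("".join(parts) == text)                 # True
```

In this model a script never handles an encoded unit directly; it only ever sees strings, and joining the components gives back the original string.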

Received on Monday, 16 June 1997 07:03:21 UTC