- From: James Clark <jjc@jclark.com>
- Date: Sat, 14 Jun 1997 12:32:25 +0700
- To: <w3c-sgml-wg@w3.org>
> For example, if a script is iterating or counting the characters in a text
> object that was retrieved from the DOM, doesn't the result depend on the
> encoding of the characters in the text object as presented by the DOM (which
> may be different from their representation internally)? If the DOM doesn't
> specify a more specific encoding, doesn't it open the way for one
> implementation to say that it uses UTF-8 encoding for text content returned
> from the DOM, and another say that it uses Unicode code points, and a third
> DOM implementation to have its strings composed of 31 bit characters? Won't
> the scripts executing on the different implementations have radically different
> behavior?

If a script is supposed to be iterating over *characters* then the encoding of the characters is completely irrelevant. Whether a character is encoded as UTF-8 or UTF-16 or UCS-4, it's still a single character. Iterating over a sequence of characters is not the same as iterating over the objects that encode the characters (bytes or 16-bit words or whatever).

If a character is outside the BMP and so requires 2 16-bit objects to encode it in UTF-16, it's still one character not two. It should be completely invisible to a DOM user whether a character is inside or outside the BMP. An object model that pretended that a character outside the BMP was two "characters" would, in my view, be totally broken. The result of such an object model would be that many applications would fail to work properly on characters outside the BMP.

I'm not sure I agree with Gavin when he says that all that is needed is a String type. I think you need a Character type as well. I suppose you could say that a Character will be represented by a String containing a single character, but I think it would be better to allow an individual DOM language binding to choose whether to say, for that language, a Character will be represented by a one-character string or by a separate data type. For example, if I wanted to use the DOM for DSSSL, I would want DOM characters to be represented as DSSSL characters not strings.

James
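
The distinction drawn above between characters and the objects that encode them can be sketched roughly as follows. This is only an illustration, not anything taken from a DOM binding: it assumes Python 3, whose str type is a sequence of Unicode code points and so stands in for the "character" view, and the helper names (count_characters, count_utf16_units, count_utf8_bytes) are hypothetical.

```python
# Illustrative sketch only. Python 3's str is a sequence of Unicode code
# points, which stands in here for the "character" view of text.
# All function names below are hypothetical, not part of any DOM binding.

def count_characters(text: str) -> int:
    """Count characters (code points), independent of any encoding."""
    return len(text)

def count_utf16_units(text: str) -> int:
    """Count the 16-bit objects needed to encode the text in UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def count_utf8_bytes(text: str) -> int:
    """Count the bytes needed to encode the text in UTF-8."""
    return len(text.encode("utf-8"))

# 'A' is inside the BMP; U+10400 is a code point outside the BMP.
sample = "A\U00010400"

print(count_characters(sample))   # 2: two characters, whatever the encoding
print(count_utf16_units(sample))  # 3: the non-BMP character needs two 16-bit objects
print(count_utf8_bytes(sample))   # 5: one byte for 'A', four for U+10400

# Iterating over characters yields one item per character, including the
# one outside the BMP.
for ch in sample:
    print(hex(ord(ch)))
```

In this sketch a character happens to be represented as a one-character string, which is one of the two binding choices mentioned above; a language with a distinct character type could make the other choice.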
Received on Saturday, 14 June 1997 01:39:15 UTC