Re: I18N issue needs consideration

From: Gavin Nicol <gtn@eps.inso.com>
Date: Mon, 16 Jun 1997 07:19:44 -0400
Message-Id: <199706161119.HAA27553@nathaniel.eps.inso.com>
To: w3c-sgml-wg@w3.org

James writes:
>If a character is outside the BMP and so requires 2 16-bit objects to
>encode it in UTF-16, it's still one character not two.  It should be
>completely invisible to a DOM user whether a character is inside or outside
>the BMP.  An object model that pretended that a character outside the BMP
>was two "characters" would, in my view, be totally broken.  The result of
>such an object model would be that many applications would fail to work
>properly on characters outside the BMP.

I can only heartily agree, and point out that this is precisely what
James, I, and others have been saying all along.
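
A minimal sketch of the distinction, assuming Python 3 (where a str is
indexed by character rather than by 16-bit object). The helper names
and the choice of U+1D11E as a sample character outside the BMP are
illustrative assumptions, not anything defined by the DOM:

```python
# Illustrative only: U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
clef = "\U0001D11E"


def utf16_code_units(text: str) -> int:
    """Number of 16-bit objects needed to encode text in UTF-16."""
    return len(text.encode("utf-16-be")) // 2


def char_count(text: str) -> int:
    """Number of characters, independent of any encoding."""
    return len(text)  # a Python 3 str is a sequence of code points


sample = "a" + clef + "b"

print(char_count(sample))        # 3: what a DOM user should see
print(utf16_code_units(sample))  # 4: a storage detail of UTF-16
print(sample[1] == clef)         # True: indexing yields the whole character
```

An object model that reported 4 here would be exposing the encoding
rather than the characters.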

>I'm not sure I agree with Gavin when he says that all that is needed is a
>String type.  I think you need a Character type as well.  I suppose you
>could say that a Character will be represented by a String containing a
>single character, but I think it would be better to allow an individual DOM
>language binding to choose whether to say, for that language, a Character
>will be represented by a one-character string or by a separate data type. 
>For example, if I wanted to use the DOM for DSSSL, I would want DOM
>characters to be represented as DSSSL characters not strings.

I favored having only a String type for simplicity: a character then
becomes the smallest indivisible String. I have no objection whatever
to also including an abstract Character class in the DOM, so long as
it remains abstract. My concept of a string is just a sequence of
abstract characters (i.e. it could be a sequence of DSSSL characters).
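
One way this could look in a language binding, as a rough sketch only.
The names Character, CodePointCharacter, DomString, item and
character_at are hypothetical and come from no DOM draft; the point is
just that the string is defined over abstract characters, and a binding
may hand back either a one-character string or a separate character
object:

```python
from abc import ABC, abstractmethod


class Character(ABC):
    """Abstract character: says nothing about encoding or storage."""

    @abstractmethod
    def code_point(self) -> int:
        ...


class CodePointCharacter(Character):
    """One hypothetical binding: a character held as a bare integer."""

    def __init__(self, value: int) -> None:
        self._value = value

    def code_point(self) -> int:
        return self._value


class DomString:
    """A string as a sequence of abstract characters."""

    def __init__(self, chars) -> None:
        self._chars = tuple(chars)

    def __len__(self) -> int:
        return len(self._chars)

    def item(self, index: int) -> "DomString":
        # The smallest indivisible string: exactly one character.
        return DomString((self._chars[index],))

    def character_at(self, index: int) -> Character:
        # Alternative for bindings that prefer a separate character type.
        return self._chars[index]


s = DomString(CodePointCharacter(cp) for cp in (0x61, 0x1D11E))
print(len(s))                                # 2
print(len(s.item(1)))                        # 1
print(hex(s.character_at(1).code_point()))   # 0x1d11e
```

A DSSSL binding could supply its own Character subclass in place of
CodePointCharacter without changing DomString.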
