- From: Gavin Nicol <gtn@eps.inso.com>
- Date: Mon, 16 Jun 1997 07:19:44 -0400
- To: w3c-sgml-wg@w3.org
James writes:

> If a character is outside the BMP and so requires 2 16-bit objects to
> encode it in UTF-16, it's still one character not two. It should be
> completely invisible to a DOM user whether a character is inside or outside
> the BMP. An object model that pretended that a character outside the BMP
> was two "characters" would, in my view, be totally broken. The result of
> such an object model would be that many applications would fail to work
> properly on characters outside the BMP.

To which I can only heartily agree, and point out that this is precisely
what James, myself, and others have been saying all along.

> I'm not sure I agree with Gavin when he says that all that is needed is a
> String type. I think you need a Character type as well. I suppose you
> could say that a Character will be represented by a String containing a
> single character, but I think it would be better to allow an individual DOM
> language binding to choose whether to say, for that language, a Character
> will be represented by a one-character string or by a separate data type.
> For example, if I wanted to use the DOM for DSSSL, I would want DOM
> characters to be represented as DSSSL characters not strings.

The reason I favored having only a String type is simplicity: a character
simply becomes the smallest indivisible String. I have no objection whatever
to also including an abstract Character class in the DOM, so long as it
remains abstract. My concept of a string is just a sequence of abstract
characters (i.e., it could be a sequence of DSSSL characters).
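To make the distinction concrete, here is a minimal Python sketch (not part of any DOM binding; the function names `utf16_code_units` and `characters_from_utf16` are hypothetical, chosen for illustration). It shows the same text viewed as UTF-16 code units and as characters, where a surrogate pair for a character outside the BMP collapses back into a single character.

```python
# Hypothetical helpers illustrating code units vs. characters in UTF-16.

def utf16_code_units(s: str) -> list[int]:
    """Return the 16-bit UTF-16 code units that encode s (big-endian, no BOM)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def characters_from_utf16(units: list[int]) -> list[int]:
    """Decode UTF-16 code units into code points, joining surrogate pairs."""
    chars = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # High surrogate followed by low surrogate: ONE character outside the BMP.
            chars.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # A BMP character (or an unpaired surrogate, passed through as-is).
            chars.append(u)
            i += 1
    return chars


# "G" followed by MUSICAL SYMBOL G CLEF (U+1D11E), which lies outside the BMP.
text = "G\U0001D11E"
units = utf16_code_units(text)
print([hex(u) for u in units])            # ['0x47', '0xd834', '0xdd1e']: 3 code units
print(len(characters_from_utf16(units)))  # 2 characters
```

An object model that exposed `units` directly would report three "characters" here; one that exposes the decoded sequence reports two, which is the behavior argued for above. (Python's own `str` already behaves the second way: `len(text)` is 2.)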
Received on Monday, 16 June 1997 07:20:26 UTC