Re: Creating Japanese Document in memory from keshlam@us.ibm.com on 2000-01-05 (www-dom@w3.org from January to March 2000)

From: <keshlam@us.ibm.com>
Date: Wed, 5 Jan 2000 10:27:11 -0500
To: www-dom@w3.org
Message-ID: <8525685D.0054D896.00@D51MTA03.pok.ibm.com>

For what it's worth, we have had some discussions with the I18N group about
how to improve the DOM's internationalization support. The results have
been somewhat inconclusive; the two groups have had a bit of trouble
reaching a consensus on what functions are needed and how to organize them.
"I18N indexing" is listed as one of the issues to be revisited in future
versions of the DOM, but there are probably other functions that would be
needed as well.

My own personal guess is that what's needed is a general
internationalized-text-string datatype, which might then be slotted into
the DOM as an implementation of DOMString as well as being usable
independently. But I suspect that datatype should be designed by the I18N
group rather than the DOM group; they're the ones who have the relevant
expertise.


Re the problem of multiple encodings: In fact, if you're using a DOM on an
EBCDIC-based system, you _are_ expected to translate from EBCDIC to UTF-16.
ASCII-based environments may have an unfair advantage, but they still have
to widen every character to 16 bits by adding a leading 0 byte, so all they
really avoid is a table look-up.
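
To make the difference concrete, here is a minimal Python sketch; it is an
illustration only, not anything taken from the DOM spec. It assumes
big-endian UTF-16 and uses code page 037 as a stand-in for "EBCDIC" (real
systems use a variety of EBCDIC code pages). The function names are made up
for this example.

```python
def ascii_to_utf16be(data: bytes) -> bytes:
    """Widen 7-bit ASCII to UTF-16BE by prefixing each byte with 0x00."""
    out = bytearray()
    for b in data:
        if b > 0x7F:
            raise ValueError("input is not 7-bit ASCII")
        # No table needed: the code point equals the byte value.
        out += bytes((0x00, b))
    return bytes(out)


def ebcdic_to_utf16be(data: bytes, codec: str = "cp037") -> bytes:
    """Translate EBCDIC to UTF-16BE.

    Assumption: cp037 (EBCDIC US/Canada) is the source code page.
    The decode step is where the per-byte table look-up happens.
    """
    return data.decode(codec).encode("utf-16-be")


if __name__ == "__main__":
    text = "DOM"
    # Both paths should yield identical UTF-16 code units for the same text.
    print(ascii_to_utf16be(text.encode("ascii")).hex())
    print(ebcdic_to_utf16be(text.encode("cp037")).hex())
```

Both calls produce the same 16-bit code units for the same text; the only
difference is that the EBCDIC path goes through a mapping table while the
ASCII path just pads each byte.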

However... character-set translation is something that the DOM Level 3
Serialization chapter will have to deal with, at least implicitly; it might
make sense to ask that the encoding conversion routines be exposed as
subroutines.

I'll record this as an open issue. That doesn't guarantee that we'll
address it, but at least it'll keep us from forgetting about it.
______________________________________
Joe Kesselman  / IBM Research

Received on Wednesday, 5 January 2000 10:27:59 UTC