Re: Creating Japanese Document in memory from H.Ozawa on 2000-01-05 (www-dom@w3.org from January to March 2000)

From: H.Ozawa <h-ozawa@hitachi-system.co.jp>
Date: Wed, 05 Jan 2000 13:37:48 +0900
To: www-dom@w3.org
Message-ID: <3872CA9C.C42376A7@hitachi-system.co.jp>

> Creating the document in memory shouldn't be a problem. All strings in the
> DOM, by definition, are expressed in UTF-16, which should be able to handle
> Japanese characters.
>

I would just need a OS that accept entries in UTF-16. Most Japanese data are
either
in Shift_JIS or EUC. It would be a waste of time to convert all literals to
UTF16.
Assume if XML document supported EBCDIC by default. Would you then convert
your ASCII to EBCDIC just to use the technology?

I have a 3 tier system. Client validates entries before sending to to the
server.
Server returns a XML document by creating XML document from query result.
Client system uses Windows which uses Shift_JIS. Server is either HP or SUN
server which uses Shift_JIS or EUC. These are given with most companies.
UTF16 is theorically excellent but as of now, it is logically a failing choice.

>
> >Thus, to change document encoding, I would only have to change
> >setEncoding() method parameter instead of adding new procedures
>
> Unfortunately, setEncoding() is not part of the standardized DOM API.
>
> The standard DOM does not have any representation of the XML Declaration
> (<?xml?>), and so does not store the encoding. Some tools express this as a
> Processing Instruction, but the XML specification and the Infoset both say
> that this isn't really the right answer.
>
> Some parsers make the encoding name available as a separate piece of
> information, and some serializers accept the encoding as a parameter along
> with the top-level DOM node; that's probably a better design than the PI
> approach.
>
> We're aware that this is probably an oversight in the DOM. It's on our Open
> Issues list for future DOM development, and I expect it will be addressed
> as part of the DOM Level 3  Serialization chapter.
>
> Meanwhile, I'm afraid you're stuck with nonportable solutions... and with
> hunting for parsers that support the encodings you want to use.
>

Well, I'm really not hunting for a parser. I've already contacted Oracle and MS

about these issues. I just don't want to go around every time asking them to
support Japanese. I think people from non-UTF8 will face a similar problem.
I think most English users do not see the urgency of this problem but it really

demands quick resolution.

Thought it better to resolve the problem at the root rather than at leaves.

>
> (Obligatory marketing: Have you tried IBM's XML4J, or the Apache parser
> based on that code? Since the first version of that parser was written by a
> group in our Tokyo research center, I would be very surprised if it didn't
> include support for Japanese documents!)

Might try Linux-Apache set. Have to check out JDBC support.

H.Ozawa
h-ozawa@hitachi-system.co.jp

Received on Tuesday, 4 January 2000 23:38:01 UTC