- From: H.Ozawa <h-ozawa@hitachi-system.co.jp>
- Date: Wed, 05 Jan 2000 13:37:48 +0900
- To: www-dom@w3.org
> Creating the document in memory shouldn't be a problem. All strings in the > DOM, by definition, are expressed in UTF-16, which should be able to handle > Japanese characters. > I would just need a OS that accept entries in UTF-16. Most Japanese data are either in Shift_JIS or EUC. It would be a waste of time to convert all literals to UTF16. Assume if XML document supported EBCDIC by default. Would you then convert your ASCII to EBCDIC just to use the technology? I have a 3 tier system. Client validates entries before sending to to the server. Server returns a XML document by creating XML document from query result. Client system uses Windows which uses Shift_JIS. Server is either HP or SUN server which uses Shift_JIS or EUC. These are given with most companies. UTF16 is theorically excellent but as of now, it is logically a failing choice. > > >Thus, to change document encoding, I would only have to change > >setEncoding() method parameter instead of adding new procedures > > Unfortunately, setEncoding() is not part of the standardized DOM API. > > The standard DOM does not have any representation of the XML Declaration > (<?xml?>), and so does not store the encoding. Some tools express this as a > Processing Instruction, but the XML specification and the Infoset both say > that this isn't really the right answer. > > Some parsers make the encoding name available as a separate piece of > information, and some serializers accept the encoding as a parameter along > with the top-level DOM node; that's probably a better design than the PI > approach. > > We're aware that this is probably an oversight in the DOM. It's on our Open > Issues list for future DOM development, and I expect it will be addressed > as part of the DOM Level 3 Serialization chapter. > > Meanwhile, I'm afraid you're stuck with nonportable solutions... and with > hunting for parsers that support the encodings you want to use. > Well, I'm really not hunting for a parser. I've already contacted Oracle and MS about these issues. I just don't want to go around every time asking them to support Japanese. I think people from non-UTF8 will face a similar problem. I think most English users do not see the urgency of this problem but it really demands quick resolution. Thought it better to resolve the problem at the root rather than at leaves. > > (Obligatory marketing: Have you tried IBM's XML4J, or the Apache parser > based on that code? Since the first version of that parser was written by a > group in our Tokyo research center, I would be very surprised if it didn't > include support for Japanese documents!) Might try Linux-Apache set. Have to check out JDBC support. H.Ozawa h-ozawa@hitachi-system.co.jp
Received on Tuesday, 4 January 2000 23:38:01 UTC