- From: <keshlam@us.ibm.com>
- Date: Tue, 4 Jan 2000 21:22:09 -0500
- To: "H.Ozawa" <h-ozawa@hitachi-system.co.jp>
- cc: www-dom@w3.org
Creating the document in memory shouldn't be a problem. All strings in the DOM, by definition, are expressed in UTF-16, which should be able to handle Japanese characters. As you point out, writing that document out and reading it back in are somewhat more complicated: the serializer and parser have to understand how to translate between UTF-16 and your preferred encoding, and you have to figure out how to tell them which encoding to use.

> Thus, to change document encoding, I would only have to change
> setEncoding() method parameter instead of adding new procedures

Unfortunately, setEncoding() is not part of the standardized DOM API. The standard DOM does not have any representation of the XML Declaration (<?xml?>), and so does not store the encoding. Some tools express this as a Processing Instruction, but the XML specification and the Infoset both say that this isn't really the right answer. Some parsers make the encoding name available as a separate piece of information, and some serializers accept the encoding as a parameter along with the top-level DOM node; that's probably a better design than the PI approach.

We're aware that this is probably an oversight in the DOM. It's on our Open Issues list for future DOM development, and I expect it will be addressed as part of the DOM Level 3 Serialization chapter. Meanwhile, I'm afraid you're stuck with nonportable solutions... and with hunting for parsers that support the encodings you want to use.

(Obligatory marketing: Have you tried IBM's XML4J, or the Apache parser based on that code? Since the first version of that parser was written by a group in our Tokyo research center, I would be very surprised if it didn't include support for Japanese documents!)
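To make the "encoding as a serializer parameter" point concrete, here is a minimal sketch using the non-standard org.apache.xml.serialize classes that ship with Xerces-J (and the XML4J builds based on it). The class names, the Shift_JIS choice, and the file name are illustrative assumptions, not part of the standard DOM API, and whether a given encoding works depends on the converters your JVM provides.

    import java.io.FileOutputStream;
    import java.io.IOException;

    import org.apache.xerces.dom.DocumentImpl;
    import org.apache.xml.serialize.OutputFormat;
    import org.apache.xml.serialize.XMLSerializer;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    public class EncodingDemo {
        public static void main(String[] args) throws IOException {
            // Building the document in memory is encoding-neutral:
            // DOM strings are UTF-16, so Japanese text is fine here.
            Document doc = new DocumentImpl();
            Element greeting = doc.createElement("greeting");
            greeting.appendChild(
                doc.createTextNode("\u3053\u3093\u306b\u3061\u306f")); // "konnichiwa"
            doc.appendChild(greeting);

            // The encoding is chosen only at serialization time; it is a
            // property of the OutputFormat, not of the Document itself.
            OutputFormat format = new OutputFormat(doc);
            format.setEncoding("Shift_JIS");
            format.setIndenting(true);

            XMLSerializer serializer =
                new XMLSerializer(new FileOutputStream("greeting.xml"), format);
            serializer.serialize(doc);
        }
    }

Note that the document itself never records Shift_JIS; the choice lives entirely in the OutputFormat handed to the serializer. Reading the file back works because the serializer wrote an encoding declaration for the parser to find, and switching to EUC-JP or UTF-8 means changing only the setEncoding() argument.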
______________________________________
Joe Kesselman / IBM Research

Received on Tuesday, 4 January 2000 21:22:29 UTC