- From: David Brownell <david-b@pacbell.net>
- Date: Tue, 04 Jan 2000 19:47:12 -0800
- To: "H.Ozawa" <h-ozawa@hitachi-system.co.jp>
- Cc: www-dom@w3.org
Hello,

Your description wasn't quite clear about what you're doing, so it's hard
to say just what was going wrong. For example, exactly which encoding
line was used? I understand there are quite a few encodings used in Japan,
not all of which are widely supported. I've seen ones like this work
pretty consistently:

    <?xml version='1.0' encoding='EUC-JP'?>

"H.Ozawa" wrote:
>
> Problem arises because most parsers do not treat 'encoding' attribute as
> part of the <?xml?>.

Note that parsing isn't a DOM issue; DOM just represents documents in
memory. And the difference between a DOM document with Japanese text
(or tags, or attributes, etc) and one with, say, English ones is just the
contents of some strings, ones which the DOM won't have much reason to
look at after the tree model is created. (The strings are invariably
going to be encoded in UTF-16 or Unicode.)

See my XML.com reviews of XML parsers, linked from the bottom of

    http://home.pacbell.net/david-b/xml/

These show parsers which handle Japanese encodings. Look at the very
last section of the "Full Test Results" for any parser, and you'll see
that many do a good job of parsing the XML documents (in Japanese)
provided by Fuji Xerox. I think the parsers provided by corporations
passed these tests pretty consistently, and the others didn't. Sun's, as
one example, handled those Japanese test cases with no trouble.

> As a concrete example:
> 1. MS's parser
> I can't loadXML document containing Japanese tag names. I'm also
> unable to specify encoding in the document
> because the document isn't loaded yet.

Which Microsoft parser? Their Java parser isn't really worth looking
at; almost any other parser is miles ahead. But the IE5 "MSXML.DLL" is
much better, even though it's got DTD troubles, and I've observed it to
load documents with Japanese encodings.

> 2. Oracle's parser
> After createDocument(), I can't immediately issue setEncoding()
> method. I have to issue the method after creating
> a dummy node. Encoding is necessary to load Japanese XML documents
> but encoding can not be specified on a null
> document.

Again, it's not clear what you're doing. The tests reported above show
that there were some problems with an older release of Oracle's Java
parser (2.0.0.2) and some Japanese encodings, but not with others. I
suspect Oracle has a much more up-to-date version than that.

> It would be nice if all attributes of processing instructions are
> REQUIRED to be treated as part of a PI node itself.

But an XML declaration, or a text declaration, isn't a PI. And even
if it were modeled as one in DOM, that wouldn't help anything happening
in a parser, since by the time you get a DOM model of an XML document
into memory, it's been parsed.

> Thus, to change document encoding, I would only have to change
> setEncoding() method parameter instead of adding new procedures.

Transcoding documents isn't DOM functionality, and neither is any sort
of setEncoding() method. So I think you can see why I'm puzzled about
exactly what you're doing, and what's going wrong!

- Dave
Received on Tuesday, 4 January 2000 22:47:09 UTC