- From: H.Ozawa <h-ozawa@hitachi-system.co.jp>
- Date: Wed, 05 Jan 2000 13:48:18 +0900
- To: www-dom@w3.org
Most parsers could handle XML document files created externally. The problem
comes when I try to create an XML document directly (i.e. no external file)
in memory. I'm creating XML documents dynamically from form entries and
from database query results.

H.Ozawa
h-ozawa@hitachi-system.co.jp

-------------------------

David Brownell wrote:
> Hello,
>
> Your description wasn't quite clear what you're doing, so it's
> hard to say just what was going wrong. For example, exactly
> which encoding line was used? I understand there are quite a few
> encodings used in Japan, not all of which are widely supported.
>
> I've seen ones like this work pretty consistently:
>
>     <?xml version='1.0' encoding='EUC-JP'?>
>
> "H.Ozawa" wrote:
> >
> > Problem arises because most parsers do not treat 'encoding' attribute as
> > part of the <?xml?>.
>
> Note that parsing isn't a DOM issue; DOM just represents documents
> in memory. And the difference between a DOM document with Japanese
> text (or tags, or attributes, etc) and one with, say, English ones
> is just the contents of some strings, ones which the DOM won't have
> much reason to look at after the tree model is created. (The strings
> are invariably going to be encoded in UTF-16 or Unicode.)
>
> See my XML.com reviews of XML parsers, linked from the bottom of
>
>     http://home.pacbell.net/david-b/xml/
>
> These show parsers which handle Japanese encodings. Look at the very
> last section of the "Full Test Results" for any parser, and you'll see
> that many do a good job of parsing the XML documents (in Japanese)
> provided by Fuji Xerox. I think the parsers provided by corporations
> pass these tests pretty consistently, and the others didn't. Sun's,
> as one example, handled those Japanese test cases with no trouble.
>
> > As a concrete example:
> > 1. MS's parser
> >    I can't loadXML document containing Japanese tag names. I'm also
> >    unable to specify encoding in the document
> >    because the document isn't loaded yet.
>
> Which Microsoft parser? Their Java parser isn't really worth
> looking at; almost any other parser is miles ahead. But the
> IE5 "MSXML.DLL" is much better, even though it's got DTD troubles,
> and I've observed it to load documents with Japanese encodings.
>
> > 2. Oracle's parser
> >    After createDocument(), I can't immediately issue setEncoding()
> >    method. I have to issue the method after creating
> >    a dummy node. Encoding is necessary to load Japanese XML documents
> >    but encoding can not be specified on a null
> >    document.
>
> Again, it's not clear what you're doing. The tests reported above
> show that there were some problems with an older release of Oracle's
> Java parser (2.0.0.2) and some Japanese encodings, but not with others.
> I suspect Oracle has a much more up-to-date version than that.
>
> > It would be nice if all attributes of processing instructions are
> > REQUIRED to be treated as part of a PI node itself.
>
> But an XML declaration, or a text declaration, isn't a PI. And
> even if it were modeled as one in DOM, that wouldn't help anything
> happening in a parser -- since by the time you get a DOM model of
> an XML document into memory, it's been parsed.
>
> > Thus, to change document encoding, I would only have to change
> > setEncoding() method parameter instead of adding new procedures.
>
> Transcoding documents isn't a DOM functionality, neither is any
> sort of setEncoding() method. So I think you can see why I'm
> puzzled exactly what you're doing, and what's going wrong!
>
> - Dave
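
For illustration only, here is a minimal sketch of the point made in the reply above: a DOM tree built directly in memory holds plain Unicode strings, and an encoding only comes into play when the tree is serialized to bytes or parsed from bytes. It uses Python's standard xml.dom.minidom rather than any of the parsers discussed in this thread; the element names, the sample rows, and the build_document helper are hypothetical, and the encoding argument to toxml() belongs to minidom, not to the DOM specification.

```python
from xml.dom.minidom import getDOMImplementation, parseString


def build_document(rows):
    """Build a DOM tree in memory (no external file) with Japanese tag names.

    `rows` stands in for form entries or database query results.
    """
    impl = getDOMImplementation()
    # No encoding is involved here: names and text are ordinary Unicode strings.
    doc = impl.createDocument(None, "顧客一覧", None)
    root = doc.documentElement
    for name in rows:
        elem = doc.createElement("顧客")
        elem.appendChild(doc.createTextNode(name))
        root.appendChild(elem)
    return doc


doc = build_document(["小沢", "Brownell"])

# The encoding is chosen only at serialization time. The same tree can be
# written out as EUC-JP or UTF-8; each result carries a matching XML declaration.
euc_bytes = doc.toxml(encoding="EUC-JP")
utf8_bytes = doc.toxml(encoding="UTF-8")

# Parsing the UTF-8 bytes back yields a tree with the same Unicode names.
reparsed = parseString(utf8_bytes)
first = reparsed.documentElement.firstChild
```

Changing the output encoding in this sketch means changing one argument at serialization time; nothing in the tree itself has to be touched, which is why no setEncoding() on the document is needed.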
Received on Tuesday, 4 January 2000 23:48:21 UTC