Re: Creating Japanese Document in memory

Most parsers can handle XML document files created externally.
The problem comes when I try to create an XML document directly
(i.e., with no external file) in memory. I'm creating XML documents dynamically
from form entries and from database query results.
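
For illustration only, here is a minimal sketch of the in-memory case. It
uses Python's standard xml.dom.minidom rather than any of the parsers
discussed in this thread, and the element names and text are made up.
It assumes the goal is to build the tree with no external file and to
choose the encoding only when the document is written out.

```python
from xml.dom.minidom import getDOMImplementation

# Build the document entirely in memory; no external file is read.
impl = getDOMImplementation()
doc = impl.createDocument(None, "注文", None)  # hypothetical root element name

# Hypothetical child element holding one value, e.g. from a form entry.
item = doc.createElement("品名")
item.appendChild(doc.createTextNode("緑茶"))
doc.documentElement.appendChild(item)

# The encoding is chosen only at serialization time. minidom writes the
# matching encoding declaration and returns bytes in that encoding.
xml_bytes = doc.toxml(encoding="EUC-JP")
```

In this sketch the tree itself holds plain Unicode strings; switching the
output to another encoding would mean changing only the argument to toxml().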

H.Ozawa
h-ozawa@hitachi-system.co.jp

-------------------------
David Brownell wrote:

> Hello,
>
> Your description wasn't quite clear about what you're doing, so it's
> hard to say just what was going wrong.  For example, exactly
> which encoding line was used?  I understand there are quite a few
> encodings used in Japan, not all of which are widely supported.
>
> I've seen ones like this work pretty consistently:
>
>         <?xml version='1.0' encoding='EUC-JP'?>
>
> "H.Ozawa" wrote:
> >
> > The problem arises because most parsers do not treat the 'encoding'
> > attribute as part of the <?xml?> declaration.
>
> Note that parsing isn't a DOM issue; DOM just represents documents
> in memory.  And the difference between a DOM document with Japanese
> text (or tags, or attributes, etc) and one with, say, English ones
> is just the contents of some strings, ones which the DOM won't have
> much reason to look at after the tree model is created.  (The strings
> are invariably going to be encoded in UTF-16 or Unicode.)
>
> See my XML.com reviews of XML parsers, linked from the bottom of
>
>         http://home.pacbell.net/david-b/xml/
>
> These show parsers which handle Japanese encodings. Look at the very
> last section of the "Full Test Results" for any parser, and you'll see
> that many do a good job of parsing the XML documents (in Japanese)
> provided by Fuji Xerox.  I think the parsers provided by corporations
> passed these tests pretty consistently, and the others didn't.  Sun's,
> as one example, handled those Japanese test cases with no trouble.
>
> > As a concrete example:
> > 1. MS's parser
> >     I can't loadXML document containing Japanese tag names. I'm also
> > unable to specify encoding in the document
> >     because the document isn't loaded yet.
>
> Which Microsoft parser?  Their Java parser isn't really worth
> looking at; almost any other parser is miles ahead.  But the
> IE5 "MSXML.DLL" is much better, even though it's got DTD troubles,
> and I've observed it to load documents with Japanese encodings.
>
> > 2. Oracle's parser
> >     After createDocument(), I can't immediately issue setEncoding()
> > method. I have to issue the method after creating
> >     a dummy node. Encoding is necessary to load Japanese XML documents
> > but encoding can not be specified on a null
> >     document.
>
> Again, it's not clear what you're doing.  The tests reported above
> show that there were some problems with an older release of Oracle's
> Java parser (2.0.0.2) and some Japanese encodings, but not with others.
> I suspect Oracle has a much more up-to-date version than that.
>
> > It would be nice if all attributes of processing instructions are
> > REQUIRED to be treated as part of a PI node itself.
>
> But an XML declaration, or a text declaration, isn't a PI.  And
> even if it were modeled as one in DOM, that wouldn't help anything
> happening in a parser -- since by the time you get a DOM model of
> an XML document into memory, it's been parsed.
>
> > Thus, to change document encoding, I would only have to change
> > setEncoding() method parameter instead of adding new procedures.
>
> Transcoding documents isn't DOM functionality, and neither is any
> sort of setEncoding() method.  So I think you can see why I'm
> puzzled about exactly what you're doing, and what's going wrong!
>
> - Dave
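
The point quoted above, that the encoding is gone once a document has
been parsed into a DOM tree, can be sketched as follows. This again uses
Python's standard xml.dom.minidom as a stand-in, not any parser named in
the thread, and the sample element is made up. UTF-8 and UTF-16 are used
because every XML processor is required to read them; support for other
encodings such as EUC-JP varies by parser.

```python
from xml.dom.minidom import parseString

# The same (hypothetical) document body, serialized in two encodings.
body = "<挨拶>こんにちは</挨拶>"
as_utf8 = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
as_utf16 = ('<?xml version="1.0" encoding="UTF-16"?>' + body).encode("utf-16")

def root_name_and_text(data):
    """Parse raw bytes and return the root tag name and its text content."""
    root = parseString(data).documentElement
    return root.tagName, root.firstChild.data

# The byte streams differ, but the parsed trees hold the same strings.
from_utf8 = root_name_and_text(as_utf8)
from_utf16 = root_name_and_text(as_utf16)
```

Under these assumptions, the two parsed results compare equal even though
the input bytes do not, which is why the encoding has to be handled by the
parser or the serializer rather than by the DOM tree.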

Received on Tuesday, 4 January 2000 23:48:21 UTC