Re: I want CData, but Dom gave me an entity!

Clay McCoy wrote:
> 
> I am writing software where I use the w3c dom parser on some xml and then put it
> into some container classes for ease of use in the program.  Then the program
> can simply use these classes to manipulate the data.  Once the program is done I
> reconstruct a Dom document from the container classes and write this out to a
> new modified xml document.  I am new to this and if there is a more streamlined
> approach then I would be excited to hear about it.
>     Well, the program that I described works quite well except for a few
> quirks.  The main one involves entities, specifically the '&'.  When the parser
> would encounter one of these it would break.  To fix this, the program that
> generates and sends me the xml added CData tags around every text field where
> &'s might be encountered.  This may not have been the best solution and I would
> love to hear of better ones.  This did fix the problem with the data coming in.
> It is now parsed into the Dom, and from there into the container classes just
> fine, even if a & is encountered.  But when I send the information back out,
> from the containers to a dom, and from the dom to an xml document, the '&' is
> represented by an "&amp;" tag.  It looks like the text is broken up into three
> children in the dom.  The three children are the text before the &, the & itself
> represented as some strange character string, and the text after the &.  Other
> programs that look at this xml later expect only one child and don't see the
> rest of the children.  Therefore they only see the text up to the &.  What
> should I do? Should I write the code to search for extra children somehow and
> compile them all into one line of text?  This seems like a lot of work to do it
> in every case, and I am not sure how to implement it.  Or is there some way to
> specify that it is written out with CData tags around it?  This seems like it
> would be easiest.
>     Something else that I have noticed when looking at the dom while debugging
> the program is that there are a lot of extra children that I have to skip over to
> get to the data that I actually want.  What are those nodes there for, and what
> is the best way to deal with them?

Your issue looks more like a DOM implementation issue than a DOM spec one to me.
Your DOM implementation is misconverting the CDATA section node "<![CDATA[&]]>"
to an entity reference node "&amp;". I suggest you contact the implementers of
your DOM implementation.
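
For comparison, here is a minimal sketch of how I would expect the write side to
behave. It uses Python's xml.dom.minidom purely as a stand-in for a W3C DOM
implementation (it is not the parser you are using), and the helper names
build and text_of, the element name "name" and the sample string are all made
up for illustration. A CDATA section node should be written back out as a CDATA
section, and a plain text node should be written with the '&' escaped; both
forms should parse back to the same character data.

```python
from xml.dom.minidom import Document, parseString


def build(use_cdata: bool) -> str:
    """Serialize <name> holding 'Smith & Sons' as a CDATA section or as text."""
    doc = Document()
    name = doc.createElement("name")
    doc.appendChild(name)
    data = "Smith & Sons"
    if use_cdata:
        node = doc.createCDATASection(data)  # written as <![CDATA[...]]>
    else:
        node = doc.createTextNode(data)      # '&' is escaped on output
    name.appendChild(node)
    return name.toxml()


def text_of(xml: str) -> str:
    """Parse the markup and join the data of every child of the root."""
    root = parseString(xml).documentElement
    return "".join(child.data for child in root.childNodes)


as_cdata = build(True)    # <name><![CDATA[Smith & Sons]]></name>
as_text = build(False)    # <name>Smith &amp; Sons</name>

print(as_cdata)
print(as_text)
print(text_of(as_cdata) == text_of(as_text))
```

If your implementation gives you something else for the CDATA case, that is the
behaviour to report to its implementers.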

Note that the DOM relies on the underlying XML processor to provide all entity
and CDATA information items, so you might not always have them in memory,
depending on your DOM implementation and XML processor.
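
Because of that, code that reads the tree is safer if it does not assume a
single text child. The following is a rough sketch only, again with Python's
xml.dom.minidom standing in for your DOM implementation; text_content,
child_elements and the sample <order> document are hypothetical names chosen
for the example. It joins every text and CDATA child of an element, and it
skips the whitespace-only text nodes that sit between elements, which are most
likely the "extra children" you are seeing.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def text_content(element) -> str:
    """Join every Text and CDATASection child of element, in document order."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)


def child_elements(element):
    """Return only the element children, ignoring text, comments and so on."""
    return [c for c in element.childNodes if c.nodeType == Node.ELEMENT_NODE]


doc = parseString(
    "<order>\n"
    "  <customer>Smith <![CDATA[&]]> Sons</customer>\n"
    "  <total>12</total>\n"
    "</order>"
)
order = doc.documentElement

# The line breaks and indentation between tags are Text nodes in the tree,
# so order has more children than it has child elements.
customer, total = child_elements(order)

print(text_content(customer))
print(text_content(total))
```

The DOM also defines normalize(), which merges adjacent text nodes, but it does
not fold CDATA sections into their neighbours, so a helper like the one above is
still needed when the content is mixed.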

Philippe

Received on Tuesday, 20 March 2001 16:29:01 UTC