
Re: I want CData, but Dom gave me an entity!

From: Philippe Le Hegaret <plh@w3.org>
Date: Tue, 20 Mar 2001 16:28:57 -0500
Message-ID: <3AB7CB99.D97DD72D@w3.org>
To: Clay McCoy <clay@swordmicro.com>
Cc: www-dom@w3.org
Clay McCoy wrote:
> I am writing software where I use the w3c dom parser on some xml and then put it
> into some container classes for ease of use in the program.  Then the program
> can simply use these classes to manipulate the data.  Once the program is done I
> reconstruct a Dom document from the container classes and write this out to a
> new modified xml document.  I am new to this and if there is a more streamlined
> approach then I would be excited to hear about it.
>     Well, the program that I described works quite well except for a few
> quirks.  The main one involves entities, specifically the '&'.  When the parser
> would encounter one of these it would break.  To fix this, the program that
> generates and sends me the xml added CData tags around every text field where
> &'s might be encountered.  This may not have been the best solution and I would
> love to hear of better ones.  This did fix the problem with the data coming in.
> It is now parsed into the Dom, and from there into the container classes just
> fine, even if a & is encountered.  But when I send the information back out,
> from the containers to a dom, and from the dom to an xml document, the '&' is
> represented by an "&amp;" tag.  It looks like the text is broken up into three
> children in the dom.  The three children are the text before the &, the & itself
> represented as some strange character string, and the text after the &.  Other
> programs that look at this xml later expect only one child and don't see the
> rest of the children.  Therefore they only see the text up to the &.  What
> should I do? Should I write the code to search for extra children somehow and
> compile them all into one line of text?  This seems like a lot of work to do it
> in every case, and I am not sure how to implement it.  Or is there some way to
> specify that it is written out with CData tags around it?  This seems like it
> would be easiest.
>     Something else that I have noticed when looking at the dom while debugging
> the program is that there are a lot of extra children that I have to skip over to
> get to the data that I actually want.  What are those nodes there for, and what
> is the best way to deal with them?

Your issue looks more like a DOM implementation issue than a DOM spec issue to
me. Your DOM implementation is misconverting the CDATA section node
"<![CDATA[&]]>" to an entity reference node "&amp;". I suggest you contact the
implementers of your DOM implementation.
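
In the meantime, a consumer can avoid depending on how the text is split by
concatenating every Text and CDATASection child instead of reading only the
first one, and a producer can ask for a CDATA section explicitly with
createCDATASection. Here is a minimal sketch, assuming Python's
xml.dom.minidom as the DOM implementation (yours is probably a different one);
the helper name text_of and the sample element names are made up for
illustration:

```python
from xml.dom import minidom, Node

def text_of(element):
    """Join the data of every Text and CDATASection child of an element.

    Hypothetical helper: it does not assume the text arrives as one node.
    """
    parts = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            parts.append(child.data)
    return "".join(parts)

# Incoming document: the ampersand is protected by a CDATA section.
source = "<item><name>Smith <![CDATA[&]]> Sons</name></item>"
doc = minidom.parseString(source)
name = doc.getElementsByTagName("name")[0]
value = text_of(name)  # the full string, however many children there are

# Outgoing document: request a CDATA section explicitly.
out = minidom.Document()
root = out.createElement("name")
out.appendChild(root)
root.appendChild(out.createCDATASection(value))
as_cdata = root.toxml()

# The same value stored as a plain Text node is escaped on output instead.
out2 = minidom.Document()
root2 = out2.createElement("name")
out2.appendChild(root2)
root2.appendChild(out2.createTextNode(value))
as_text = root2.toxml()
```

Both serialized forms should read back to the same string when passed through
text_of, so a consumer written this way does not need to care which one the
producer chose.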

Note that the DOM relies on the underlying XML processor to supply all
information items, so depending on your DOM implementation and XML processor,
you might not always have them in memory.
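
As for the extra children you have to skip over: I cannot see your documents,
but a likely cause is whitespace-only Text nodes coming from the line breaks
and indentation between elements. Without a DTD the XML processor cannot tell
that this whitespace is ignorable, so it ends up in the tree. Here is a minimal
sketch of filtering on node type, again assuming Python's xml.dom.minidom; the
helper name element_children and the sample document are hypothetical:

```python
from xml.dom import minidom, Node

def element_children(parent):
    """Return only the Element children, skipping Text, Comment, etc."""
    return [c for c in parent.childNodes if c.nodeType == Node.ELEMENT_NODE]

# An indented document: the whitespace between tags becomes Text nodes.
source = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>"
doc = minidom.parseString(source)
root = doc.documentElement

all_children = root.childNodes   # whitespace Text nodes and Elements mixed
items = element_children(root)   # just the two <item> elements
```

Checking nodeType before using a child is usually less fragile than skipping a
fixed number of nodes, because the amount of whitespace depends on how the
document was formatted.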

Received on Tuesday, 20 March 2001 16:29:01 UTC
