- From: Philippe Le Hegaret <plh@w3.org>
- Date: Tue, 20 Mar 2001 16:28:57 -0500
- To: Clay McCoy <clay@swordmicro.com>
- Cc: www-dom@w3.org
Clay McCoy wrote:
>
> I am writing software where I use the W3C DOM parser on some XML and
> then put it into some container classes for ease of use in the program.
> The program can then simply use these classes to manipulate the data.
> Once the program is done, I reconstruct a DOM document from the
> container classes and write it out to a new, modified XML document. I am
> new to this, and if there is a more streamlined approach I would be
> excited to hear about it.
>
> The program that I described works quite well except for a few quirks.
> The main one involves entities, specifically the '&'. When the parser
> encountered one of these, it would break. To fix this, the program that
> generates and sends me the XML added CDATA tags around every text field
> where &'s might be encountered. This may not have been the best
> solution, and I would love to hear of better ones. It did fix the
> problem with the data coming in: it is now parsed into the DOM, and from
> there into the container classes, just fine, even if a & is encountered.
> But when I send the information back out, from the containers to a DOM
> and from the DOM to an XML document, the '&' is represented by an
> "&amp;" tag. It looks like the text is broken up into three children in
> the DOM: the text before the &, the & itself represented as some strange
> character string, and the text after the &. Other programs that look at
> this XML later expect only one child and don't see the rest of the
> children, so they only see the text up to the &.
>
> What should I do? Should I write the code to search for extra children
> somehow and compile them all into one line of text? This seems like a
> lot of work to do in every case, and I am not sure how to implement it.
> Or is there some way to specify that the text is written out with CDATA
> tags around it? This seems like it would be easiest.
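Both options raised above (joining the text children, or writing the value back out as a CDATA section) can be sketched in a few lines. The original message does not name the DOM implementation in use, so this is only an illustration using Python's `xml.dom.minidom` as a stand-in; the helper names `text_of` and `set_text` are hypothetical, and `set_text` assumes the element holds nothing but character data.

```python
from xml.dom import Node, minidom


def text_of(element):
    """Join the data of every Text and CDATA child into one string."""
    parts = [
        child.data
        for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    ]
    return "".join(parts)


def set_text(element, value, as_cdata=False):
    """Replace all children of `element` with a single text-like node.

    Assumption: the element contains only character data, so dropping
    every existing child loses nothing.
    """
    doc = element.ownerDocument
    for child in list(element.childNodes):
        element.removeChild(child)
    if as_cdata:
        node = doc.createCDATASection(value)
    else:
        node = doc.createTextNode(value)
    element.appendChild(node)


# Hypothetical input: a text field whose '&' was wrapped in CDATA.
doc = minidom.parseString("<item>Smith <![CDATA[&]]> Sons</item>")
item = doc.documentElement

name = text_of(item)                 # one string, however many children

set_text(item, name, as_cdata=True)  # write back as one CDATA section
cdata_xml = item.toxml()

set_text(item, name)                 # or as one text node; '&' is escaped
escaped_xml = item.toxml()
```

Either output form gives downstream readers a single child to look at. The escaped form (`&amp;`) and the CDATA form carry the same character data once parsed.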
>
> Something else that I have noticed when looking at the DOM while
> debugging the program is that there are a lot of extra children that I
> have to skip over to get to the data that I actually want. What are
> those nodes there for, and what is the best way to deal with them?

Your issue looks more like a DOM implementation issue than a DOM spec
issue to me. Your DOM implementation is misconverting the CDATA section
node "<![CDATA[&]]>" to an entity reference node "&amp;". I suggest you
contact the implementers of your DOM implementation.

Note that the DOM relies on the underlying XML processor to supply all
entity and CDATA information items, so you might not always have them in
memory, depending on your DOM implementation and XML processor.

Philippe
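The reply above does not take up the second question about the extra children. One common cause, offered here as an assumption rather than a diagnosis of this particular parser, is that the whitespace used to indent the XML is kept as text nodes between the elements. A minimal sketch, again using Python's `xml.dom.minidom` as a stand-in, with a hypothetical helper `element_children` and made-up sample data:

```python
from xml.dom import Node, minidom


def element_children(parent):
    """Return only the element children, skipping text and other nodes."""
    return [c for c in parent.childNodes if c.nodeType == Node.ELEMENT_NODE]


# Hypothetical indented input; the line breaks and spaces between the
# tags are character data, so the parser keeps them as text nodes.
doc = minidom.parseString(
    "<order>\n  <id>7</id>\n  <name>Clay</name>\n</order>"
)
root = doc.documentElement

all_children = len(root.childNodes)                   # text nodes included
names = [c.tagName for c in element_children(root)]   # elements only
```

Filtering by node type in one helper keeps the skipping logic out of the code that fills the container classes.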
Received on Tuesday, 20 March 2001 16:29:01 UTC