Re: Difference in DOMs

On 1 Sep 2000, at 11:28, Mark Brennan wrote:

> Say we have two different implementations of org.w3c.dom, one being
> openXML's and another being some other package out there. Now when we parse
> an HTML file and produce a DOM out of it, we are not assured that the two DOM
> representations will have the same structured tree because of differences in
> parsing and malformed HTML. Wasn't the whole idea of the DOM to keep things
> structured? Shouldn't these two parsers create the exact same DOM?

If the HTML is valid, you will get the same trees. Given the number 
of ways HTML can be invalid, there's no way of specifying the DOM 
to make all processors give you the same trees under any 
circumstances. So you'll have to clean up the HTML first, e.g., by 
running it through an HTML clean-up script.
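A minimal modern sketch of checking whether two trees really are "the same", assuming the cleaned-up page is well-formed XHTML so that a standard JAXP XML parser can read it. It relies on `Node.isEqualNode`, which only arrived with DOM Level 3 (Java 5), well after this thread. `DomCompare`, `parse` and `sameTree` are hypothetical names. A strict XML parser rejects malformed markup outright instead of guessing how to repair it, and that guessing is exactly where lenient HTML parsers disagree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DomCompare {

    // Parse well-formed markup into a DOM Document with the JDK's default
    // JAXP parser. Malformed input raises a SAXException instead of being
    // silently "repaired".
    static Document parse(String markup) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler ignores warnings and recoverable errors and rethrows
        // fatal errors, so the parser does not print to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(
                new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
    }

    // True if both inputs yield structurally equal trees: same node types,
    // names and values, attributes equal in any order, children in the same order.
    static boolean sameTree(String a, String b) throws Exception {
        return parse(a).isEqualNode(parse(b));
    }

    public static void main(String[] args) throws Exception {
        // Same content, different serialization (quote style, attribute order).
        String v1 = "<html><body><p class=\"intro\" id=\"x\">Hello</p></body></html>";
        String v2 = "<html><body><p id='x' class='intro'>Hello</p></body></html>";
        System.out.println("equal trees: " + sameTree(v1, v2));

        // Malformed markup: the <p> is never closed before </body>.
        String broken = "<html><body><p>Hello</body></html>";
        try {
            parse(broken);
            System.out.println("parsed");
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whitespace still counts (a text node " Hi" is not equal to "Hi"), so even well-formed pages need consistent clean-up before their trees can be expected to match.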


Lauren

Received on Friday, 1 September 2000 12:18:59 UTC