Re: Difference in DOM's from David Brownell on 2000-09-01 (www-dom@w3.org from July to September 2000)

From: David Brownell <david-b@pacbell.net>
Date: Fri, 1 Sep 2000 09:45:04 -0700
To: <lauren@sqwest.bc.ca>, <www-dom@w3.org>
Message-ID: <000f01c01433$faa04930$6500000a@brownell.org>

Lauren, since DOM doesn't include parsing APIs,
I don't know how you can claim "you'll get the
same trees" ... all APIs to turn *ML text into a
DOM tree are by definition proprietary at this
time (or at least must be promulgated by some
non-W3C group).

Admittedly there are some assertions floating
around in the spec text, but since there is no
DOM API for parsing to which they could apply,
there doesn't seem to be anything that could be
tested for purposes of saying "yes, this conforms
to the DOM L2 spec" ... or "no, this doesn't,
because it doesn't return the right results from
this specific API".

Agreed that malformed HTML is a horrific problem,
which was a significant motivator for XML!  :-)
XML was specified so that conformant "processors"
can be defined; but DOM (L1 or L2) can't satisfy
such requirements without a suitable API.

Given such a parsing API, including controls over
which of the various DOM options are desired in
the output tree (entity and entity reference
nodes: just say no!), THEN one ought to expect
isomorphic DOM trees to be produced by different
implementations ... for DOM implementations that
hook up to XML processors.
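
To make that concrete, here is a minimal Java
sketch. It is an illustration under assumptions,
not anything the DOM specs define: it assumes
JAXP's DocumentBuilderFactory (a non-W3C API) as
the parsing entry point, and it compares trees
with Node.isEqualNode, which only arrived in DOM
Level 3. The class DomTreeCompare and its methods
are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper, for illustration only.
final class DomTreeCompare {

    // Parse XML text with the tree-shaping options pinned down explicitly,
    // so the resulting tree does not depend on parser defaults.
    // Assumption: JAXP is the parsing API; DOM itself does not supply one.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setExpandEntityReferences(true); // no EntityReference nodes
            factory.setCoalescing(true);             // CDATA sections become plain text
            factory.setIgnoringComments(true);       // no Comment nodes
            DocumentBuilder builder = factory.newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Element root = builder.parse(new ByteArrayInputStream(bytes)).getDocumentElement();
            root.normalize(); // merge adjacent Text nodes
            return root;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    // True when the two inputs produce structurally equal element trees.
    // isEqualNode is a DOM Level 3 method.
    static boolean sameTree(String xmlA, String xmlB) {
        return parseRoot(xmlA).isEqualNode(parseRoot(xmlB));
    }
}
```

With these options, DomTreeCompare.sameTree applied
to "<a><b>x</b><!-- note --></a>" and
"<a><b><![CDATA[x]]></b></a>" should report equal
trees; with the factory defaults the Comment and
CDATASection nodes would normally make them differ.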

But without such an API, it can't ever happen, and
the answer to Mark's question is that he's expecting
too much out of DOM (L1 or L2; HTML or XML).

- Dave

p.s. As you know, this has long been one of my
    pet issues:  the DOM API is incomplete for
    what the marketplace perceives its role to be.

----- Original Message ----- 
From: "Lauren Wood" <lauren@sqwest.bc.ca>
To: <www-dom@w3.org>
Sent: Friday, 1 September, 2000 9:19 AM
Subject: Re: Difference in DOM's

> On 1 Sep 2000, at 11:28, Mark Brennan wrote:
> 
> > Say we have two different implementations of org.w3c.dom, one being
> > openXML's and another being some other package out there.  Now when we parse
> > an HTML file and produce a DOM out of it, we are not assured that the two DOM
> > representations will have the same tree structure, because of differences in
> > parsing and malformed HTML.  Wasn't the whole idea of the DOM to keep things
> > structured?  Shouldn't these two parsers create the exact same DOM?
> 
> If the HTML is valid, you will get the same trees. Given the number 
> of ways HTML can be invalid, there's no way of specifying the DOM 
> to make all processors give you the same trees under any 
> circumstances. So you'll have to clean up the HTML first, e.g., by 
> running it through an HTML clean-up script.
> 
> 
> Lauren
>

Received on Friday, 1 September 2000 12:45:22 UTC