coping with overlapping elements in the DOM -Reply from Ray Whitmer on 1997-08-06 (www-dom@w3.org from July to September 1997)

From: Ray Whitmer <RAY@corel.com>
Date: Wed, 06 Aug 1997 10:29:28 -0600
To: www-dom@w3.org
Cc: HANKD@corel.com, RODS@corel.com, VERNON@corel.com
Message-Id: <s3e89b99.012@corel.com>

It is not clear to me what your option 1 would do -- whether it continues parsing as though no error occurred or aborts, and, if it continues, whether the result is made well-formed by ignoring the bad information or the resulting DOM is itself badly formed.

I am strongly against anything that produces a poorly formed (overlapping) object model in the DOM.  FWIW, the example fix-up did not change the parent.  It eliminated one parent in a case where a tag had multiple parents (one at the start, and one at the end).

I am also against deprecating any fixup layer; deprecation just increases the unpredictability.  The alternatives I see are as follows:

1. Strip the bad tags out entirely, sending a strong message that bad HTML will not be tolerated.  But much HTML may be outside the control of the one using it.

2. Reject the entire HTML document once an error is discovered, again sending a strong message.  A separate optional utility should be available to fix up broken HTML.

3. Allow the implementation to incorporate the nice fixup capabilities.
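
For comparison, alternative 2 could be sketched as a strict nesting check that rejects the input at the first overlapping end tag. This is only an illustration, not a proposal for the DOM: the names `check_nesting` and `MalformedMarkup` are made up here, and the sketch assumes attribute-free tags that are all explicitly closed, which real HTML does not require.

```python
import re

# Assumption: tags carry no attributes and every element is explicitly closed.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*>")


class MalformedMarkup(ValueError):
    """Raised when an end tag does not match the innermost open element."""


def check_nesting(html):
    """Reject the whole input at the first nesting error (alternative 2)."""
    stack = []  # names of currently open elements, outermost first
    for m in TAG.finditer(html):
        closing, name = m.group(1), m.group(2).upper()
        if not closing:
            stack.append(name)
        elif not stack or stack[-1] != name:
            raise MalformedMarkup(
                f"unexpected </{name}> at offset {m.start()}"
            )
        else:
            stack.pop()
    if stack:
        raise MalformedMarkup(f"unclosed <{stack[-1]}>")
```

A caller that gets `MalformedMarkup` would then have to run the separate fix-up utility before any DOM could be built, which is the extra step alternative 3 avoids.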

I think 3 is good, and does not encourage the creation of broken HTML.  A DOM- or DTD-based HTML/XML editor should never save out broken HTML, so someone working in that environment should never have a problem.  HTML from other sources is generally outside of the control of the one using the DOM, so it will have to be fixed up at some point, and the fixup should be as painless as possible.

Ray Whitmer
ray@corel.com

>>> Lauren Wood <lauren@sqwest.bc.ca> 08/05/97 04:45pm >>>
One of the big problems in trying to come up with a reasonable 
specification for the DOM is trying to figure out how much we
should do to cope with broken HTML documents. Obviously
seriously broken documents will cause so many problems
that we just don't want to get into, but there are some 
classes of common mistakes that we can maybe allow.

One of these classes of mistakes is overlapping elements, 
of the form
<P><B>This is <EM> not </B> a good idea</EM></P>

We are thinking of defining nodes that would effectively change
the above example into
<P><B>This is <EM> not </EM></B><EM> a good idea</EM></P>

This does have effects on style sheets and other operations that
refer to the parent element, since the first EM element has a 
different parent in the two examples.
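
The rewrite from the first example to the second can be sketched as a
close-and-reopen pass over the tags. This is a minimal illustration under
stated assumptions, not the WG's actual mechanism: the function name
`fix_overlaps` is hypothetical, tags are assumed to carry no attributes,
and a stray end tag with no matching start tag is simply dropped.

```python
import re

# Assumption: tags carry no attributes; anything else is copied as text.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\s*>")


def fix_overlaps(html):
    """Close and reopen elements so each end tag matches the innermost
    open element."""
    out = []
    stack = []  # names of currently open elements, outermost first
    pos = 0
    for m in TAG.finditer(html):
        out.append(html[pos:m.start()])  # copy text between tags unchanged
        pos = m.end()
        closing, name = m.group(1), m.group(2).upper()
        if not closing:
            stack.append(name)
            out.append(f"<{name}>")
        elif name in stack:
            # Find the innermost open element with this name.
            idx = len(stack) - 1 - stack[::-1].index(name)
            # Elements opened after it overlap its end tag.
            reopened = stack[idx + 1:]
            for inner in reversed(reopened):
                out.append(f"</{inner}>")
            out.append(f"</{name}>")
            del stack[idx:]
            # Reopen the overlapping elements after the end tag.
            for inner in reopened:
                out.append(f"<{inner}>")
                stack.append(inner)
        # An end tag with no matching start tag is dropped.
    out.append(html[pos:])
    return "".join(out)


broken = "<P><B>This is <EM> not </B> a good idea</EM></P>"
print(fix_overlaps(broken))
```

Under these assumptions the single overlapping EM becomes two EM elements,
one inside B and one directly inside P, which is where the different
parent comes from.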

Since we don't really want to encourage people to write broken 
documents, there is also the problem of whether we should do 
anything for overlapping elements at all. The choices are:
1) don't do anything for overlapping elements
2) do something and deprecate it immediately, so it will be in level 
one but not level two
3) put it in without deprecating.

The DOM WG would like feedback on this issue. Which option do
you think the best?

thanks,

Lauren

Received on Wednesday, 6 August 1997 17:53:40 UTC