HTML normalization question wrt DOM

Vidur,

I have not seen any document that describes a common way to normalize
HTML that is poorly formed w.r.t. the DOM. This seems important if all
DHTML clients are to respond to the same JavaScript in the same way.

For instance,

<p><b>one <i>two </b>three </i> four</p>

is not properly nested, so it does not map directly onto a DOM tree:
<b> and <i> overlap instead of one containing the other. I can see two
ways of representing this:
              <p>
               |
   -------------------------
   |           |           |
  <b>         <i>        four
   |           |
 -----       three
 |   |
one <i>
     |
    two


This gives proper style inheritance, but JavaScript access to <i> will
not be correct unless <i> remembers it is in multiple parts of the tree.
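As a rough illustration of the first layout, here is a minimal TypeScript sketch. It assumes a toy tokenizer that understands only bare start tags, end tags and text (no attributes, comments or implied end tags), and every name in it (`buildSplitTree`, `logicalId`, `logicalText`) is hypothetical, not part of any DOM API. When an end tag closes an element that still has other elements open inside it, those inner elements are closed and then reopened as new fragments that share the original `logicalId`, which is one way for <i> to remember that it is in two parts of the tree.

```typescript
// Hypothetical node shapes for this sketch; not the W3C DOM interfaces.
interface TextNode { kind: "text"; text: string }
interface ElementNode {
  kind: "element";
  tag: string;
  logicalId: number; // shared by every fragment of one source element
  children: TreeNode[];
}
type TreeNode = TextNode | ElementNode;

// Toy tokenizer: bare start tags, end tags and text runs only.
function tokenize(html: string): string[] {
  return html.match(/<\/?[a-z]+>|[^<]+/g) || [];
}

// Layout 1: close interrupted elements, then reopen them as new
// fragments that keep the original logicalId.
function buildSplitTree(html: string): ElementNode {
  let nextId = 0;
  const newElement = (tag: string, id: number): ElementNode =>
    ({ kind: "element", tag, logicalId: id, children: [] });
  const root = newElement("#root", nextId++);
  const stack: ElementNode[] = [root];
  const top = () => stack[stack.length - 1];

  for (const tok of tokenize(html)) {
    const m = /^<(\/?)([a-z]+)>$/.exec(tok);
    if (m === null) {
      top().children.push({ kind: "text", text: tok });
      continue;
    }
    const [, slash, tag] = m;
    if (slash === "") {
      const el = newElement(tag, nextId++);
      top().children.push(el);
      stack.push(el);
      continue;
    }
    const idx = stack.map(e => e.tag).lastIndexOf(tag);
    if (idx < 1) continue; // stray end tag: ignore it
    // Pop the closed element and everything opened inside it...
    const interrupted = stack.splice(idx).slice(1);
    // ...then reopen the interrupted ones as fresh fragments. Reopening is
    // eager, so an empty fragment may remain if nothing follows.
    for (const old of interrupted) {
      const clone = newElement(old.tag, old.logicalId);
      top().children.push(clone);
      stack.push(clone);
    }
  }
  return root;
}

function textContent(n: TreeNode): string {
  return n.kind === "text" ? n.text : n.children.map(textContent).join("");
}

// What script should see for one logical element: the text of all its fragments.
function logicalText(n: TreeNode, id: number): string {
  if (n.kind === "text") return "";
  if (n.logicalId === id) return textContent(n);
  return n.children.map(c => logicalText(c, id)).join("");
}

const tree = buildSplitTree("<p><b>one <i>two </b>three </i> four</p>");
const p = tree.children[0] as ElementNode;
const innerI = (p.children[0] as ElementNode).children[1] as ElementNode;
console.log(p.children.map(c => (c.kind === "text" ? JSON.stringify(c.text) : `<${c.tag}>`)).join(" "));
console.log(JSON.stringify(logicalText(tree, innerI.logicalId)));
```

With this bookkeeping, script asking for the text of the <i> element has to gather every fragment carrying that `logicalId`, and any change applied to "the" <i> (its color, say) would likewise have to be applied to each fragment.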

On the other hand:

              <p>
               |
   -------------------------
   |                       |
  <b>                    four
   |            
 ---------       
 |       |
one     <i>
         |
  ---------------
  |      |      |
 two    </b>  three


This gives proper style inheritance and proper JavaScript access, but it
results in nodes under <b> that aren't really bold, introduces end-tags
into the hierarchy, and complicates bounding box calculations.
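The second layout can be sketched the same way, under the same assumptions (toy tokenizer, hypothetical names such as `buildMarkerTree` and `styledRuns`). A misplaced end tag becomes an end-marker node inside whatever element is open at that point, and style is computed by treating a formatting element as ending either at the close of its subtree or at a matching marker, whichever comes first.

```typescript
// Hypothetical node shapes for this sketch; not the W3C DOM interfaces.
interface TextNode { kind: "text"; text: string }
interface EndMarker { kind: "end"; tag: string } // a misplaced end tag kept as a node
interface ElementNode { kind: "element"; tag: string; children: TreeNode[] }
type TreeNode = TextNode | EndMarker | ElementNode;

// Toy tokenizer: bare start tags, end tags and text runs only.
function tokenize(html: string): string[] {
  return html.match(/<\/?[a-z]+>|[^<]+/g) || [];
}

// Layout 2: nest by opening order; a misplaced end tag becomes a marker.
function buildMarkerTree(html: string): ElementNode {
  const root: ElementNode = { kind: "element", tag: "#root", children: [] };
  const stack: ElementNode[] = [root];
  const endedByMarker: ElementNode[] = []; // ended early, still on the stack
  const top = () => stack[stack.length - 1];

  for (const tok of tokenize(html)) {
    const m = /^<(\/?)([a-z]+)>$/.exec(tok);
    if (m === null) {
      top().children.push({ kind: "text", text: tok });
      continue;
    }
    const [, slash, tag] = m;
    if (slash === "") {
      const el: ElementNode = { kind: "element", tag, children: [] };
      top().children.push(el);
      stack.push(el);
    } else if (top().tag === tag) {
      stack.pop();
      // Ancestors already ended by a marker can leave the stack now too.
      while (stack.length > 1 && endedByMarker.indexOf(top()) >= 0) stack.pop();
    } else {
      // Innermost open element with this tag that has not been ended yet.
      let target: ElementNode | undefined;
      for (let k = stack.length - 1; k >= 1 && target === undefined; k--) {
        if (stack[k].tag === tag && endedByMarker.indexOf(stack[k]) < 0) target = stack[k];
      }
      if (target === undefined) continue; // stray end tag: ignore it
      top().children.push({ kind: "end", tag });
      endedByMarker.push(target);
    }
  }
  return root;
}

// Formatting in effect for each text run: an element applies until its
// subtree ends or an end marker for its tag appears, whichever comes first.
function styledRuns(root: ElementNode): { text: string; bold: boolean; italic: boolean }[] {
  const runs: { text: string; bold: boolean; italic: boolean }[] = [];
  const active: ElementNode[] = [];
  const drop = (i: number) => { if (i >= 0) active.splice(i, 1); };
  const walk = (n: TreeNode): void => {
    if (n.kind === "text") {
      const tags = active.map(e => e.tag);
      runs.push({ text: n.text, bold: tags.indexOf("b") >= 0, italic: tags.indexOf("i") >= 0 });
    } else if (n.kind === "end") {
      drop(active.map(e => e.tag).lastIndexOf(n.tag));
    } else {
      active.push(n);
      n.children.forEach(walk);
      drop(active.indexOf(n));
    }
  };
  walk(root);
  return runs;
}

function textContent(n: TreeNode): string {
  if (n.kind === "text") return n.text;
  if (n.kind === "end") return ""; // markers carry no text
  return n.children.map(textContent).join("");
}

const tree = buildMarkerTree("<p><b>one <i>two </b>three </i> four</p>");
for (const r of styledRuns(tree)) console.log(JSON.stringify(r.text), { bold: r.bold, italic: r.italic });
```

For the example markup this yields bold "one ", bold italic "two ", italic "three " and plain " four", even though "three " sits under <b> in the tree, and the text of the single <i> node is "two three ". The style walk is the kind of extra work the marker nodes push onto every consumer of the tree, layout code included.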

Is this a question for the DOM working group? Do all clients need to
build a normalized DOM tree the same way? Or should clients do whatever
they think makes the most sense, as long as the JavaScript behavior is
the same? That is, getting the inner/outer text works as expected,
changing the text color works as expected, etc.
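If clients were free to build different trees, one way to pin down "the same JavaScript behavior" is to compare what script can observe rather than the tree shapes. A minimal sketch, with both trees from the diagrams written out by hand and two hypothetical observers: plain text content, and the text of the (logical) <i> element. `italicText` finds <i> by tag name, which only works here because the source has a single <i>; a real check would need a way to identify the logical element.

```typescript
// Hypothetical node shape covering both layouts.
type N =
  | { kind: "text"; text: string }
  | { kind: "end"; tag: string }
  | { kind: "element"; tag: string; children: N[] };

const t = (text: string): N => ({ kind: "text", text });
const el = (tag: string, ...children: N[]): N => ({ kind: "element", tag, children });

// Layout 1: <i> split into two fragments.
const splitTree = el("p", el("b", t("one "), el("i", t("two "))), el("i", t("three ")), t(" four"));
// Layout 2: <i> nested in <b>, with an explicit </b> marker.
const markerTree = el("p",
  el("b", t("one "), el("i", t("two "), { kind: "end", tag: "b" }, t("three "))),
  t(" four"));

function textContent(n: N): string {
  if (n.kind === "text") return n.text;
  if (n.kind === "end") return ""; // markers carry no text
  return n.children.map(textContent).join("");
}

// Text of all outermost <i> elements, in document order.
function italicText(n: N): string {
  if (n.kind !== "element") return "";
  if (n.tag === "i") return textContent(n);
  return n.children.map(italicText).join("");
}

const observe = (tree: N) => ({ text: textContent(tree), italic: italicText(tree) });
console.log(JSON.stringify(observe(splitTree)) === JSON.stringify(observe(markerTree)));
```

A fuller comparison would also cover computed styles and the effect of mutations such as changing a color, which is where the two layouts need different bookkeeping.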

David

-- 
David Mott, Network Computer Inc.
mott@nc.com    http://www.nc.com

Received on Tuesday, 13 January 1998 14:55:01 UTC