Re: Parsing problem with misnested tags

Boris Zbarsky wrote:

> That's a DOM tree, not a parse tree.  It shows an HTMLPreElement with a 
> single textnode child ("B"), and an HTMLCodeElement or whatever you want 
> to call it, with two kids: the HTMLPreElement and a textnode ("C").

OK, so the indentation is intended to denote closure;
this was unclear (to me).

>> A closure for an outer element must surely close all inner
>> elements
> 
> Doing that blindly would break the web.  Consider the simple example of:
> 
>   <b>Bold <i>Bold and italic</b> Still italic, not bold</i> Normal font
> 
> Closing the <b> doesn't end the italicising, even though it's the 
> "outer" element.  This behavior is interoperable across all major 
> browsers, and a significant number of sites depend on it.

You may be correct (I'm referring to the "significant
number of sites" here), but is there any evidence to suggest
that an approximately equal number of sites do not assume
exactly the converse ?  It is certainly not clear /a priori/
that the italicisation should continue, nor that that was
the intended behaviour.
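
To make the two candidate behaviours concrete, here is a minimal
Python sketch of both recovery strategies applied to the example
above.  It is an illustration only, not the algorithm any browser
or specification actually uses: the function name styled_runs, the
reopen_inner flag and the toy tokeniser are all hypothetical, and
attributes, entities and non-formatting elements are ignored.

```python
import re

# Toy tokeniser: matches <tag>, </tag>, or a run of text (assumption:
# no attributes, comments or entities appear in the input).
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+)")


def styled_runs(markup, reopen_inner):
    """Return (text, open_tags) pairs for simple <b>/<i>-style markup.

    reopen_inner=False: closing an outer element also closes every
    element that was opened inside it.
    reopen_inner=True: inner elements that were still open are
    reopened once the outer element has been closed.
    """
    stack = []  # names of currently open elements, outermost first
    runs = []
    for closing, name, text in TOKEN.findall(markup):
        if text:
            runs.append((text, tuple(stack)))
        elif not closing:
            stack.append(name)
        elif name in stack:
            # Index of the most recently opened element with this name.
            idx = len(stack) - 1 - stack[::-1].index(name)
            inner = stack[idx + 1:]
            del stack[idx:]
            if reopen_inner:
                stack.extend(inner)
        # An end tag with no matching open element is ignored.
    return runs


sample = ("<b>Bold <i>Bold and italic</b> "
          "Still italic, not bold</i> Normal font")

for flag in (False, True):
    print("reopen_inner =", flag)
    for text, tags in styled_runs(sample, flag):
        print("   ", repr(text), tags)
```

With reopen_inner=False the third run carries no formatting at all;
with reopen_inner=True it remains italic, which is the behaviour
described in the quoted message.
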
> 
>> whether or not the specification requires
>> that they be explicitly closed, as a normal part of
>> the parser's error recovery procedure.
> 
> The error recovery procedure needs to be more complicated than you seem 
> to think, if the parser is going to handle real-life web content.

From a purely personal perspective, I believe that
handling real-life web content that is /wrong/ is
at the very tail of the distribution of the criteria
that we should be considering.

> The latter, clearly (as in, the DOM is significantly different from what 
> it needs to be to render the site as browsers interoperably render it).

I don't know about "clearly" : that may well have been
my namesake's intention and meaning, but I for one
found the wording sufficiently unclear that I thought
that clarification (from the author) could usefully
be sought.

Philip TAYLOR

Received on Wednesday, 12 November 2008 21:17:31 UTC