Re: Parsing problem with misnested tags

Philip TAYLOR (Ret'd) wrote:
>>   |     "A"
>>   |     <code>
>>   |       <pre>
>>   |         "B"
>>   |       "C"
> 
> I assume (as you haven't shewn them explicitly) that
> there are no implied </...>s anywhere in that parse tree.

That's a DOM tree, not a parse tree.  It shows an HTMLPreElement with a 
single textnode child ("B"), and an HTMLCodeElement or whatever you want 
to call it, with two kids: the HTMLPreElement and a textnode ("C").
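
To make that shape concrete, here is a minimal Python sketch of the same 
tree.  The Element class and the dump() helper are made up for 
illustration; they are not any real DOM API, just enough structure to 
show who is whose child.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element: a name plus ordered children."""
    name: str
    children: list = field(default_factory=list)


def dump(node, depth=0):
    """Return the tree as indented lines, one node per line."""
    indent = "  " * depth
    if isinstance(node, str):
        # Text nodes are modelled as plain strings.
        return [f'{indent}"{node}"']
    lines = [f"{indent}<{node.name}>"]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


# The <pre> has one text child; the <code> has two kids: the <pre> and "C".
pre = Element("pre", ["B"])
code = Element("code", [pre, "C"])

print("\n".join(dump(code)))
```

Note that "C" is a sibling of the <pre>, not a child of it; nothing in 
the tree records where any end tag was or wasn't.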

> A closure for an outer element must surely close all inner
> elements

Doing that blindly would break the web.  Consider the simple example of:

   <b>Bold <i>Bold and italic</b> Still italic, not bold</i> Normal font

Closing the <b> doesn't end the italicising, even though it's the 
"outer" element.  This behavior is interoperable across all major 
browsers, and a significant number of sites depend on it.
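
As a rough illustration of the observable result, here is a toy Python 
model.  It is emphatically not the algorithm browsers actually run 
(which is considerably more involved); it only tracks which of <b> and 
<i> are in effect for each run of text.  The function name styled_runs 
and the restriction to those two tags are assumptions made for the 
sketch.

```python
import re

# Assumption for this sketch: only <b> and <i> are treated as formatting tags.
FORMATTING = {"b", "i"}


def styled_runs(markup):
    """Return (text, active_tags) pairs for possibly misnested <b>/<i> markup."""
    open_tags = []  # formatting tags currently in effect, in opening order
    runs = []
    for token in re.findall(r"</?[a-z]+>|[^<]+", markup):
        if token.startswith("</"):
            name = token[2:-1]
            if name in open_tags:
                # Drop only the most recent tag with this name.  Tags opened
                # after it stay in effect instead of being closed with it.
                index = len(open_tags) - 1 - open_tags[::-1].index(name)
                del open_tags[index]
        elif token.startswith("<"):
            name = token[1:-1]
            if name in FORMATTING:
                open_tags.append(name)
        else:
            runs.append((token, tuple(open_tags)))
    return runs


example = "<b>Bold <i>Bold and italic</b> Still italic, not bold</i> Normal font"
for text, tags in styled_runs(example):
    print(repr(text), tags)
```

With the example above, the run after </b> comes out with only "i" 
active, which is the behavior sites depend on.  A parser that closed 
every inner element along with the outer one would report no active 
tags for that run.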

> whether or not the specification requires
> that they be explicitly closed, as a normal part of
> the parser's error recovery procedure.

The error recovery procedure needs to be more complicated than you seem 
to think, if the parser is going to handle real-life web content.

>> This significantly breaks 
>> http://blogs.sun.com/bblfish/entry/rest_apis_must_be_hypertext 
...

> I'm not sure what semantics you are ascribing to "significantly"
> here : are you saying that http://blogs.sun.com/ is such a significant
> site that even if it outputs crap code (which it clearly does),
> browsers should bend over backwards to accommodate that crap code,
> or are you not making any value judgement concerning http://blogs.sun.com/
> but instead saying that html5lib and validator.nu both make a major
> error in their handling of its aberrant output ?

The latter, clearly (as in, the DOM is significantly different from what 
it needs to be to render the site as browsers interoperably render it).

-Boris

Received on Wednesday, 12 November 2008 19:47:25 UTC