Re: Parsing problem with misnested tags

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Wed, 12 Nov 2008 14:46:31 -0500
Message-ID: <491B3297.9050605@mit.edu>
To: "Philip TAYLOR (Ret'd)" <P.Taylor@Rhul.Ac.Uk>
CC: Philip Taylor <pjt47@cam.ac.uk>, HTML WG <public-html@w3.org>

Philip TAYLOR (Ret'd) wrote:
>>   |     "A"
>>   |     <code>
>>   |       <pre>
>>   |         "B"
>>   |       "C"
> I assume (as you haven't shewn them explicitly) that
> there are no implied </...>s anywhere in that parse tree.

That's a DOM tree, not a parse tree.  It shows an HTMLPreElement with a 
single textnode child ("B"), and an HTMLCodeElement or whatever you want 
to call it, with two kids: the HTMLPreElement and a textnode ("C").
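
To make the shape of that tree concrete, here is a minimal sketch that builds the same <code> subtree and prints it in the indented notation quoted above. The `Node` class and `dump` function are hypothetical names made up for illustration, not part of html5lib or any browser; text nodes are modelled as plain strings.

```python
class Node:
    """A bare-bones element node: a tag name plus an ordered child list."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def dump(node, depth=0):
    """Return the tree as a list of lines, two spaces of indent per level."""
    indent = "  " * depth
    if isinstance(node, str):
        # Text nodes are shown in double quotes, as in the quoted dump.
        return [f'{indent}"{node}"']
    lines = [f"{indent}<{node.name}>"]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


# The pre element has one text child; the code element has two children:
# the pre element and a text node.
pre = Node("pre", ["B"])
code = Node("code", [pre, "C"])

print("\n".join(dump(code)))
# <code>
#   <pre>
#     "B"
#   "C"
```

Note that "C" sits at the same depth as <pre>, which is what makes it a child of the code element rather than of the pre element.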

> A closure for an outer element must surely close all inner
> elements

Doing that blindly would break the web.  Consider the simple example of:

   <b>Bold <i>Bold and italic</b> Still italic, not bold</i> Normal font

Closing the <b> doesn't end the italicising, even though it's the 
"outer" element.  This behavior is interoperable across all major 
browsers, and a significant number of sites depend on it.
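
As a rough illustration of why "close everything inside" is not enough, here is a toy sketch of one possible recovery strategy: when an end tag closes an element that is not the innermost one, close the inner elements too, then reopen any formatting elements that were still open. This is deliberately simplified and is not the algorithm of any browser or of the HTML5 draft; the `MisnestingFixer` class, the `fix` helper, and the `FORMATTING` set are assumptions made up for this example. It only uses Python's standard `html.parser` tokenizer.

```python
from html.parser import HTMLParser

# Hypothetical, incomplete list of elements treated as "formatting".
FORMATTING = {"b", "i", "em", "strong", "u"}


class MisnestingFixer(HTMLParser):
    """Toy recovery: re-serialize input so that tags nest properly,
    reopening formatting elements that a misnested end tag cut short.
    Attributes and void elements are ignored to keep the sketch short."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.out = []            # pieces of the well-nested output

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)
        self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            return  # stray end tag with no matching open element: drop it
        reopen = []
        # Close elements from the innermost outwards until we reach `tag`.
        while True:
            top = self.open_elements.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break
            if top in FORMATTING:
                reopen.append(top)
        # Reopen the interrupted formatting elements, outermost first.
        for name in reversed(reopen):
            self.open_elements.append(name)
            self.out.append(f"<{name}>")

    def handle_data(self, data):
        self.out.append(data)


def fix(markup):
    parser = MisnestingFixer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(fix("<b>Bold <i>Bold and italic</b> Still italic, not bold</i> Normal font"))
# <b>Bold <i>Bold and italic</i></b><i> Still italic, not bold</i> Normal font
```

In the output the text after </b> is still inside an <i>, so it stays italic but not bold, which matches the rendering described above. A parser that simply closed every inner element at </b> would lose the italics on that text.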

> whether or not the specification requires
> that they be explicitly closed, as a normal part of
> the parser's error recovery procedure.

The error recovery procedure needs to be more complicated than you seem 
to think, if the parser is going to handle real-life web content.

>> This significantly breaks 
>> http://blogs.sun.com/bblfish/entry/rest_apis_must_be_hypertext 

> I'm not sure what semantics you are ascribing to "significantly"
> here : are you saying that http://blogs.sun.com/ is such a significant
> site that even if it outputs crap code (which it clearly does),
> browsers should bend over backwards to accommodate that crap code,
> or are you not making any value judgement concerning http://blogs.sun.com/
> but instead saying that html5lib and validator.nu both make a major
> error in their handling of its aberrant output ?

The latter, clearly (as in, the DOM is significantly different from what 
it needs to be to render the site as browsers interoperably render it).

Received on Wednesday, 12 November 2008 19:47:25 UTC
