Re: Parsing problem with misnested tags from Jonas Sicking on 2008-11-13 (public-html@w3.org from November 2008)

From: Jonas Sicking <jonas@sicking.cc>
Date: Thu, 13 Nov 2008 00:07:04 -0800
To: "Boris Zbarsky" <bzbarsky@mit.edu>
Cc: "Philip TAYLOR (Ret'd)" <P.Taylor@rhul.ac.uk>, "Philip Taylor" <pjt47@cam.ac.uk>, "HTML WG" <public-html@w3.org>
Message-ID: <63df84f0811130007s40c28f16y974dd86aa2c23a60@mail.gmail.com>

On Wed, Nov 12, 2008 at 11:46 AM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
>> I'm not sure what semantics you are ascribing to "significantly"
>> here : are you saying that http://blogs.sun.com/ is such a significant
>> site that even if it outputs crap code (which it clearly does),
>> browsers should bend over backwards to accommodate that crap code,
>> or are you not making any value judgement concerning http://blogs.sun.com/
>> but instead saying that html5lib and validator.nu both make a major
>> error in their handling of its aberrant output ?
>
> The latter, clearly (as in, the DOM is significantly different from what it
> needs to be to render the site as browsers interoperably render it).

Right, there seems to be a bug in the HTML5 parsing algorithm here. If
current browsers all render

A<code><pre>B</code></pre>C

with the 'C' inside neither a <code> nor a <pre>, then it
really seems like HTML5 needs to produce a DOM where the 'C' is
outside any such elements.
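
To make the misnesting concrete, here is a rough Python sketch that walks
the snippet above and records which elements are open when each piece of
text is seen. It uses only the standard library's html.parser. The class
name OpenElementTracker, the function text_contexts, and the recovery
rule (drop the nearest matching open element on a misnested end tag) are
assumptions made for illustration. This is not the HTML5 parsing
algorithm and not what any particular browser implements.

```python
from html.parser import HTMLParser


class OpenElementTracker(HTMLParser):
    """Hypothetical helper: tracks open elements with a naive recovery rule."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.contexts = []       # (text, tuple of open elements) pairs

    def handle_starttag(self, tag, attrs):
        self.open_elements.append(tag)

    def handle_endtag(self, tag):
        # Assumed rule: close the nearest open element with this name,
        # even if it is not on top of the stack (i.e. it is misnested).
        # An end tag with no matching open element is ignored.
        for i in range(len(self.open_elements) - 1, -1, -1):
            if self.open_elements[i] == tag:
                del self.open_elements[i]
                break

    def handle_data(self, data):
        self.contexts.append((data, tuple(self.open_elements)))


def text_contexts(markup):
    """Return each text run together with the elements open around it."""
    tracker = OpenElementTracker()
    tracker.feed(markup)
    tracker.close()  # flush any buffered trailing text
    return tracker.contexts


if __name__ == "__main__":
    for text, open_elements in text_contexts("A<code><pre>B</code></pre>C"):
        print(repr(text), "inside", open_elements or "no elements")
```

Under this simple rule the 'C' ends up with no open elements around it,
which is the outcome described above for current browsers. A parser that
handles the misnested </code> differently could leave 'C' inside one of
the elements instead.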

The HTML5 parsing algorithm specifically tries to parse the invalid
tag soup that exists on many (most) pages on the internet today. It
further tries to parse it such that, when an appropriate CSS stylesheet
is applied, such pages will render as they do in today's browsers. This
is one of the core principles behind HTML5, so core in fact that it is
in our charter.

/ Jonas

Received on Thursday, 13 November 2008 08:07:40 UTC