Re: Microsoft's "I mean it" content-type parameter

Jamie Lokier wrote:

> Last time I saw some of the HTML parsing differences, I read the HTML5
> comments where folk were devising new, robust parsing algorithms that
> would produce a DOM from tag soup, but seemingly based on choosing a
> neat algorithm to get a sensible result.  That put me off writing up
> and contributing my observations, as I figured that meant HTML5's plan
> wasn't specifically to model browser compatibility behaviour in the
> syntax department.  I didn't have the energy to push that, study the
> differences, write it up, if it wasn't on the agenda anyway.

I think you have been misled; the parsing algorithm has been closely
modeled on existing browser behavior. It is true that it has not
slavishly copied every detail of every browser. For example, the HTML5
algorithm always produces a tree-like DOM whilst IE does not. However,
in the circumstances where IE produces non-tree-like DOMs, other leading
browsers do saner things, so the non-tree-likeness is not required for
web compat. This allows HTML5 to base this part of the parsing spec on
one of the other browsers (in particular, the HTML5 spec is rather
WebKit-like in this area). Similarly, the spec does not copy the Firefox
2 behavior when encountering unknown elements: Firefox 2 makes all
unknown elements behave like inline elements, which is incompatible with
ever introducing new block-level elements. Firefox 3 has changed the
behavior here.
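
To make "tree-like" concrete, here is a rough Python sketch of the
property in question. It is a toy model only: the Node class, the
append and is_tree helpers and both example structures are made up for
illustration, and the second structure is not a reproduction of what IE
actually builds. A DOM is tree-like here if every node is listed by
exactly one parent and its parent pointer agrees with that listing.

```python
class Node:
    """Hypothetical minimal DOM node: a name, a parent pointer, a child list."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []


def append(parent, child):
    """Attach child to parent, keeping both directions consistent."""
    child.parent = parent
    parent.children.append(child)
    return child


def is_tree(root):
    """Return True if the structure reachable from root is a proper tree.

    Fails if a node is reachable twice (shared child or cycle) or if a
    child's parent pointer disagrees with the node that lists it.
    """
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            return False
        seen.add(id(node))
        for child in node.children:
            if child.parent is not node:
                return False
            stack.append(child)
    return True


def build_tree_like():
    """One tree-shaped result for <b><i>x</b>y</i>: the <i> is split in two."""
    body = Node("body")
    b = append(body, Node("b"))
    i_first = append(b, Node("i"))
    append(i_first, Node("#text x"))
    i_second = append(body, Node("i"))
    append(i_second, Node("#text y"))
    return body


def build_shared():
    """Assumed illustration of a non-tree: one <i> listed under two parents."""
    body = Node("body")
    b = append(body, Node("b"))
    i = append(b, Node("i"))
    append(i, Node("#text x"))
    body.children.append(i)  # second listing; i.parent still points at b
    return body
```

Calling is_tree(build_tree_like()) gives True and
is_tree(build_shared()) gives False; script code that walks parentNode
and childNodes can only rely on the two views agreeing in the first
case.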

On the other hand, where all four major browsers agree on a parsing
behavior, it is taken as a sign that this behavior is needed for web
compatibility, and so HTML5 has adopted the behavior. If there are any
cases where the algorithm is incompatible with reality, the algorithm
will be changed (inevitably some such cases will turn up as browsers
start to implement more of the HTML5 algorithm).



For a comparison between the behavior of existing browsers and the 
behavior of the HTML5 algorithm, you might find some of the following 
useful:

Live DOM viewer[1] (displays the DOM created by the current browser 
given some HTML source)

html5lib parse tree viewer[2] and validator.nu parse tree viewer[3] show 
the HTML5 behavior (modulo bugs) given HTML source

[1] http://software.hixie.ch/utilities/js/live-dom-viewer/
[2] http://james.html5.org/parsetree.html
[3] http://parsetree.validator.nu/

-- 
"Mixed up signals
Bullet train
People snuffed out in the brutal rain"
--Conor Oberst

Received on Friday, 4 July 2008 13:35:13 UTC