- From: James Graham <jg307@cam.ac.uk>
- Date: Fri, 04 Jul 2008 14:34:32 +0100
- To: Jamie Lokier <jamie@shareable.org>
- CC: public-html@w3.org
Jamie Lokier wrote:
> Last time I saw some of the HTML parsing differences, I read the HTML5
> comments where folk were devising new, robust parsing algorithms that
> would produce a DOM from tag soup, but seemingly based on choosing a
> neat algorithm to get a sensible result. That put me off writing up
> and contributing my observations, as I figured that meant HTML5's plan
> wasn't specifically to model browser compatibility behaviour in the
> syntax department. I didn't have the energy to push that, study the
> differences, write it up, if it wasn't on the agenda anyway.

I think you have been misled; the parsing algorithm has been closely modeled on existing browser behavior. It is true that it has not slavishly copied every detail of every browser; for example, the HTML5 algorithm always produces a tree-like DOM whilst IE does not. However, in the circumstances where IE produces non-tree-like DOMs, other leading browsers do saner things; the non-tree-likeness is not required for web-compat. This allows HTML5 to base this part of the parsing spec on one of the other browsers (in particular, the HTML5 spec is rather WebKit-like in this area). Similarly, the spec does not copy the Firefox 2 behavior when encountering unknown elements (Firefox 2 makes all unknown elements behave like inline elements; this is incompatible with ever introducing new block-level elements; Firefox 3 has changed the behavior here).

On the other hand, where all four major browsers agree on a parsing behavior, it is taken as a sign that this behavior is needed for web compatibility, and so HTML5 has adopted the behavior. If there are any cases where the algorithm is incompatible with reality, the algorithm will be changed (inevitably this will happen as browsers start to implement more of the HTML5 algorithm).
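To make "tree-like" concrete, here is a minimal sketch in Python of what the property means: every node other than the root has exactly one parent, and every node is reachable from the root. The dictionary representation, the function name is_tree, and both example graphs are assumptions made up for illustration; they are not recorded output from any browser or from the HTML5 algorithm.

```python
from collections import Counter


def is_tree(root, children):
    """Return True if the graph described by `children` is a tree rooted at `root`.

    `children` maps a node name to the list of its child node names
    (a hypothetical, simplified stand-in for a DOM).
    """
    # Count how many parents claim each node.
    parent_count = Counter()
    for kids in children.values():
        for kid in kids:
            parent_count[kid] += 1

    # The root must not be anybody's child.
    if parent_count[root] != 0:
        return False

    # Walk down from the root; reaching a node twice means it has
    # more than one path to it, so the structure is not a tree.
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            return False
        seen.add(node)
        stack.extend(children.get(node, []))

    # Every known node must be reachable and have exactly one parent.
    all_nodes = set(children) | set(parent_count)
    return seen == all_nodes and all(
        parent_count[n] == 1 for n in all_nodes - {root}
    )


# Hypothetical tree-shaped result: each node has a single parent.
tree_like = {"body": ["b", "i2"], "b": ["i1"], "i1": ["x"], "i2": ["y"]}

# Hypothetical non-tree result: node "i" is listed under two parents.
not_tree = {"body": ["b", "i"], "b": ["i"], "i": ["x"]}

print(is_tree("body", tree_like))
print(is_tree("body", not_tree))
```

A check like this only describes the shape of the result; it says nothing about which tree a given parser should build for a given input.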
For a comparison between the behavior of existing browsers and the behavior of the HTML5 algorithm, you might find some of the following useful:

Live DOM viewer[1] (displays the DOM created by the current browser given some HTML source)

html5lib parse tree viewer[2] and validator.nu parse tree viewer[3] show the HTML5 behavior (modulo bugs) given HTML source

[1] http://software.hixie.ch/utilities/js/live-dom-viewer/
[2] http://james.html5.org/parsetree.html
[3] http://parsetree.validator.nu/

-- 
"Mixed up signals
Bullet train
People snuffed out in the brutal rain"
--Conner Oberst
Received on Friday, 4 July 2008 13:35:13 UTC