- From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
- Date: Sat, 21 Apr 2007 19:55:03 +1000
- To: David Woolley <forums@david-woolley.me.uk>
- CC: www-html@w3.org
David Woolley wrote:
>
> Lachlan Hunt wrote:
>> Not any more. Although it's not quite complete, HTML5 is defining
>> the parsing requirements of HTML on the web.
>
> And a horrible set of ad hoc rules it is. It's basically a proper tree
> type grammar with a set of error recovery rules for producing a
> renderable tree from almost every invalid input.

Yeah, it's designed to handle real-world HTML. The WHATWG places
interoperability with existing content above syntactic purity.

> Maybe what I should have offered is three categories:
>
> - tag soup;
> - HTML5 with *no* parse errors;
> - SGML based.
>
> I'm assuming that HTML5 *with* parse errors can produce all the
> productions allowed by tag soup. In a quick scan, I couldn't tell what,
> if any difference there is between HTML5 without parse errors and SGML
> based. In any case, it seems to be quite close to SGML based.

There are plenty of differences between HTML5 parsing and HTML4's SGML
parsing. Here are just a few:

In HTML5, <br/> is conforming and equivalent to <br>
In HTML4, <br/> is equivalent to <br>> (a br element followed by a
literal '>' character)

In HTML5, characters like '/' can occur in unquoted attribute values,
e.g. <a href=http://example.com/>link</a>
In HTML4, that would be equivalent to:
<a href="http:"></a>example.com/>link</a>

In HTML5, <noscript> is parsed differently depending on whether or not
script is enabled.

There are plenty more differences, but that should be enough to
illustrate the point.

-- 
Lachlan Hunt
http://lachy.id.au/
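As a rough illustration of the <br/> and unquoted-attribute cases in the message above, here is a minimal sketch using Python's standard html.parser module. That module is a lenient tag-soup tokenizer, not a conforming HTML5 parser, and it does not implement SGML SHORTTAG handling at all, so it only shows the non-SGML reading of these inputs. The class name StartTagCollector and the helper collect_start_tags are made up for this example.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; store it as a dict.
        # Self-closing syntax such as <br/> also reaches this method,
        # because the default handle_startendtag delegates to it.
        self.start_tags.append((tag, dict(attrs)))


def collect_start_tags(markup):
    """Feed markup to the parser and return the recorded start tags."""
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.start_tags


if __name__ == "__main__":
    for sample in ("<br/>", "<a href=http://example.com/>link</a>"):
        print(sample, "->", collect_start_tags(sample))
```

Under these assumptions, the href value should come back with all of its slashes intact and <br/> should be reported as a single br start tag, rather than the HTML4 SGML interpretation described above.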
Received on Saturday, 21 April 2007 09:55:40 UTC