Re: [XBL Primer] new scenarios from Lachlan Hunt on 2007-04-21 (www-html@w3.org from April 2007)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Sat, 21 Apr 2007 19:55:03 +1000
To: David Woolley <forums@david-woolley.me.uk>
CC: www-html@w3.org
Message-ID: <4629DF77.1030701@lachy.id.au>

David Woolley wrote:
> 
> Lachlan Hunt wrote:
>> Not any more.  Although it's not quite complete, HTML5 is defining 
>> the parsing requirements of HTML on the web.
> 
> And a horrible set of ad hoc rules it is. It's basically a proper tree
> type grammar with a set of error recovery rules for producing a
> renderable tree from almost every invalid input.

Yeah, it's designed to handle real-world HTML.  The WHATWG places 
interoperability with existing content above syntactic purity.

> Maybe what I should have offered is three categories:
> 
> - tag soup;
> - HTML5 with *no* parse errors;
> - SGML based.
> 
> I'm assuming that HTML5 *with* parse errors can produce all the 
> productions allowed by tag soup.  In a quick scan, I couldn't tell what, 
> if any difference there is between HTML5 without parse errors and SGML 
> based.  In any case, it seems to be quite close to SGML based.

There are plenty of differences between HTML5 parsing and HTML4's SGML 
parsing.  Here are just a few:

In HTML5, <br/> is conforming and equivalent to <br>
In HTML4, <br/> is equivalent to <br>&gt;
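
A rough way to see the HTML5-style reading is Python's standard
html.parser module.  It isn't a conforming HTML5 parser, so treat this as
a sketch only, but it tokenizes these inputs the way the HTML5 reading
above describes.  The names Recorder and tokenize are just illustrative.

```python
from html.parser import HTMLParser


class Recorder(HTMLParser):
    """Records the start tags and character data the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def tokenize(markup):
    """Feed markup to a fresh Recorder and return the recorded events."""
    parser = Recorder()
    parser.feed(markup)
    parser.close()
    return parser.events


# Tolerant (HTML5-style) reading: both forms yield a single <br> start tag.
print(tokenize("<br/>"))
print(tokenize("<br>"))

# The HTML4/SGML reading corresponds to this markup instead, which carries
# an extra ">" character as text after the <br>.
print(tokenize("<br>&gt;"))
```

The first two calls report the same single start tag; the third reports
the start tag followed by a ">" text node, which is what an SGML parser
would make of <br/>.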

In HTML5, characters like '/' can occur in unquoted attribute values
e.g. <a href=http://example.com/>link</a>

In HTML4, that would be equivalent to:
   <a href="http:"></a>example.com/&gt;link</a>
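
The HTML4 reading comes from SGML's SHORTTAG feature: a "/" inside a
start tag ends the tag early, and the next "/" then acts as the end tag.
Here is a deliberately simplified model of that rule, nowhere near a real
SGML parser.  It assumes at most one unquoted attribute, and the names
sgml_net_reading, escape_data, NET_START, TAG and EMPTY are made up for
the illustration.

```python
import html
import re

# A few elements the HTML 4.01 DTD declares EMPTY (no content, no end tag).
EMPTY = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}

# A start tag cut short by "/": element name, then optionally one unquoted
# attribute whose value is limited to SGML name characters.
NET_START = re.compile(
    r"<([a-zA-Z][a-zA-Z0-9]*)"                  # element name
    r"(?:\s+([a-zA-Z-]+)=([a-zA-Z0-9._:-]+))?"  # optional unquoted attribute
    r"/"                                        # the "/" that ends the tag early
)

# Anything that still looks like an ordinary tag in the leftover text.
TAG = re.compile(r"(</?[a-zA-Z][^<>]*>)")


def escape_data(text):
    """Escape character data but leave ordinary-looking tags untouched."""
    parts = TAG.split(text)  # odd indexes hold the matched tags
    return "".join(
        part if i % 2 else html.escape(part, quote=False)
        for i, part in enumerate(parts)
    )


def sgml_net_reading(markup):
    """Rewrite markup the way this simplified SHORTTAG model reads it."""
    m = NET_START.match(markup)
    if m is None:
        return markup  # no "/" inside the start tag, nothing to rewrite
    name, attr, value = m.groups()
    start = f"<{name}>" if attr is None else f'<{name} {attr}="{value}">'
    rest = markup[m.end():]
    if name.lower() in EMPTY:
        # An EMPTY element ends immediately; everything after is data.
        return start + escape_data(rest)
    # Otherwise the next "/" serves as the end tag for the element.
    content, slash, after = rest.partition("/")
    if not slash:
        return start + escape_data(rest)
    return start + escape_data(content) + f"</{name}>" + escape_data(after)


print(sgml_net_reading("<br/>"))
print(sgml_net_reading("<a href=http://example.com/>link</a>"))
```

The two calls reproduce the HTML4 equivalents given in this message for
<br/> and for the unquoted href.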

In HTML5, <noscript> is parsed differently depending on whether or not 
script is enabled.
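
As a toy illustration of the <noscript> point (not the actual HTML5
algorithm): with scripting enabled the element's content is kept as plain
text, and with scripting disabled it is parsed as markup.  The function
noscript_children, the class TagCollector and the file name fallback.png
are all hypothetical, and Python's html.parser stands in for a real HTML5
parser.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collects element and text nodes from a fragment of markup."""

    def __init__(self):
        super().__init__()
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        self.nodes.append(("element", tag))

    def handle_data(self, data):
        self.nodes.append(("text", data))


def noscript_children(inner, scripting_enabled):
    """Return the child nodes of <noscript> for the given scripting mode."""
    if scripting_enabled:
        # Content is treated as raw text; no elements are created.
        return [("text", inner)]
    # Content is parsed like any other markup.
    collector = TagCollector()
    collector.feed(inner)
    collector.close()
    return collector.nodes


inner = '<img src="fallback.png">'
print(noscript_children(inner, scripting_enabled=True))
print(noscript_children(inner, scripting_enabled=False))
```

So the same bytes yield one text node in one mode and an img element in
the other.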

There are plenty more differences, but that should be enough to 
illustrate the point.

-- 
Lachlan Hunt
http://lachy.id.au/

Received on Saturday, 21 April 2007 09:55:40 UTC