Re: Comments on HTML WG face to face meetings in France Oct 08 from Ian Hickson on 2008-11-17 (public-html@w3.org from November 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 17 Nov 2008 07:07:26 +0000 (UTC)
To: Elliotte Harold <elharo@metalab.unc.edu>
Cc: "Henry S. Thompson" <ht@inf.ed.ac.uk>, noah_mendelsohn@us.ibm.com, public-html <public-html@w3.org>, www-tag@w3.org
Message-ID: <Pine.LNX.4.62.0811170613070.1041@hixie.dreamhostps.com>

On Sun, 16 Nov 2008, Elliotte Harold wrote:
> 
> Error correction is much more problematic.
>
> In essence, the path taken by HTML 5 is that there is no such thing as a 
> document which is in error. All byte streams become legal HTML 
> documents. That's not how they phrase it, but that's the effect.

How is this different from what HTML4 did? HTML4 said "this is what is 
valid, and everything else should work too". And the browsers by and large 
did this, in an interoperable fashion (at great cost, and in a manner that 
made it very hard to enter the market). How does this differ from HTML5's 
approach, other than HTML5 making competition easier?

> it very much raises the bar for implementing parsers

This is demonstrably false, in that there are more interoperable HTML5 
parsers today, before the spec is even finished, than there have ever been 
interoperable HTML4 parsers. Even for valid documents of each.

> and is contrary to the design of XML at a very deep level. In essence, 
> it is a fundamental rejection of one of the core values of XML. It is 
> the polar opposite of draconian error handling.

Absolutely. XML's approach has utterly failed on the Web (q.v. the 
universal feed parser for RSS and Atom). It would be amateurish of us to 
keep following this model after what we have learnt over the past ten 
years. We have a responsibility to the Web to do better.

Also, I think it's pushing the truth a bit to say that draconian error 
handling is a core value of XML. The XML working group was quite split on 
the issue. [1]

[1] http://www.tbray.org/ongoing/When/200x/2004/01/16/DraconianHistory

> It makes the spec far harder to understand and implement.

Half of the error handling is almost implicit, in that the algorithm that 
says what you have to do just handles all cases without needing to be 
explicit. So that's not harder to understand. The other half might be 
somewhat more involved than ignoring error cases, but, well, tough. We're 
not making toast here, we're trying to define one of the most important 
platforms that humanity has ever used. If it's a little harder to 
understand, so be it.

It's certainly not harder to implement, either. Implementors have to 
implement error handling regardless of what the spec says. It's easier to 
follow a spec than to reverse engineer the market leader to work out what 
the error handling should be. And this doesn't just apply to browsers -- 
tools want to behave the same way too, because that way they get more 
compatibility with more pages. For example, a search engine indexer would 
want to implement the HTML5 parsing algorithm to get interoperability with 
browsers on how to parse documents. The way that the HTML4 validator 
didn't do things like browsers has caused confusion for years; Henri's 
validator follows the browsers (by implementing HTML5) and gets much 
better results because of it. The spec generator tool that 

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 17 November 2008 07:08:09 UTC