- From: Ian Hickson <ian@hixie.ch>
- Date: Mon, 17 Nov 2008 07:07:26 +0000 (UTC)
- To: Elliotte Harold <elharo@metalab.unc.edu>
- Cc: "Henry S. Thompson" <ht@inf.ed.ac.uk>, noah_mendelsohn@us.ibm.com, public-html <public-html@w3.org>, www-tag@w3.org
On Sun, 16 Nov 2008, Elliotte Harold wrote:
>
> Error correction is much more problematic.
>
> In essence, the path taken by HTML 5 is that there is no such thing as a
> document which is in error. All byte streams become legal HTML
> documents. That's not how they phrase it, but that's the effect.

How is this different from what HTML4 did? HTML4 said "this is what is
valid, and everything else should work too". And the browsers by and large
did this, in an interoperable fashion (at great cost, and in a manner that
made it very hard to enter the market). How does this differ from HTML5's
approach, other than HTML5 making competition easier?

> it very much raises the bar for implementing parsers

This is demonstrably false, in that there are more interoperable HTML5
parsers today, before the spec is even finished, than there have ever been
interoperable HTML4 parsers. Even for valid documents of each.

> and is contrary to the design of XML at a very deep level. In essence,
> it is a fundamental rejection of one of the core values of XML. It is
> the polar opposite of draconian error handling.

Absolutely. XML's approach has utterly failed on the Web (q.v. the
universal feed parser for RSS and Atom). It would be amateurish of us to
keep following this model after what we have learnt over the past ten
years. We have a responsibility to the Web to do better.

Also, I think it's pushing the truth a bit to say that draconian error
handling is a core value of XML. The XML working group was quite split on
the issue. [1]

[1] http://www.tbray.org/ongoing/When/200x/2004/01/16/DraconianHistory

> It makes the spec far harder to understand and implement.

Half of the error handling is almost implicit, in that the algorithm that
says what you have to do just handles all cases without needing to be
explicit. So that's not harder to understand. The other half might be
somewhat more involved than ignoring error cases, but, well, tough. We're
not making toast here, we're trying to define one of the most important
platforms that humanity has ever used. If it's a little harder to
understand, so be it.

It's certainly not harder to implement, either. Implementors have to
implement error handling regardless of what the spec says. It's easier to
follow a spec than to reverse engineer the market leader to work out what
the error handling should be.

And this doesn't just apply to browsers -- tools want to behave the same
way too, because that way they get more compatibility with more pages. For
example, a search engine indexer would want to implement the HTML5 parsing
algorithm to get interoperability with browsers on how to parse documents.
The way that the HTML4 validator didn't do things like browsers has caused
confusion for years; Henri's validator follows the browsers (by
implementing HTML5) and gets much better results because of it.

The spec generator tool that

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 17 November 2008 07:08:09 UTC