Re: Comments on HTML WG face to face meetings in France Oct 08 from Ian Hickson on 2008-11-17 (public-html@w3.org from November 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Mon, 17 Nov 2008 22:59:53 +0000 (UTC)
To: Elliotte Harold <elharo@metalab.unc.edu>
Cc: public-html <public-html@w3.org>, www-tag@w3.org
Message-ID: <Pine.LNX.4.62.0811172247010.1237@hixie.dreamhostps.com>

On Mon, 17 Nov 2008, Elliotte Harold wrote:
> Ian Hickson wrote:
> > 
> > How is this different from what HTML4 did? HTML4 said "this is what is 
> > valid, and everything else should work too". And the browsers by and 
> > large did this, in an interoperable fashion (at great cost, and in a 
> > manner that made it very hard to enter the market). How does this 
> > differ from HTML5's approach, other than HTML5 making competition 
> > easier?
> 
> HTML 4 enabled parsers to define their own error recovery. [HTML 5] 
> requires specific error recovery.

Right, that was one of the big mistakes in HTML4, which we are correcting 
in HTML5. But that doesn't address the point you made, which was:

| In essence, the path taken by HTML 5 is that there is no such thing as a 
| document which is in error. All byte streams become legal HTML 
| documents. That's not how they phrase it, but that's the effect.

Since HTML4 also required that browsers "handle" errors in a non-fatal 
way, usually requiring or suggesting that errors be "ignored", how do 
HTML5's more precise error-handling rules differ from HTML4's in terms of 
making all byte streams "legal", as you consider them?


Anyway, this is a moot discussion; defining error handling is one of the 
core principles of the HTML5 effort.


> > > it very much raises the bar for implementing parsers
> > 
> > This is demonstrably false, in that there are more interoperable HTML5 
> > parsers today, before the spec is even finished, than there have ever 
> > been interoperable HTML4 parsers. Even for valid documents of each.
> 
> Greater than zero (or perhaps one--can there be a single interoperable 
> parser?) is not a very high bar to hurdle.

I thought you said it "very much" raises the bar? I'm confused as to what 
you are arguing here.


> > Absolutely. XML's approach has utterly failed on the Web (q.v. the 
> > universal feed parser for RSS and Atom). It would be amateurish of us 
> > to keep following this model after what we have learnt over the past 
> > ten years. We have a responsibility to the Web to do better.
> 
> There are reasons for that, mostly due to mistakes the W3C made in the 
> development of HTML. They pushed a syntax change without compensating 
> features to make the syntax changes worthwhile to implementers and 
> users. HTML 5 makes the opposite mistake: it's only pushing features 
> with no syntax changes. This seems likely to cause other problems.

Could you provide some examples? I really don't follow your point here.

What are you suggesting should change in the spec or our process, and why?


> > Also, I think it's pushing the truth a bit to say that draconian error 
> > handling is a core value of XML. The XML working group was quite split 
> > on the issue. [1]
> 
> They were split but draconian error handling won.

Yup, but that doesn't mean it's a core value.

On the other hand, defining precise graceful error handling _is_ a core 
value of HTML5. It's one of our fundamental principles, laid out years 
before the W3C HTML working group began work.


> > > It makes the spec far harder to understand and implement.
> > 
> > Half of the error handling is almost implicit, in that the algorithm 
> > that says what you have to do just handles all cases without needing 
> > to be explicit. So that's not harder to understand. The other half 
> > might be somewhat more involved than ignoring error cases, but, well, 
> > tough. We're not making toast here, we're trying to define one of the 
> > most important platforms that humanity has ever used. If it's a little 
> > harder to understand, so be it.
> 
> Straw man. I am not suggesting that one ignore error cases.

You were suggesting that having the spec define how to handle errors made 
the spec harder to understand and implement. I am explaining why this is 
demonstrably not the case.


> I am simply suggesting that one might wish to report them and indicate 
> them as such, rather than defining them out of existence.

Then you'll be glad to know that HTML5 calls out exactly what is an error. 
Indeed, the entire parsing algorithm, for example, is littered with 
statements like "this is a parse error" that allow parsers to report very 
precisely when a parse error occurs. (It's even easier to implement this 
for HTML5 than to do it for XML! No thought required, just a direct 
translation of the prose into code.)
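
To illustrate what "a direct translation of the prose into code" might 
look like, here is a minimal Python sketch. It is not taken from the spec: 
the state, the rules, and the names (read_tag_name, the error messages) are 
hypothetical and chosen only to show the pattern of flagging a parse error 
and then carrying on with a defined recovery.

```python
def read_tag_name(text, start=0):
    """Toy state: consume a tag name up to '>' and collect parse errors.

    Hypothetical rules for illustration only, not the HTML5 algorithm.
    """
    name = []
    errors = []
    pos = start
    while pos < len(text) and text[pos] != ">":
        ch = text[pos]
        if ch == "\x00":
            # "This is a parse error": record it, then recover and continue.
            errors.append((pos, "unexpected NUL in tag name"))
            name.append("\ufffd")
        else:
            # Anything else: append the lowercased character to the name.
            name.append(ch.lower())
        pos += 1
    if pos == len(text):
        # Input ended before '>': also an error, also recoverable.
        errors.append((pos, "end of input inside tag name"))
    return "".join(name), errors
```

A conformance checker built this way would report the collected errors; 
a browser would use the same code and simply discard the list.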

HTML5 doesn't define errors out of existence. In fact, it goes to quite 
some lengths to define what is an error.


> HTML 5 error handling is much harder to implement than draconian error 
> handling that refuses to parse or display malformed documents.

On the contrary, it is no harder. Indeed for some errors it is easier.

For example, consider a point in the syntax where only characters A-Z are 
legal, and all other characters are illegal. To catch the error, you have 
to check that the characters match the range A-Z. To not catch the error, 
you don't have to do anything, you just treat all characters as 
equivalently valid. This is one example of how catching errors can be 
harder than not catching them.
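
The A-Z case can be sketched in a few lines of Python. This is only an 
illustration under assumed rules (a name runs until the next space); both 
function names are hypothetical. The strict version is the lenient one 
plus the extra range check.

```python
def read_name_lenient(text, start=0):
    """Consume characters up to the next space; every character is accepted."""
    end = start
    while end < len(text) and text[end] != " ":
        end += 1
    return text[start:end]


def read_name_strict(text, start=0):
    """Same scan, plus the range check needed to catch the error."""
    end = start
    while end < len(text) and text[end] != " ":
        if not ("A" <= text[end] <= "Z"):
            raise ValueError(f"illegal character {text[end]!r} at offset {end}")
        end += 1
    return text[start:end]
```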

Implementors of the HTML5 algorithm have already pointed out that HTML5 
parsers are on the same order of complexity as XML parsers.


> Is the additional difficulty worth it?

Yes. Absolutely.


> Is the HTML 5 spec actually clear and unambiguous enough to achieve that 
> goal?

It seems so.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 17 November 2008 23:00:52 UTC