Re: Support Existing Content from Jonas Sicking on 2007-05-05 (public-html@w3.org from May 2007)

From: Jonas Sicking <jonas@sicking.cc>
Date: Fri, 04 May 2007 21:17:19 -0700
To: Gareth Hay <gazhay@gmail.com>
CC: Maciej Stachowiak <mjs@apple.com>, matt@builtfromsource.com, public-html@w3.org
Message-ID: <463C054F.1090002@sicking.cc>
Gareth Hay wrote:
> 
> 
> On 3 May 2007, at 22:24, Jonas Sicking wrote:
>> These are the arguments against "draconian errorhandling" that I can see:
>> 2.
>> It's hard for authors to get things perfect. Writing bug free code has 
>> nothing to do with being lazy or uninformed. When did you ever run 
>> into a bugfree software program? If you want to generate something 
>> with as strict parsing rules as that you probably want to write code 
>> that provably creates good output. The only way I can think of to do 
>> that would be to let servers generate DOM-like data structures that 
>> then get serialized before being sent over the wire.
>> While this sounds like a good design to me, it would be a big change 
>> from how servers work today and would significantly raise the bar for 
>> adopting HTML5 for authors.
>>
> This issue isn't about bug free code. I think you would concede that 
> even the buggy code has compiled? Just because the logic that has been 
> programmed according to the syntax rules is flawed doesn't mean it's the 
> compiler's fault.

Certain parts of the code have to be bug free, i.e. the parts that 
produce the HTML5 serialization. Yes, there can be other bugs in the 
code, but a large set of bugs would simply not be acceptable.
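
To make the "DOM-like data structure that gets serialized" idea concrete, 
here is a minimal sketch in Python. It is only an illustration, not how 
any particular server works: the helper names build_page and serialize 
are hypothetical, and the standard library's xml.etree.ElementTree 
stands in for whatever tree structure a server would actually use. The 
point is that the markup is produced by a serializer rather than by 
string concatenation, so nesting and escaping cannot go wrong in the 
page-generating code.

```python
import xml.etree.ElementTree as ET


def build_page(title, paragraphs):
    """Build a small DOM-like tree (hypothetical helper for illustration)."""
    html = ET.Element("html")
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(html, "body")
    for text in paragraphs:
        # Text is stored as data, never spliced into markup by hand.
        ET.SubElement(body, "p").text = text
    return html


def serialize(root):
    """Serialize the tree; the serializer closes tags and escapes text."""
    markup = ET.tostring(root, encoding="unicode", method="html")
    return "<!DOCTYPE html>\n" + markup


page = serialize(build_page("Fish & chips", ["1 < 2", "Tags like <b> are escaped"]))
print(page)
```

The cost described above is visible even here: every piece of 
page-generating code has to go through the tree API instead of printing 
strings, which is a big change from how most servers emit HTML today.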

>> 3.
>> The "cleanup" of the web it would accomplish is actually fairly small. 
>> Most quirks and inconsistencies are in how things behave after they 
>> have been parsed. The biggest one is in how things are rendered, but 
>> also in how the DOM behaves.
>> And while there is some value for UA developers since they'd have an 
>> easier time writing the parser, I see little to no value for web 
>> authors over having relaxed, but consistent, error handling in the 
>> various browsers.
>>
> I completely disagree. Though it won't happen overnight, this approach 
> would educate authors to write better code, and after time the tag-soup 
> would begin to become cleaner.

So even if you fixed the problem that is tagsoup parsing, you would 
still leave the main hurdle, i.e. rendering and the DOM.

Or do you not agree that rendering and DOM are quirky?

>> The result is that the price you pay for such strict error handling (1 
>> and 2) is very high, while the value you get (3) is pretty small.
>>
> In your opinion.

It's not so much my opinion as my experience from being a browser 
developer for the last 7 years. The code to deal with DOM0 and the code 
to deal with rendering is a lot hairier than the code to deal with 
parsing. At least in Gecko. Do any other browser developers out there have 
a different experience?

> I was thinking about this issue overnight, and I think I need some 
> clarification.
> Is it not correct that each browser currently handles errors in their 
> own manner?
> People on here are aiming to document this inconsistent error handling 
> to base the spec on.
> A common ground will be found and this will be the specified behaviour 
> for the future.
> 
> So if this is correct then I don't understand: some UAs will have to 
> change their error handling, breaking the web as much as "draconian" 
> error handling.
> Ok, so they will be changing to a consistent handling, but any change at 
> all will lead to as much disruption as what is being suggested?

The idea is to make the error handling specified by the spec such that 
if you feed today's web content to an HTML5 parser, you'll get something 
that is close enough to what browsers do today that very few pages would 
break.

One thing that I think is important to point out is this:

I do agree that if we had draconian error handling that would eventually 
produce cleaner markup. Content written in languages that have draconian 
error handling, such as XML and C, has far fewer errors than content 
written in today's HTML and the HTML5 suggested by the current draft. And 
such content would probably work better across multiple browsers.
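
As a rough illustration of the two error-handling styles, the sketch 
below feeds the same slightly broken markup to a draconian parser and to 
a lenient one, both from the Python standard library. The sample markup 
and the names TagCollector, parse_strict and parse_lenient are made up 
for this example. Note that html.parser is only a forgiving tokenizer; 
it does not implement the error recovery described in the HTML5 draft, 
so this shows the difference in attitude, not the proposed algorithm.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Sample input (an assumption for this sketch): the <b> tag is never closed.
MARKUP = "<p>An <b>unclosed bold tag</p>"


class TagCollector(HTMLParser):
    """Records tokenizer events instead of rejecting malformed input."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


def parse_strict(markup):
    """Draconian: an XML parser refuses the whole document on any error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def parse_lenient(markup):
    """Lenient: keep going and report whatever could be recognized."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


strict_ok = parse_strict(MARKUP)
lenient_events = parse_lenient(MARKUP)
print("strict parser accepted the input:", strict_ok)
print("lenient parser events:", lenient_events)
```

The strict parser gives the author nothing to show for a single missing 
end tag, while the lenient one still yields all of the content.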

However, another effect of draconian error handling is that a lot fewer 
people are able to produce content in the language. There are far fewer 
people in the world who write XML and C than there are people who 
write HTML. One of the reasons for the success of the internet is the 
simplicity of producing HTML content.
JavaScript was designed with exactly this issue in mind: it should be 
easy to produce content for. You can also note that JavaScript has much 
less draconian error handling than C and that there are a lot more 
authors of JavaScript code than C code.

I don't think the English example that has been brought up elsewhere is 
a bad example at all. If we demanded that English be spoken with 
perfect grammar, there would be a lot fewer people producing English 
content (i.e. speaking English).

While some people would learn how to write proper HTML5 with draconian 
error handling, a lot of people would simply give up and we'd have far 
fewer people producing content for the web. And IMHO the strength of the 
web is not the fact that you can make flashy pages with nice CSS layout. 
It's that there are a lot of people producing a lot of content for it.

So while I agree there are advantages to draconian error handling, I 
think the disadvantages are much, much greater.

Hope that helps you understand my point of view.

Best Regards,
/ Jonas
Received on Saturday, 5 May 2007 04:19:52 UTC