Re: Error handling in URIs from Ian Hickson on 2008-06-26 (uri@w3.org from June 2008)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 26 Jun 2008 10:33:20 +0000 (UTC)
To: Charles Lindsey <chl@clerew.man.ac.uk>
Cc: URI <uri@w3.org>
Message-ID: <Pine.LNX.4.62.0806261026190.13974@hixie.dreamhostps.com>

On Thu, 26 Jun 2008, Charles Lindsey wrote:
> > 
> > That's one option, though it's not the way we've done things in HTML5 
> > so far (for example we define how to parse any arbitrary byte stream).
> 
> And that is exactly where you make your great mistake. That attitude is 
> the exact cause of the mess we are currently in, where websites have to 
> declare that "This site is designed to be read by IE", or else they have 
> to include tests for the browser that is reading them and to modify 
> their behaviour accordingly. Which means that they probably do not work 
> at all for browsers they have never heard of (and particularly those 
> which implement exactly what the standards say).

Actually you have this exactly backwards -- it's (in part) the lack of 
defined error handling that has led to the current interoperability 
nightmare. Areas where error handling is defined end up generally with 
much, much better interoperability. (Comprehensive test suites are a big 
help as well.)

The whole point of defining error handling is that all browsers, including 
those the author hasn't heard of, will behave the same, regardless of 
whether the author does the right thing or not.
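
As a rough illustration of what "defined error handling" means in practice, 
here is a minimal sketch of a percent-decoder in which every input string, 
valid or not, has exactly one defined result, while malformed input is still 
reported as an error. The function name and the specific recovery rule (keep 
a malformed "%" as a literal character) are assumptions made for this 
example; they are not taken from HTML5 or any URI specification.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def percent_decode(text: str) -> tuple[bytes, list[int]]:
    """Decode %XX escapes in text.

    Hypothetical recovery rule for this sketch: a '%' that is not followed
    by two hex digits is a parse error, but it is still handled in a defined
    way (kept as a literal '%'), so every input has exactly one output.
    Returns the decoded bytes and the offsets of all parse errors.
    """
    out = bytearray()
    errors: list[int] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "%":
            pair = text[i + 1:i + 3]
            if len(pair) == 2 and pair[0] in HEX_DIGITS and pair[1] in HEX_DIGITS:
                out.append(int(pair, 16))
                i += 3
                continue
            # Malformed escape: record the error, then fall through and
            # emit the '%' literally.
            errors.append(i)
        out.extend(ch.encode("utf-8"))
        i += 1
    return bytes(out), errors
```

A validator would flag anything that produces a non-empty error list, while 
two independent consumers following the same rule would still agree on the 
decoded bytes.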

(The attitude you refer to can't possibly be the cause of the mess we're 
in, since the mess predates this attitude in the Web standards world.)


> It is the same mistake made by the designers of PL/1 where they tried to 
> invent a new "feature" to provide a meaning for every construct that 
> should have been syntactically disallowed, so that compilers failed to 
> spot obvious programming errors and, instead, produced (correct but) 
> entirely improbable behaviours (the "Law of Greatest Astonishment").

HTML5 doesn't make everything allowed. Indeed, it disallows far more than 
HTML4 did. Validators can and should still be used. In fact, one validator 
implementor is actively contributing feedback, including statistics about 
which kinds of errors are most common and which things he thinks should be 
caught but aren't. That feedback has led to the language being improved in 
ways we hadn't previously considered.

For example, the parsing rules I mentioned earlier specify every case that 
is a parse error. (Search for "parse error" in the spec.)


> > > But in the meantime, a sensible strategy would be for a browser whose 
> > > pages were published in iso-8859-99 (whatever that might be) to accept 
> > > IRIs/URIs (and especially queries) %-encoded into iso-8859-99, but 
> > > also, *in addition*, to convert incoming UTF-8 (whether in IRIs or 
> > > %-encoded in URIs) to its own iso-8859-99.
> > 
> > Well, as noted before, the actual behaviour we need to spec isn't 
> > really up for debate; browsers have already more or less converged on 
> > a behaviour. The original question (now answered) was merely which 
> > spec would define this. (HTML5 now defines it.)
> 
> But if the current actual behaviours do not actually work, then it is 
> far better for your document to specify new (or additional) behaviours 
> that would in fact work better.

I don't understand what you mean by "work". There is a lot of content on 
the Web that requires the current behaviour to render as the author 
intended. It all seems to "work", even if it's not theoretically ideal.
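
For concreteness, the dual-encoding strategy quoted above could be sketched 
roughly as follows. This is only one possible reading of that proposal, not 
the behaviour browsers have converged on and not what HTML5 specifies. The 
function name is hypothetical, iso-8859-1 stands in for the placeholder 
"iso-8859-99", and trying UTF-8 first is an assumption of the sketch.

```python
from urllib.parse import unquote_to_bytes

def decode_query(raw: str, legacy: str = "iso-8859-1") -> str:
    """Percent-decode raw, accepting either UTF-8 or a legacy encoding.

    Assumption of this sketch: the bytes are tried as UTF-8 first, and the
    legacy encoding is used only if they are not valid UTF-8.
    """
    data = unquote_to_bytes(raw)
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat the bytes as the page's own encoding.
        return data.decode(legacy, errors="replace")
```

Note that the heuristic is ambiguous by construction: a byte sequence that 
happens to be valid UTF-8 is always read as UTF-8, even if the sender meant 
it in the legacy encoding.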

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 26 June 2008 10:33:58 UTC