Re: URL work in HTML 5 from Robin Berjon on 2012-09-26 (www-tag@w3.org from September 2012)

From: Robin Berjon <robin@w3.org>
Date: Wed, 26 Sep 2012 13:08:26 +0200
To: Noah Mendelsohn <nrm@arcanedomain.com>
CC: W3C TAG <www-tag@w3.org>
Message-ID: <5062E22A.8090804@w3.org>

On 25/09/2012 16:14 , Noah Mendelsohn wrote:
> On 9/25/2012 9:27 AM, Robin Berjon wrote:
>> I believe that the idea is that once the rules that describe
>> processing as it happens are written down, you write test suites
>> that can prove conformance. This does tend to have a strong effect,
>> particularly if coupled with rules about processing erroneous
>> input.
>
> Well, half of the HTML5 spec is devoted to documenting cases where
> individual browsers were liberal, the conformance suites (like the
> W3C validator) were not used by producers, the invalid data on the
> wire became commonplace, and now the specification is complicated by
> the need to support the union of all these deviations.
>
>> Well-defined error handling that produces something predictable
>> (rather than blow up) is actually a modern and more pragmatic
>> reformulation of Postel, IMHO.
>
> My concern is not with well-defined error handling; it's with not
> putting equal emphasis on inducing producers to clean up their act
> too.

And, precisely, I believe that it is a misunderstanding to interpret the 
way in which the HTML specification (and now many others) is written as 
anything other than a systematic attempt to prevent producer drift. I further 
believe that the approach taken here is of an architectural nature and 
that it would be well within the TAG's remit to investigate it in 
greater depth.

To simplify, you essentially have three possible approaches (for all of 
which one can find many existing examples).

A) Define the behaviour of conforming content, leave the behaviour of 
erroneous content undefined.

B) Define the behaviour of conforming content, catch fire on everything 
else.

C) Define the behaviour of all content; non-conforming content has 
defined behaviour and can be flagged by quality tools (such as validators).

These three approaches produce different incentives in a technological 
ecosystem involving multiple implementations distributed over a large 
scale in both space and time.

With case (A), it is likely that there will be implementations that will 
have defined behaviour for erroneous content (whether intentionally or 
through bugs does not matter). People will end up relying on that 
working, and implementations will need to copy one another's bugs. The 
standard will need to catch up (painfully). Since you can't test for 
undefined behaviour, there is nothing you can do to prevent this drift.

With case (B) you can test that processors do indeed catch fire and so 
can prevent the drift (XML has, overall, shown this very successfully). 
But in a user-facing system, catching fire is hardly the 
friendliest thing to do — especially if the content is composed from 
multiple sources (that may be highly combinatorial) outside the reach of 
the primary content producer.

Case (C) is essentially case (B) but with well-defined behaviour that is 
richer and more elaborate than just blowing up. It assumes that errors 
will happen (in that it is somewhat reminiscent of the design decisions 
made in Erlang as contrasted with other languages) and that end-users 
should not be the ones dealing with them. This behaviour can be thoroughly 
tested, so that, as in (B), it is possible to assert conformance of a 
processor for all content and therefore avoid the drift typical in (A). It also 
provides a solid foundation for a versioning strategy.
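
As a rough illustration (not taken from any specification), approaches (B) 
and (C) can be sketched on a toy "key=value;key=value" format. The format, 
the function names parse_strict and parse_recovering, and the recovery rule 
of skipping malformed items are all assumptions made for this sketch. (A) 
has no counterpart here because its behaviour on bad input is, by 
definition, unspecified.

```python
from typing import Dict, List, Tuple


def parse_strict(text: str) -> Dict[str, str]:
    """Approach (B): conforming input is parsed, anything else raises."""
    result: Dict[str, str] = {}
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        if not sep or not key:
            # "Catch fire": stop processing at the first malformed item.
            raise ValueError(f"malformed item: {item!r}")
        result[key] = value
    return result


def parse_recovering(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Approach (C): every input has a defined result; errors are reported."""
    result: Dict[str, str] = {}
    errors: List[str] = []
    for item in text.split(";"):
        key, sep, value = item.partition("=")
        if not sep or not key:
            # Hypothetical recovery rule for this sketch: skip the item and
            # record it, so a validator-style tool can flag it to the producer.
            errors.append(f"malformed item: {item!r}")
            continue
        result[key] = value
    return result, errors
```

In both functions the outcome for erroneous input is part of the contract, 
so a test suite can assert it: that parse_strict raises on "a=1;oops", and 
that parse_recovering returns exactly the "a" entry plus one reported error. 
Two independent implementations that pass such tests agree on bad input as 
well as good.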

None of these approaches guarantees quality, but (B) and (C) guarantee 
interoperability, and (C) guarantees user-friendliness.

You seem to believe that the approach taken in HTML and other such 
specifications is to prolong the mess generated by A-type standards — in 
fact it is the exact opposite. Once the mess left by the unavoidable 
drift in A-type standards is properly accounted for and grandfathered, 
technology development can proceed sanely. The vast increase in 
HTML-based innovation over the past few years is a testimony to this.

This could be seen as involving a reformulation of Postel: "Be universal 
and well-defined in what you accept; don't be daft: quality-check 
what you produce."

Yeah, it doesn't read quite so well. But its ecosystemic properties are 
actually far more robust over time.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
Received on Wednesday, 26 September 2012 11:08:33 UTC