Precision and error handling (was URL work in HTML 5)

I want to get back to the issue as Robin raised it.

This is a topic I've tried to write about at various times, e.g.:
http://masinter.blogspot.com/2010/01/over-specification-is-anti-competitive.html
http://masinter.blogspot.com/2011/06/irreconcilable-differences.html
http://masinter.blogspot.com/2010/03/browsers-are-rails-web-sites-are-trains.html


I'm still looking for Jonathan's writing about the topic, if we want to spend F2F time on this.


I think Robin's "three possible approaches", while a common perspective, are rooted in a false trichotomy among "defined behavior", "undefined behavior", and "catching fire".
All of the behavior is described, one way or another, with various degrees of specificity. Nothing is described precisely enough to give bit-for-bit reproducible behavior, and trying to do so is frankly impossible, given the dynamic and asynchronous behavior of the web environment and the constraints placed on it by security and privacy concerns.  

The standards process involves a classic "prisoner's dilemma": if everyone cooperates, good things can happen, but even one rogue participant, acting for individual gain, can grab more for themselves by attempting to be "friendlier". Gaining more consistency and robustness across the web requires ALL of the implementors to agree to do something which might seem "not friendly": do not sniff, do not track, do not over-compensate for user spelling mistakes by quietly DWIM-ing misspelled <htmlll> <bodddy> <hh1> <pp> as if the user had typed <html> <body> <h1> <p>. To do so would introduce chaos. It might be "friendly", and if you were the "dominant browser", it might even seem like a way of cementing your dominance.
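
A rough sketch of the contrast, using Python's standard-library parsers purely as an illustration (the input string and class name below are made up for this example): the lenient HTML parser quietly accepts the misspelled tags and leaves recovery to the consumer, while the strict XML parser refuses to process the document at all.

    from html.parser import HTMLParser
    import xml.etree.ElementTree as ET

    malformed = "<htmlll><bodddy><hh1><pp>hello"

    # Lenient path: the HTML parser never "catches fire"; it reports the
    # unknown start tags as-is and leaves any recovery to the consumer.
    class TagCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.tags = []

        def handle_starttag(self, tag, attrs):
            self.tags.append(tag)

    collector = TagCollector()
    collector.feed(malformed)
    print(collector.tags)          # ['htmlll', 'bodddy', 'hh1', 'pp']

    # Strict path: one well-formedness error and processing stops.
    try:
        ET.fromstring(malformed)
    except ET.ParseError as exc:
        print("XML parser rejected the document:", exc)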

Avoiding escalation of DWIM-ish features involves convincing ALL of the major players to reject (ignore, not process, treat as error, fail to retrieve, fail to treat as equivalent) things that it would otherwise be "friendly" to accept. That would then let the otherwise unruly content community learn to create more conservative content.

Getting all the players to agree requires leadership, and a clear vision of robustness objectives.

Larry

============
From: Robin Berjon [mailto:robin@w3.org]
Sent: Wednesday, September 26, 2012 4:08 AM
....


To simplify, you essentially have three possible approaches (for all of 
which one can find many existing examples).

A) Define the behaviour of conforming content, leave the behaviour of 
erroneous content undefined.

B) Define the behaviour of conforming content, catch fire on everything 
else.

C) Define the behaviour of all content; the handling of non-conforming 
content is defined and can be flagged by quality tools (such as 
validators).

These three approaches produce different incentives in a technological 
ecosystem involving multiple implementations distributed over a large 
scale in both space and time.

With case (A), it is likely that there will be implementations that will 
have defined behaviour for erroneous content (whether intentionally or 
through bugs does not matter). People will end up relying on that 
working, and implementations will need to copy one another's bugs. The 
standard will need to catch up (painfully). Since you can't test for 
undefined behaviour, there is nothing you can do to prevent this drift.

With case (B) you can test that processors do indeed catch fire and so 
can prevent the drift (this has been overall very successfully shown 
with XML). But in a user-facing system, catching fire is hardly the 
friendliest thing to do — especially if the content is composed from 
multiple sources (that may be highly combinatorial) outside the reach of 
the primary content producer.

Case (C) is essentially case (B) but with well-defined behaviour that is 
richer and more elaborate than just blowing up. It assumes that errors 
will happen (in that it is somewhat reminiscent of the design decisions 
made in Erlang as contrasted with other languages) and that end-users 
should not be the ones dealing with them. This can be thoroughly tested 
for, so that, as in (B), it is possible to assert conformance of a 
processor for all content and therefore avoid the drift typical of (A). 
It also provides a solid foundation for a versioning strategy.
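
To make the testability point concrete, here is a toy Python sketch of a 
C-style rule. The "spec" below is invented purely for illustration, but 
because it defines an outcome for every possible input, erroneous content 
included, independent implementations can all be checked against the same 
test vectors instead of drifting apart on undefined cases.

    # Toy rule, invented for this example: parse a colour keyword.
    # Conforming input is one of the keywords below; for anything else
    # the rule does not leave behaviour undefined -- it says: strip
    # surrounding whitespace, lowercase, and if the result is still not
    # a keyword, return the defined fallback "black".
    KEYWORDS = {"black", "white", "red", "green", "blue"}

    def parse_colour(value: str) -> str:
        candidate = value.strip().lower()
        return candidate if candidate in KEYWORDS else "black"

    # Because every input has a defined outcome, conformance tests can
    # cover erroneous content too, and every processor must agree on it.
    TEST_VECTORS = [
        ("red", "red"),          # conforming
        ("  Blue ", "blue"),     # sloppy, but the recovery is specified
        ("chartreuse", "black"), # non-conforming: defined fallback, no fire
        ("", "black"),           # empty input: also defined
    ]

    for raw, expected in TEST_VECTORS:
        assert parse_colour(raw) == expected
    print("all defined-behaviour cases agree")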

None of these approaches guarantees quality, but (B) and (C) guarantee 
interoperability, and (C) guarantees user-friendliness.

You seem to believe that the approach taken in HTML and other such 
specifications is to prolong the mess generated by A-type standards — in 
fact it is the exact opposite. Once the mess left by the unavoidable 
drift in A-type standards is properly accounted for and grandfathered, 
technology development can proceed sanely. The vast increase in 
HTML-based innovation over the past few years is a testimony to this.

This could be seen as involving a reformulation of Postel: "Be universal 
and well-defined in what you accept; don't be daft and quality-check 
what you produce."

Yeah, it doesn't read quite so well. But its ecosystemic properties are 
actually far more robust over time.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Received on Friday, 28 September 2012 04:06:53 UTC