Re: Precision and error handling (was URL work in HTML 5)

Hi Larry,

I don't have the time to write up this topic in the full breadth of 
detail that it requires, but I will try to address what points I can 
quickly.

On 28/09/2012 06:06, Larry Masinter wrote:
> I think Robin's "three possible approaches", while a common
> perspective, is rooted in "false trichotomy" in the choice between
> "defined behavior", "Undefined", and "catching fire". All of the
> behavior is described, one way or another, with various degrees of
> specificity. Nothing is described precisely enough to give
> bit-for-bit reproducible behavior, and trying to do so is frankly
> impossible, given the dynamic and asynchronous behavior of the web
> environment and the constraints placed on it by security and privacy
> concerns.

I don't think you can get away with calling it a false trichotomy 
without showing that these cases are, in fact, equivalent.

As an editor, if confronted with the possibility that an author may 
place a <p> inside another <p>, I can say:

A) Nothing, or explicitly that I have no clue what happens.
B) That nothing is produced, that the DOM stops there, or that there is 
no DOM.
C) That the currently opened <p> is automatically closed (or other 
well-defined handling).

Each of these choices will have a different impact on the ecosystem, and 
its influence will be felt more strongly the longer, and the more 
broadly, the technology in question is deployed. Implementers will not 
behave in the same manner for each. This will cause authors to behave 
differently. Users won't see the same results.

I am unsure what you mean by the impossibility of reproducible 
behaviour due to dynamic, asynchronous, security, or privacy 
constraints. Can you show how any such constraint would, for instance, 
render the HTML parsing algorithm impossible to define reproducibly?
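
For what it's worth, option C is precisely what the HTML parsing 
algorithm specifies, and it is entirely reproducible. Here's a quick 
sketch to make that concrete (it assumes the Python html5lib package, 
which implements the parsing algorithm; the input and variable names 
are just mine for illustration):

    import html5lib  # implements the HTML parsing algorithm

    # Erroneous input: a <p> start tag while another <p> is still open.
    doc = html5lib.parse("<p>first<p>second", namespaceHTMLElements=False)
    body = doc.find("body")
    print([(child.tag, child.text) for child in body])
    # -> [('p', 'first'), ('p', 'second')]
    # The open <p> is closed implicitly and the two paragraphs end up
    # as siblings -- the same tree from every conforming parser.

Nothing dynamic, asynchronous, or security-sensitive enters into it.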

> The standards process involves a classic "prisoner's dilemma": if
> everyone cooperates, good things can happen, but even just one rogue
> participant, acting for individual gain, can grab more for
> themselves, by attempting to be "friendlier". To gather more
> consistency and robustness of the web requires ALL of the
> implementors to agree to do something which might seem "not
> friendly". Do not sniff, do not track, do not over-compensate for
> user spelling mistakes by quietly DWIM-ing misspelled <htmlll>
> <bodddy> <hh1><pp>  as if the user had typed <html> <body><h1><p>. To
> do so would introduce chaos. It might be "friendly", and if you were
> the "dominant browser", might even seem like a way of cementing your
> dominance.
>
> Avoiding escalation of DWIM-ish features involves convincing ALL of
> the major players to reject (ignore, not process, treat as error,
> fail to retrieve, fail to treat as equivalent) things that would
> otherwise be friendly to accept. That would then allow the otherwise
> unruly content community to learn to create more conservative
> content.

I think that you are conflating many things here. Most importantly, 
having a well-defined output for any given input is not DWIM. It's 
simply reducing variability in standards, which is a good practice. 
Undefined behaviour on error introduces discretionary items into 
implementation behaviour.

See for instance http://www.w3.org/TR/spec-variability/#optionality (and 
many other parts of the QA framework).

In general, DWIM is orthogonal to clear and concise specification. You 
can have a well-defined and non-drifting specification that describes a 
DWIM technology (examples: HTML5, ECMA-262), likewise for 
implementations (example: Perl); and conversely you can have a non-DWIM 
specification (traditionally called "B&D", but only by DWIM partisans of 
course) that nevertheless has gaps and loopholes that render 
interoperability hard even despite implementer goodwill (example: XML 
Schema 1.0).

I believe that you're confusing DWIM — a design philosophy in which 
useful defaults and well-defined handling of input combine to provide 
developers with a highly productive environment — with the sort of 
too-smart-for-its-own-pants behaviour that some applications (e.g. MS 
Word) tend to have.

DWIM is a core feature in many if not most successful Web technologies. 
I'd think twice before badmouthing it. If you want to take DWIM off the 
Web you'll have to pry jQuery from the impervious grasp of my cold, dead 
fingers.

But in any case, that's orthogonal to the drifting behaviour you 
describe. Specifications and tests are the tools we have to contain such 
drift. Introducing variability into a specification by deliberately 
leaving behaviour in the face of erroneous content undefined is a great 
way of making sure that such drift will happen.

I think it takes some gall to blame implementers for wanting to be 
"friendly" to their users. Of course they should be friendly to their 
users! That is and should be their number one priority and incentive. It 
is our job as specification makers to ensure that this friendliness does 
not introduce interoperability issues but is rather fully accounted for 
as an essential component in a complex system.

> Getting all the players to agree requires leadership, and a clear
> vision of robustness objectives.

I am always highly suspicious of social mechanisms that require 
leadership and clear vision to function. It seems like an inherently 
broken design to boot — why introduce reliance on something known to be 
so brittle, especially over the long term?

-- 
Robin Berjon - http://berjon.com/ - @robinberjon
