W3C home > Mailing lists > Public > www-tag@w3.org > October 2012

Re: Precision and error handling (was URL work in HTML 5)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Tue, 02 Oct 2012 16:37:58 +0900
Message-ID: <506A99D6.6000908@it.aoyama.ac.jp>
To: Robin Berjon <robin@w3.org>
CC: Larry Masinter <masinter@adobe.com>, Noah Mendelsohn <nrm@arcanedomain.com>, W3C TAG <www-tag@w3.org>
Hello Robin,

On 2012/10/01 20:37, Robin Berjon wrote:
> Hi Larry,
>
> I don't have the time to write up this topic in the full breadth of
> detail that it requires, but I will try to address what points I can
> quickly.
>
> On 28/09/2012 06:06 , Larry Masinter wrote:
>> I think Robin's "three possible approaches", while a common
>> perspective, is rooted in "false trichotomy" in the choice between
>> "defined behavior", "Undefined", and "catching fire". All of the
>> behavior is described, one way or another, with various degrees of
>> specificity. Nothing is described precisely enough to give
>> bit-for-bit reproducible behavior, and trying to do so is frankly
>> impossible, given the dynamic and asynchronous behavior of the web
>> environment and the constraints placed on it by security and privacy
>> concerns.
>
> I don't think you can get away with calling it a false trichotomy
> without showing that these cases are, in fact, equivalent.

I think Larry isn't arguing that these cases are equivalent, just that 
all three almost invariably tend to turn up in a spec.


> As an editor, if confronted with the possibility that an author may
> place a <p> inside another <p>, I can say:
>
> A) Nothing, or explicitly that I have no clue what happens.
> B) That nothing is produced, that the DOM stops there or that there is
> no DOM.
> C) That the currently opened <p> is automatically closed (or other
> well-defined handling).

C) is what HTML4 already prescribes (by allowing </p> to be omitted in 
the DTD). I therefore personally don't see any big problem with keeping 
that behavior for (non-XML) HTML5. I think there are cases where HTML5 
goes much further, and they would probably make better examples in this 
discussion.


> Each of these choices will have different impacts on the ecosystem, and
> their influence will be felt more the longer and the more broadly the
> concerned technology is deployed. Implementers will not behave in the
> same manner for each. This will cause authors to behave differently.
> Users won't see the same results.
>
> I am unsure about what you mean by the impossibility of reproducible
> behaviour due to dynamic, asynchronous, security, or privacy
> constraints. Can you cite how any such constraints may for instance
> render the HTML parsing algorithm impossible?

I think Larry was speaking in general. I don't know the details of the 
HTML parsing algorithm, but there may be aspects of it that are 
timing-dependent, and in that case, dynamic and/or asynchronous issues 
may turn up.


>> The standards process involves a classic "prisoner's dilemma": if
>> everyone cooperates, good things can happen, but even just one rogue
>> participant, acting for individual gain, can grab more for
>> themselves, by attempting to be "friendlier". To gather more
>> consistency and robustness of the web requires ALL of the
>> implementors to agree to do something which might seem "not
>> friendly". Do not sniff, do not track, do not over-compensate for
>> user spelling mistakes by quietly DWIM-ing misspelled <htmlll>
>> <bodddy> <hh1><pp> as if the user had typed <html> <body><h1><p>. To
>> do so would introduce chaos. It might be "friendly", and if you were
>> the "dominant browser", might even seem like a way of cementing your
>> dominance.
>>
>> Avoiding escalation of DWIM-ish features involves convincing ALL of
>> the major players to reject (ignore, not process, treat as error,
>> fail to retrieve, fail to treat as equivalent) things that would
>> otherwise be friendly to accept. That would then allow the otherwise
>> unruly content community to learn to create more conservative
>> content.
>
> I think that you are conflating many things here. Most importantly,
> having a well-defined output for any given input is not DWIM. It's
> simply reducing variability in standards, which is a good practice.
> Undefined behaviour on error introduces discretionary items in
> implementation behaviour.
>
> See for instance http://www.w3.org/TR/spec-variability/#optionality (and
> many other parts of the QA framework).
>
> In general, DWIM is orthogonal to clear and concise specification. You
> can have a well-defined and non-drifting specification that describes a
> DWIM technology (examples: HTML5, ECMA-262), likewise for
> implementations (example: Perl);

I think first, we should agree on some terminology. Your use of DWIM 
seems to me not exactly in line with e.g. what Wikipedia has to say.

As an example, I don't think that ECMAScript qualifies as DWIM in any 
sense, and neither does Perl (nor, e.g., Ruby). DWIM (do what I mean) goes 
quite a bit further than just creating a programming language that is, 
by the judgment of those who frequently use it, intuitive and easy to 
use. All three of these languages have very clear syntax rules and 
function/method names with only very few aliases, if any.
A DWIM Perl was proposed (mostly as a joke) by one of the greatest Perl 
hackers, Damian Conway (see 
http://search.cpan.org/~dconway/Acme-Bleach-1.150/lib/Acme/DWIM.pm).

These languages are (to a greater or lesser extent, depending on whom you 
ask) quite intuitive to use. But it takes somebody like Larry Wall or 
Yukihiro Matsumoto to come up with a language that has a reasonably 
clear grammar and is intuitive (in some sense) to use.

If you compare languages like ECMAScript, Perl, or Ruby with the 
definition of DWIM as e.g. given on the respective Wikipedia page 
(http://en.wikipedia.org/wiki/DWIM), there are very clear differences. 
By chance, Larry himself is cited on that page, as follows:

 >>>>
Teitelman and his Xerox PARC colleague Larry Masinter later described 
the philosophy of DWIM in the Interlisp programming environment (the 
successor of BBN Lisp):

 >>>>
Although most users think of DWIM as a single identifiable package, it 
embodies a pervasive philosophy of user interface design: at the user 
interface level, system facilities should make reasonable 
interpretations when given unrecognized input. ...the style of interface 
used throughout Interlisp allows the user to omit various parameters and 
have these default to reasonable values...
  DWIM is an embodiment of the idea that the user is interacting with an 
agent who attempts to interpret the user's request from contextual 
information. Since we want the user to feel that he is conversing with 
the system, he should not be stopped and forced to correct himself or 
give additional information in situations where the correction or 
information is obvious.
 >>>>
 >>>>

The Wikipedia page also compares DWIM to a spell-checker. If you think 
that page is outdated, can you provide a better reference?
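The "reasonable interpretation of unrecognized input" described in the quotation above can be sketched in a few lines of Python, here applied to Larry's misspelled-tag example (`<htmlll>`, `<bodddy>`). This is a toy illustration of the DWIM philosophy under discussion, not anything any browser actually does; the function and tag list are hypothetical.

```python
import difflib

# A hypothetical vocabulary of recognized tag names.
KNOWN_TAGS = ["html", "body", "h1", "p", "div", "span"]

def dwim_tag(tag):
    """Guess a 'reasonable interpretation' of an unrecognized tag name
    by fuzzy-matching it against known tags (DWIM-style recovery)."""
    matches = difflib.get_close_matches(tag.lower(), KNOWN_TAGS,
                                        n=1, cutoff=0.6)
    return matches[0] if matches else tag

print(dwim_tag("htmlll"))  # → 'html'
print(dwim_tag("bodddy"))  # → 'body'
```

This is precisely the kind of silent correction that Larry argues would "introduce chaos" if implementations competed on it, and it goes well beyond the fixed, deterministic error-recovery rules of a parsing algorithm.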


> and conversely you can have non-DWIM
> (traditionally called "B&D", but that's by DWIM partisans of course)
> specification that nevertheless has gaps and loopholes that render
> interoperability hard despite even implementer goodwill (example: XML
> Schema 1.0).
>
> I believe that you're confusing DWIM — a design philosophy in which
> useful defaults and well-defined handling of input combine to provide
> developers with a highly productive environment —

Getting back to HTML5, is there anybody, except maybe the spec editor(s) 
themselves, who knows all the parsing rules by heart and can therefore 
productively make use of them? What advice would you give to somebody 
who wants to "productively" edit HTML5 source?

a) Just write what you think might work, HTML will do what you mean (DWYM).
b) Follow a few very simple rules (mostly just XML-like syntax and 
structure) to easily stay on the safe side, away from the cliff, and 
frequently use a validator or similar checking tool.
c) Something else (if you have a better idea, I'd like to hear about it).
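Advice b) is easy to automate. As a minimal sketch (assuming the author sticks to XML-like syntax; this is a stand-in for a real validator, and the function name is made up), a fragment can be checked for well-formedness with the standard-library XML parser, so that any markup relying on HTML5's error recovery is flagged:

```python
import xml.etree.ElementTree as ET

def looks_xml_clean(fragment):
    """Advice b) in practice: treat the markup as if it had to be
    well-formed XML; a parse failure flags markup that would only
    work thanks to HTML5's error-recovery rules."""
    try:
        # Wrap in a dummy root so multi-element fragments parse.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False

print(looks_xml_clean("<p>one</p><p>two</p>"))  # → True
print(looks_xml_clean("<p>one<p>two"))          # → False
```

An author following b) never needs to know how the parser recovers from `<p>one<p>two`, because they never ship such markup in the first place.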


> with the sort of
> too-smart-for-its-own-pants behaviour that some applications (e.g. MS
> Word) tend to have.

At the borderline between accepted and erroneous markup, my guess is 
that users will get the same too-smart-for-its-own-pants impression of 
HTML5. The advantage of HTML5 compared to MS Word may be that users 
don't have to go there if they don't want to.


> DWIM is a core feature in many if not most successful Web technologies.
> I'd think twice before badmouthing it. If you want to take DWIM off the
> Web you'll have to pry jQuery from the impervious grasp of my cold, dead
> fingers.

Careful syntax and API design is definitely an important part of any 
successful technology, not only the Web. But I don't think that jQuery 
is DWIM in the sense of Wikipedia. So you don't have to be afraid :-).


> But in any case, that's orthogonal to the drifting behaviour you
> describe. Specifications and tests are the tools we have to contain such
> drift. Introducing variability in specification by deliberately leaving
> behaviour in the face of erroneous content undefined is a great way of
> making sure that drift will happen.
>
> I think it takes some gall to blame implementers for wanting to be
> "friendly" to their users. Of course they should be friendly to their
> users! That is and should be their number one priority and incentive. It
> is our job as specification makers to ensure that this friendliness does
> not introduce interoperability issues but is rather fully accounted for
> as an essential component in a complex system.

I see two main questions here:

First, will short-term friendliness produce long-term friendliness? In 
some cases yes, but in other cases no.

Second, how friendly is the (currently main) HTML5 spec to its users? 
See Noah's point about making the author spec the main HTML5 spec.

Regards,   Martin.


>> Getting all the players to agree requires leadership, and a clear
>> vision of robustness objectives.
>
> I am always highly suspicious of social mechanisms that require
> leadership and clear vision to function. It seems like an inherently
> broken design to boot — why introduce reliance on something known to be
> so brittle, especially over the long term?
>
Received on Tuesday, 2 October 2012 07:38:42 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 2 October 2012 07:38:42 GMT