On telling end users "the input is broken" [was: comments on draft-barth-mime-sniffing] from Michael(tm) Smith on 2009-06-17 (public-html@w3.org from June 2009)

From: Michael(tm) Smith <mike@w3.org>
Date: Wed, 17 Jun 2009 20:03:01 +0900
To: Shane McCarron <shane@aptest.com>
Cc: public-html@w3.org
Message-ID: <20090617110258.GA10019@sideshowbarker>
[cc trimmed]

Shane McCarron <shane@aptest.com>, 2009-06-17 00:01 -0500:

>  Most modern standards and "recommendations" define what happens 
>  when presented with conforming input, and implicitly or explicitly state 
>  that non-conforming input results in unspecified or undefined behavior
>  (see *all* the IEEE POSIX standards, for example).

The unique scale and scope of the problems we need to address in
developing rules for optimal-for-the-needs-of-actual-end-users
processing of resources on the public Web may necessarily be very
different from those of most other standards. So I think it is
perhaps to be expected that the standards we develop in this group
might need to be somewhat different from most other standards.

But that said, in my previous experience with development of
e-mail software and services for mobile operators and
mobile-device makers and ISPs, I can think of a number of cases
where we had to add ad-hoc error-handling behavior to our
applications, downstream, to deal with processing of cases of
non-conforming input being generated by third-party upstream
systems over which we had no control -- cases for which whatever
relevant standard left the behavior unspecified and undefined.

And I'm certain that other vendors who had downstream apps that
needed to process that same broken input also had to figure out and
add their own ad-hoc error-handling behavior, very likely in a way
that was not fully interoperable with what we ended up deploying.

So I guess I'm not convinced that other standards for handling
Internet content shouldn't also be specced out with a degree of
rigor and thoroughness in defining (and iteratively updating
during further development and testing) rules for error-handling
similar to what we have ended up with in the current HTML5 draft
(and other drafts we have now that are related to or spun off from it).

>  It is an unparalleled act of hubris that this work attempts not
>  only to define good behavior, but also to define *positive*
>  behavior in the face of bad data.  That's just insane.  If it
>  is critical to define the behavior of implementations when
>  handed invalid input, then say that the implementation is
>  required to tell the user "the input is broken"!

In many cases, doing that would seem to amount to punishing end
users for mistakes made by authors.

Or the producer of the input that has the errors may be a broken
upstream implementation (or content provider or author) over which
the users of the downstream consumer apps receiving the data have
zero control.

For that case, having the client/consumer application simply tell
the end user "the input is broken" and refuse to process it does
not seem like the right solution to the problem.

Deploying to any significant number of users a client that behaved
that way for that case would seem to incur costs to respond to
customer-support inquiries from those end users -- and would maybe
be a great way to hand competing vendors an opportunity to take
business away from you (by providing clients that are capable of
processing the input in spite of the errors).

On that note, there's what seems to me a relevant bit in this
essay from back in 2004:

  http://dbaron.org/www/df-frag

  ...While solution by the market may not sound inherently bad, it
  is worth remembering that the rules for error-handling in
  traditional HTML were solved by the market, and the end result
  was bad for competition and bad for small devices...

In practice, it seems like for a lot of cases we are left with the
same kind of choice: When there is any market value for a client
that can correct a particular type of error in some class of
input, we can all collectively either:

  A. Follow (repeat) the strategy of waiting and watching as the
     whims of the market (which in some cases largely just amounts
     to whatever vendor has a majority market share) determine
     for us what the error-handling rules are going to end up being.

  B. Work together to consider the possible error cases up front
     and try to get agreement on what the error-handling behavior
     for those cases should be, and spec it out. And to iterate
     through that process as new kinds of error cases develop
     (or are discovered).

>  Defining that an implementation will somehow cleverly deal with
>  every broken piece of input is not just silly, it's impossible.

In practice, it seems like it's often not a situation of trying to
anticipate and define how to deal with every possible piece of
broken input, but instead a situation of defining how to deal with
actual cases that we're already aware of -- whether it's because
they're discovered during testing or are found in instances of
existing content developed by actual users or early adopters, or
whatever -- and iterating back through that process after further
testing (or even deployment) exposes more such cases.

  --Mike

-- 
Michael(tm) Smith
http://people.w3.org/mike/
Received on Wednesday, 17 June 2009 12:46:38 UTC