
Postel's law, dealing with invalid content, Tinkerbell effect

From: Larry Masinter <masinter@adobe.com>
Date: Wed, 21 Jan 2009 16:36:21 -0800
To: "noah_mendelsohn@us.ibm.com" <noah_mendelsohn@us.ibm.com>
CC: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <8B62A039C620904E92F1233570534C9B0118C8472600@nambx04.corp.adobe.com>

> .. one can also debate whether the world would have been a better place 
> if HTML error handling had been stricter from the start, and less junky 
> HTML were out there, but that train has mostly left the station, I think.)

I think it's worth talking about whether the train really has left the
station and whether the situation is irreparable.

I think the architectural principle is laid out pretty well by
the "liberal in what you accept/conservative in what you send" [1] policy:
 [1]  http://en.wikipedia.org/wiki/Postel's_law  
"Postel's principle is often misinterpreted as discouraging checking
 messages for validity."

To apply "internet architecture" principles to "web architecture" may
require some mapping of terminology (an HTML document as a kind of
'message', and a browser and a web server as kinds of 'host'), but
the principles still apply.

For example, if there were cooperation from the browser makers,
it might actually be possible to tie error-on-invalid-content to new 
features, so that old content would continue to work, but to get access 
to new features, you would have to clean up your tag soup. 
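
To make the idea concrete, here is a toy sketch of such a policy. All
the names here (parse, check_well_formed, NEW_FEATURE_MARKERS) are
hypothetical, and the "validity" check is deliberately simplistic; the
point is only the gating logic, not real HTML parsing:

```python
import re

# Assumed "new" features for illustration only -- not a real feature list.
NEW_FEATURE_MARKERS = ("<video", "<canvas")

class InvalidMarkupError(Exception):
    pass

def uses_new_features(html: str) -> bool:
    return any(marker in html for marker in NEW_FEATURE_MARKERS)

def check_well_formed(html: str) -> bool:
    # Toy check: every opened tag has a matching close tag.
    # Real validation would be far more involved.
    opens = re.findall(r"<(\w+)[^>]*>", html)
    closes = re.findall(r"</(\w+)>", html)
    return sorted(opens) == sorted(closes)

def parse(html: str) -> str:
    if uses_new_features(html) and not check_well_formed(html):
        # New features demand clean markup: fail loudly.
        raise InvalidMarkupError("clean up your tag soup to use new features")
    # Legacy content, valid or not, is still rendered leniently.
    return "rendered"
```

Under this policy, old tag soup keeps rendering, but a page that opts
into a new feature gets strict error handling for free.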

This might be a little unpleasant for some authors and authoring tools,
but it would give a credible way out of the current mess, in a way
that would allow better extensibility than the current and unworkable
approach of taking various responses to "erroneous" input and mandating
the proper behavior (thus turning it into "acceptable" input).

I say it is "unworkable" because the economic forces that led to the
current condition and the "browser wars" [2] haven't gone away;
the cause has almost nothing to do with the specifications.
 [2] http://en.wikipedia.org/wiki/Browser_wars

No matter what any specification says, there will always be some
motivation to extend the behavior of one receiver or another, in
order to make that software tool "preferable". And, having done
so, there will be a natural motivation for some content provider
or sender to exploit special knowledge about that recipient in
order to take advantage of that extension or difference. Certainly
that's still the case today, with web sites and server technology
relying on extensive "sniffing" of browser versions to determine
which parts of interim HTML specifications are implemented in
what particular way.
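
A minimal sketch of that server-side "sniffing", with hypothetical
file names and only a crude substring match on the User-Agent string
(real sniffing logic is messier, which is part of the problem):

```python
def pick_markup(user_agent: str) -> str:
    """Choose which markup variant to serve based on the browser's
    self-reported User-Agent string -- a toy illustration of sniffing."""
    if "MSIE 6" in user_agent:
        return "legacy-tables.html"   # work around an old browser's quirks
    if "Gecko" in user_agent:
        return "standards.html"       # assume a more standards-friendly engine
    return "fallback.html"            # anything unrecognized gets a safe default
```

Each branch exists precisely because some receiver extended or diverged
from the specification, and some sender found it profitable to notice.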

While the specifications by themselves have little effect, the
actual introduction of error checking in dominant and widely
deployed tools *can* have an effect. I was surprised at how
effective it was to get HTTP user agents upgraded to support the
"Host" header once some (popular) web sites started insisting
on its presence. But it did require willingness on the part of
*some* of the players to treat a missing "Host" header as an
error, rather than quietly accepting the request even when the
header wasn't strictly needed.
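
The server-side move is small. Here's a toy sketch of a handler that
refuses HTTP/1.1 requests lacking a Host header with a 400 rather than
quietly serving a default site (HTTP/1.1 does in fact mandate the 400
response for this case; the handler shape itself is hypothetical):

```python
def handle_request(request_lines: list[str]) -> str:
    """request_lines: raw request lines, e.g.
    ["GET / HTTP/1.1", "Host: example.com"]."""
    request_line = request_lines[0]
    version = request_line.rsplit(" ", 1)[-1]
    headers = {}
    for line in request_lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    if version == "HTTP/1.1" and "host" not in headers:
        return "400 Bad Request"   # be strict: reject, don't guess
    site = headers.get("host", "default-site")
    return f"200 OK (serving {site})"
```

Once enough popular sites did the strict thing, the "error" stopped
being tolerable and user agents got fixed.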

A similar strategy might actually be workable for "tag soup HTML",
but it's a Tinkerbell Effect [3] dream: it can only fly if enough
people *believe* it can fly.
 [3] http://en.wikipedia.org/wiki/Tinkerbell_effect

Larry
-- 
http://larry.masinter.net
Received on Thursday, 22 January 2009 00:37:08 GMT
