
Re: New issue: error recovery practices (Re: Proposed TAG Finding: Internet Media Type registration, consistency of use)

From: Chris Lilley <chris@w3.org>
Date: Wed, 29 May 2002 18:38:16 +0200
Message-ID: <8013088328.20020529183816@w3.org>
To: www-tag@w3.org, www-tag-request@w3.org, Rob Lanphier <robla@real.com>
CC: Tim Bray <tbray@textuality.com>

On Wednesday, May 29, 2002, 3:14:37 AM, Rob wrote:

RL> The IETF has a well-known motto "Be liberal in what you accept, and
RL> conservative in what you send".  It's documented in more detail here:
RL> http://www.ietf.org/rfc/rfc2360.txt (section 2.9)

The 'more detail' says

'A rule, first stated in STD 5/RFC 791, recognized as having benefits
in implementation robustness and interoperability is:

   "Be liberal in what you accept, and
   conservative in what you send".

Or establish restrictions on what a protocol transmits, but be able
to deal with every conceivable error received.  Caution is urged in
applying this approach in standards track protocols.  It has in the
past lead to conflicts between vendors when interoperability fails. '

RL> The strictness embodied in XML departs from that principle (though not as
RL> far from the detailed explanation in RFC 2360 as one might think).  I
RL> think it would be very helpful for the TAG to somehow adapt section 2.9 of
RL> RFC 2360 to the Web.

So, given that we are creating standards track protocols and formats,
and lack of interoperability is both a core-mission concern and a
demonstrated problem, it's not clear that 2.9 has anything to offer
in general, architectural terms other than a useful bad example.

In contrast to error recovery, though, error reporting is a good thing,
especially if it halts processing.
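As a minimal illustration of that point (a sketch using Python's bundled XML parser, which follows XML's well-formedness rules): the error is reported precisely, and processing halts rather than the parser guessing at a recovery.

```python
import xml.etree.ElementTree as ET

# XML's draconian rule in practice: a well-formedness error is
# reported with its location, and processing halts -- no silent
# recovery, no guessed parse tree.
try:
    ET.fromstring("<doc><p>unclosed paragraph</doc>")
except ET.ParseError as err:
    print("fatal error, processing halted:", err)
```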

RL> I guess I'd like to call attention to what the TAG has already said:
RL>    "An example of incorrect and dangerous behavior is a user-agent that
RL>     reads some part of the body of a response and decides to treat it as
RL>     HTML based on its containing a <!DOCTYPE declaration or <title> tag,
RL>     when it was served as text/plain or some other non-HTML type."

RL> There's clearly a bigger architectural principle driving that statement
RL> than solely the second-guessing of a media type based on content.  I'm
RL> having trouble teasing that out myself.  Here's an attempt at a pithy
RL> phrase:

RL>    "Do what I say, not what I 'mean'".

RL> I.e. if I say that something is text/plain, then IT'S TEXT/PLAIN
RL> ALREADY!!!  I DIDN'T "MEAN" TEXT/HTML!!!!!  :)  The document should be
RL> treated as text/plain.

Agreed. Sniffing just produces more problems to work around (ever hit
that point in an XHTML document where too much XML is used, and a
browser flips into XML view-source mode as some undocumented error
recovery path forks?).
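To make the failure mode concrete, here is a deliberately naive sketch (hypothetical, not any particular browser's code) of the kind of sniffing heuristic the TAG finding warns against. It misfires on a perfectly valid text/plain document that merely *quotes* HTML markup:

```python
def naive_sniff(body, served_type):
    """Hypothetical sniffer of the kind warned against above: it
    overrides the served media type whenever the body merely
    contains HTML-looking markers."""
    head = body[:512].lower()
    if b"<!doctype" in head or b"<title" in head:
        return "text/html"
    return served_type

# A plain-text tutorial that quotes HTML gets misclassified:
lesson = b"To title a page, write <title>My Page</title> in its head."
naive_sniff(lesson, "text/plain")   # returns "text/html" -- wrong
naive_sniff(b"just prose", "text/plain")  # returns "text/plain"
```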

The response you will get to that though is that there are legions of
misconfigured servers that send HTML content as text/plain (or
application/octet-stream). And, of course, the vast majority of
content providers have zero control over the configuration of the
server they use. Try mailing your ISP to request a configuration
change like adding a new MIME type, and consider what percentage of
the content-authoring public would even consider sending such an
email.

So, it's a problem, and one that (some) browsers have responded to by
ignoring the MIME type entirely and sniffing on the content. This is
clearly error prone, undocumented, non-interoperable and not
extensible.

XML usefully removed some reliance on server setup (by putting
encoding information in a standard place in the content, where all
authoring tools could reliably generate it) and then the XML MIME RFC
threw that away again by giving precedence to the MIME charset
parameter where present (thus depending on specific server
conventions with filenames, directory paths, config files and whatnot
that users have no control over and authoring tools cannot reliably
generate).
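A sketch of the precedence rule being criticised, assuming the behaviour described above (a charset parameter on the MIME type, when present, overrides the document's own in-band encoding declaration); the helper functions are invented for illustration:

```python
import re

def xml_decl_encoding(data):
    """Read the encoding pseudo-attribute from the XML declaration --
    the in-band information that XML put in a standard place."""
    m = re.match(rb'<\?xml[^?]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return m.group(1).decode("ascii") if m else None

def effective_encoding(data, mime_charset):
    """The precedence rule criticised above: server-supplied charset
    wins over what the author's tool wrote into the document."""
    return mime_charset or xml_decl_encoding(data) or "utf-8"

doc = b'<?xml version="1.0" encoding="iso-8859-1"?><doc/>'
effective_encoding(doc, None)      # "iso-8859-1": in-band wins when no charset
effective_encoding(doc, "utf-16")  # "utf-16": server config overrides the author
```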

RL> In general, HTML documents marked as "text/plain" are totally valid
RL> "text/plain" documents.

Yes. The problem is with the assumption that they are mislabelled.
Sometimes they are, sometimes they are not.

RL>  It's not as though there's an error, per se.
RL> Therefore, legitimate responses from webservers should not be treated as
RL> "errors" even if there's a 99% probability that what is seen is a
RL> configuration problem.

It's the 99% estimate that guides developers into "good enough" or "we
can fix that, it's obvious what is meant".

RL> One other issue to give guidance on is "how should vendor-specific
RL> extensions be allowed".  Some W3C specifications, like SMIL 2.0 Language,
RL> are very strict, allowing only elements and attributes that are
RL> explicitly scoped or qualified to a namespace URI.

Yes. SVG is the same. If you want extensions, put them in a different
namespace.

RL>  This has
RL> the effect that *all* proprietary extensions to SMIL 2.0 must be traceable
RL> to a URI.

Right.
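To make the traceability concrete, a sketch (the vendor namespace URI is invented for the example; the host-document namespace is the SMIL 2.0 Language one) showing that any generic XML consumer can attribute every element, extensions included, to its governing URI:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a host-language root plus one proprietary
# extension element in its own, vendor-owned namespace.
doc = """<root xmlns="http://www.w3.org/2001/SMIL20/Language"
              xmlns:acme="http://vendor.example.com/ext">
  <acme:sparkle/>
</root>"""

for el in ET.fromstring(doc).iter():
    # ElementTree spells namespaced tags as {uri}localname, so every
    # element -- the extension included -- is traceable to a URI.
    uri = el.tag[1:].split("}")[0]
    print(el.tag, "is governed by", uri)
```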

RL> See the following section of the SMIL document for the nitty-gritty of
RL> this:
RL> http://www.w3.org/TR/smil20/smil20-profile.html#SMILProfileNS-LanguageConformance

RL> In doing this, the SMIL working group thought we were going in the
RL> direction that the consortium as a whole was tending toward.  If we
RL> misread the situation, or if circumstances have changed, then it would be
RL> good to know what the current read on matters is.

I would say that your read on the direction was correct and was a good
decision with clear foresight.

RL>   Clearly, current
RL> document formats are what they are, but it would be good to provide
RL> guidance to new working groups as to how strict new document formats
RL> should be.

I think that SMIL 2.0 and SVG 1.0 provide good examples here. Perhaps
something can be generalized a little from those, to apply to other
formats used in Web clients. It's difficult to see how to generalize to
the whole Web though, including protocols and so forth.

RL> Should the W3C ever publish a new recommendation that is as liberal as
RL> HTML,

The part in HTML 2, 3.2 and 4.x about ignoring unknown tags and
displaying their content was a mistake, sure. I don't see it being
repeated, though, and XHTML 1.x does not repeat it (otherwise the parse
tree would not be the same as that of a standard XML parser).

RL> or is that seen as a legacy issue?


I would see it as both a legacy issue and a very clearly failed experiment.

RL> If XML's draconian error-handling was the right decision, is that
RL> the right decision for more than XML?

Depends on the definition of 'error'. Is a channel that is supposed to
have one bitrate and actually has a different bitrate an error?

RL> This is the type of question it would be nice to answer.  Perhaps I'm
RL> bundling up many issues in a single request, but I'm not sure I know how
RL> to break this up into bite-sized chunks.  If there are suggestions, I'd
RL> like to hear them.

As would I.


-- 
 Chris                            mailto:chris@w3.org
Received on Wednesday, 29 May 2002 12:39:17 UTC
