Re: New issue: error recovery practices (Re: Proposed TAG Finding: Internet Media Type registration, consistency of use)

Hi Tim,

Thanks for your response.  I agree that the issue is still pretty fuzzy,
so here's my attempt to scare it out from the underbrush.  More comments
inline.

On Tue, 28 May 2002, Tim Bray wrote:
> Rob Lanphier wrote:
>  > Summary:  this is a request that the TAG issue a finding regarding
>  > appropriate error resilience/recovery/"second guessing" in web software.
>
> The TAG spent some time on this on May 27, and while there's an issue
> lurking out there, it needs a bit more cooking before we're ready to
> take it up officially.
>
>  > *  Should future XML-based language specifications from W3C extend
>  >    traditional XML strictness into attribute values and other areas left
>  >    undefined by XML?
>
> The answer seems to be "it depends".  I'm having trouble imagining what
> kind of thing we could say that would cover the general case.  Is there
> a general case here?

Maybe not, and maybe that's what needs to be said (say explicitly "it
depends" somewhere, rather than have people assume one thing or another).

The IETF has a well-known motto "Be liberal in what you accept, and
conservative in what you send".  It's documented in more detail here:
http://www.ietf.org/rfc/rfc2360.txt (section 2.9)

The strictness embodied in XML departs from that principle (though not
as far from the detailed treatment in RFC 2360 as the slogan alone might
suggest).  I think it would be very helpful for the TAG to somehow adapt
section 2.9 of RFC 2360 to the Web.
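To make the contrast concrete, here's a rough sketch in Python (purely
illustrative; the broken markup is made up) of the two philosophies: the
XML parser halts on the first well-formedness error, while the HTML
parser recovers and keeps going:

    # Draconian handling: the XML parser halts on the first
    # well-formedness error and produces no output at all.
    import xml.etree.ElementTree as ET

    try:
        ET.fromstring("<a><b></a>")        # mismatched tags
    except ET.ParseError as err:
        print("fatal, nothing recovered:", err)

    # Liberal handling: an HTML parser accepts the same broken
    # input and recovers whatever structure it can.
    from html.parser import HTMLParser

    class TagLogger(HTMLParser):
        def handle_starttag(self, tag, attrs):
            print("recovered start tag:", tag)

    TagLogger().feed("<a><b></a>")         # no error raised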

>
>  > *  Should specifications be clear on what is safe to ignore?  (I would
>  >    hope so....not always the case, so perhaps this should be written
> down)
>  > *  When is it safe to specify that unknown issues can be ignored
>  >    ("ignorability"), and when must specification writers not allow
>  >    ignorability?
>
> Same comment, really.  I'm having trouble seeing the general case or
> imagining what a TAG finding could say.

I guess I'd like to call attention to what the TAG has already said:
   "An example of incorrect and dangerous behavior is a user-agent that
    reads some part of the body of a response and decides to treat it as
    HTML based on its containing a <!DOCTYPE declaration or <title> tag,
    when it was served as text/plain or some other non-HTML type."

There's clearly a bigger architectural principle driving that statement
than solely the second-guessing of a media type based on content.  I'm
having trouble teasing that out myself.  Here's an attempt at a pithy
phrase:

   "Do what I say, not what I 'mean'".

I.e. if I say that something is text/plain, then IT'S TEXT/PLAIN
ALREADY!!!  I DIDN'T "MEAN" TEXT/HTML!!!!!  :)  The document should be
treated as text/plain.

In general, an HTML document served as "text/plain" is a perfectly valid
"text/plain" document.  There's no error, per se.  Therefore, legitimate
responses from web servers should not be treated as "errors", even when
there's a 99% probability that what's actually on display is a server
configuration problem.
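Here's a rough sketch of the "do what I say" rule in Python (the URL
and the renderers are hypothetical; the point is that dispatch keys off
the declared Content-Type and never off the body):

    import urllib.request

    def show_as_text(body):                # stand-in renderers,
        print(body.decode("latin-1"))      # purely for illustration

    def show_as_html(body):
        print("<rendering as HTML...>")

    with urllib.request.urlopen("http://example.com/doc") as resp:
        declared = resp.headers.get_content_type()
        body = resp.read()

    # Dispatch on the declared type only -- never peek at the body
    # to "correct" the server, even if it starts with "<!DOCTYPE".
    if declared == "text/plain":
        show_as_text(body)
    elif declared == "text/html":
        show_as_html(body)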

One other issue to give guidance on is how vendor-specific extensions
should be allowed.  Some W3C specifications, like SMIL 2.0 Language,
are very strict, allowing only elements and attributes that are
explicitly scoped or qualified to a namespace URI.  This has the effect
that *all* proprietary extensions to SMIL 2.0 must be traceable to a
URI.

See the following section of the SMIL document for the nitty-gritty of
this:
http://www.w3.org/TR/smil20/smil20-profile.html#SMILProfileNS-LanguageConformance
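As a rough illustration of the effect (a sketch, not the SMIL
conformance algorithm; the vendor namespace and element are invented),
a checker could confirm that every element is traceable to some
namespace URI:

    import xml.etree.ElementTree as ET

    doc = """<smil xmlns="http://www.w3.org/2001/SMIL20/Language"
                   xmlns:v="http://vendor.example.com/ext">
               <v:fancyTransition/>
             </smil>"""

    # ElementTree spells qualified names as "{uri}local", so an
    # element with no namespace at all is easy to spot and reject.
    for elem in ET.fromstring(doc).iter():
        if not elem.tag.startswith("{"):
            print("untraceable extension:", elem.tag)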

In doing this, the SMIL working group believed we were heading in the
direction the consortium as a whole was tending toward.  If we misread
the situation, or if circumstances have changed, it would be good to
know what the current thinking is.  Clearly, existing document formats
are what they are, but it would be good to give new working groups
guidance on how strict new document formats should be.

> I'm trying not to diss the issue that Rob's raising here.  Clearly the
> decision as to how-liberal-to-be-in-what-you-accept is architectural in
> scope.  On the other hand, the W3C specifies languages designed for
> authorship by nontechnical humans, protocols for significant e-commerce
> payloads, and pretty well everything in between, so is there an
> architectural principle that cuts across the spectrum?  For example, I
> (perhaps in a minority) am OK with HTML processors being very liberal in
> what they accept; it helps let everyone publish to the web.  I also
> believe that XML's draconian error-handling was the right design decision.

Should the W3C ever publish a new recommendation that is as liberal as
HTML, or is that seen as a legacy issue?  If XML's draconian
error-handling was the right decision, is that the right decision for more
than XML?

This is the type of question it would be nice to answer.  Perhaps I'm
bundling up many issues in a single request, but I'm not sure I know how
to break this up into bite-sized chunks.  If there are suggestions, I'd
like to hear them.

Thanks,
Rob
