- From: Chris Lilley <chris@w3.org>
- Date: Wed, 29 May 2002 18:38:16 +0200
- To: www-tag@w3.org, www-tag-request@w3.org, Rob Lanphier <robla@real.com>
- CC: Tim Bray <tbray@textuality.com>
On Wednesday, May 29, 2002, 3:14:37 AM, Rob wrote:

RL> The IETF has a well-known motto "Be liberal in what you accept, and
RL> conservative in what you send". It's documented in more detail here:
RL> http://www.ietf.org/rfc/rfc2360.txt (section 2.9)

The 'more detail' says

'A rule, first stated in STD 5/RFC 791, recognized as having benefits in implementation robustness and interoperability is: "Be liberal in what you accept, and conservative in what you send". Or establish restrictions on what a protocol transmits, but be able to deal with every conceivable error received. Caution is urged in applying this approach in standards track protocols. It has in the past lead to conflicts between vendors when interoperability fails.'

RL> The strictness embodied in XML departs from that principle (though not as
RL> far from the detailed explanation in RFC 2360 as one might think). I
RL> think it would be very helpful for the TAG to somehow adapt section 2.9 of
RL> RFC 2360 to the Web.

So, given that we are creating standards track protocols and formats, and lack of interoperability is both a core-mission concern and a demonstrated problem, it's not clear that 2.9 has anything to offer in general, architectural terms other than a useful bad example. In contrast to error recovery, though, error reporting is a good thing, especially if it halts processing.

RL> I guess I'd like to call attention to what the TAG has already said:
RL> "An example of incorrect and dangerous behavior is a user-agent that
RL> reads some part of the body of a response and decides to treat it as
RL> HTML based on its containing a <!DOCTYPE declaration or <title> tag,
RL> when it was served as text/plain or some other non-HTML type."
RL> There's clearly a bigger architectural principle driving that statement
RL> than solely the second-guessing of a media type based on content. I'm
RL> having trouble teasing that out myself. Here's an attempt at a pithy
RL> phrase:
RL> "Do what I say, not what I 'mean'".
RL> I.e. if I say that something is text/plain, then IT'S TEXT/PLAIN
RL> ALREADY!!! I DIDN'T "MEAN" TEXT/HTML!!!!! :) The document should be
RL> treated as text/plain.

Agreed. Sniffing just produces more problems to work around (ever hit that point in an XHTML document where there is too much XML used, and a browser flips into XML view-source mode as some undocumented error recovery path forks?).

The response you will get to that, though, is that there are legions of misconfigured servers that send HTML content as text/plain (or application/octet-stream). And, of course, the vast majority of content providers have zero control over the configuration of the server they use. Try mailing your ISP to request a configuration change like adding a new MIME type, and consider what percentage of the content-authoring public would even consider sending such an email.

So, it's a problem, and one that (some) browsers have responded to by ignoring the MIME type entirely and sniffing on the content. This is clearly error-prone, undocumented, non-interoperable and not extensible.

XML usefully removed some reliance on server setup (by putting encoding information in a standard place in the content, where all authoring tools could reliably generate it) and then the XML MIME RFC threw that away again by giving precedence to the MIME charset parameter where present (thus depending on specific server conventions with filenames, directory paths, config files and whatnot that users have no control over and authoring tools cannot reliably generate).

RL> In general, HTML documents marked as "text/plain" are totally valid
RL> "text/plain" documents.

Yes. The problem is with the assumption that they are mislabelled. Sometimes they are, sometimes they are not.

RL> It's not as though there's an error, per se.
RL> Therefore, legitimate responses from webservers should not be treated as
RL> "errors" even if there's a 99% probability that what is seen is a
RL> configuration problem.
It's the 99% estimate that guides developers into "good enough" or "we can fix that, it's obvious what is meant".

RL> One other issue to give guidance on is "how should vendor-specific
RL> extensions be allowed". Some W3C specifications, like SMIL 2.0 Language,
RL> are very strict in allowing only elements and attributes that are
RL> explicited scoped or qualified to a namespace URI are allowed.

Yes. SVG is the same. If you want extensions, put them in a different namespace.

RL> This has
RL> the effect that *all* proprietary extensions to SMIL 2.0 must be traceable
RL> to a URI.

Right.

RL> See the following section of the SMIL document for the nitty-gritty of
RL> this:
RL> http://www.w3.org/TR/smil20/smil20-profile.html#SMILProfileNS-LanguageConformance
RL> In doing this, the SMIL working group thought we were going in the
RL> direction that the consortium as a whole was tending toward. If we
RL> misread the situation, or if circumstances have changed, then it would be
RL> good to know what the current read on matters is.

I would say that your read on the direction was correct and was a good decision with clear foresight.

RL> Clearly, current
RL> document formats are what they are, but it would be good to provide
RL> guidance to new working groups as to how strict new document formats
RL> should be.

I think that SMIL 2.0 and SVG 1.0 provide good examples here. Perhaps something can be generalized a little from those, to apply to other formats used in Web clients. It's difficult to see how to generalize to the whole Web though, including protocols and so forth.

RL> Should the W3C ever publish a new recommendation that is as liberal as
RL> HTML,

The part in HTML 2, 3.2, 4.x about ignoring tags and displaying content was a mistake, sure. I don't see it being repeated, though, and XHTML 1.x does not repeat it (otherwise the parse tree would not be the same as that produced by a standard XML parser).

RL> or is that seen as a legacy issue?
I would see it as both a legacy issue and a very clearly failed experiment.

RL> If XML's draconian error-handling was the right decision, is that
RL> the right decision for more than XML?

Depends on the definition of 'error'. Is a channel that is supposed to have one bitrate and actually has a different bitrate an error?

RL> This is the type of question it would be nice to answer. Perhaps I'm
RL> bundling up many issues in a single request, but I'm not sure I know how
RL> to break this up into bite-sized chunks. If there are suggestions, I'd
RL> like to hear them.

As would I.

-- 
Chris
mailto:chris@w3.org
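As an illustration of the "do what I say, not what I 'mean'" rule discussed above, here is a minimal sketch in Python of a client that chooses its handling from the declared media type alone and reports an error, rather than sniffing, when it does not know the type. The handler table, the handler names and both function names are hypothetical, made up for this example; this is not how any particular browser works.

```python
# Hypothetical handler table: the handler names are illustrative only.
HANDLERS = {
    "text/plain": "render_as_plain_text",
    "text/html": "render_as_html",
    "image/svg+xml": "render_as_svg",
}


def declared_media_type(content_type_header):
    """Return the lowercased type/subtype, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def choose_handler(content_type_header, body):
    """Pick a handler from the declared type only.

    The body is accepted as an argument to make the point that it is
    never inspected: no sniffing for <!DOCTYPE or <title>.
    """
    media_type = declared_media_type(content_type_header)
    if media_type not in HANDLERS:
        # Report the problem and halt instead of guessing.
        raise ValueError(f"unsupported media type: {media_type}")
    return HANDLERS[media_type]


if __name__ == "__main__":
    html_looking_body = b"<!DOCTYPE html><title>Hello</title>"
    # Served as text/plain, so it is treated as text/plain.
    print(choose_handler("text/plain; charset=utf-8", html_looking_body))
```

With these assumptions, a body that looks like HTML but is served as text/plain is handed to the plain-text handler, and an unknown type produces an error report instead of a guess.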
Received on Wednesday, 29 May 2002 12:39:17 UTC