- From: Larry Masinter <masinter@adobe.com>
- Date: Sat, 13 Mar 2010 22:09:17 -0800
- To: "www-tag@w3.org" <www-tag@w3.org>
(Had this queued up, but noticed I hadn't sent it):

There are many computer "languages". Because of this, there are often strings which are "valid", or might even seem "appropriate", for more than one language.

Software should not "guess" that something labeled as X is more appropriately treated as Y, unless all of the following hold: the software is an expert at interpreting the "X" label; using X doesn't work, while using Y does; and the likelihood that this is a configuration error (rather than sender intention) is high.

Example 1: if something comes labeled as application/vnd.company.specialpng, the receiver should treat it as vnd.company.specialpng, and never "guess" that it should instead be treated as a PNG just because it "looks like a PNG".

Example 2: if something comes in labeled as text/plain, however, and the application interpreting the data knows enough about text/plain to tell that the data would not be intelligible as text, while it more closely matches, say, image/jpeg and would display correctly as image/jpeg, and mis-configuration of the site from which the image was retrieved is also likely (through statistical analysis, say), then performing "sniffing" might be an option.

I think this general rule should apply to MIME types, HTML versions, charset labels and language tags (four kinds of 'sniffing' currently covered by the HTML document).

Allowing for disambiguation when content is removed from its particular context and repurposed for some other context is the reason why content SHOULD be "self-describing", why specifications should explicitly allow content that intends to match the specification to be labeled with the specification name and version, and why any re-interpretation of that self-description should be done cautiously and confirmed, with, if possible, a way of "correcting" mis-labeled documents made a required or encouraged element of any tooling.
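To make the rule and the two examples above concrete, here is a minimal sketch of the decision as code. It is only an illustration under stated assumptions, not taken from the sniffing document or from any real implementation: the names `EXPERT_LABELS`, `MAGIC`, `plausible_text`, `looks_like` and `effective_type`, the two byte signatures, and the 0.9 threshold are all hypothetical, and the misconfiguration likelihood is assumed to be supplied by the caller (for instance from per-site statistics).

```python
from typing import Optional

# Hypothetical signature table; a real sniffer would need far more than this.
MAGIC = {
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/jpeg": b"\xff\xd8\xff",
}

# Labels this (hypothetical) receiver understands well enough to judge
# whether the data is intelligible under them.
EXPERT_LABELS = {"text/plain"}


def plausible_text(data: bytes) -> bool:
    """Crude stand-in for 'text/plain works': no NUL bytes, valid UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_like(data: bytes) -> Optional[str]:
    """Return a type whose signature the data starts with, if any."""
    for mime, magic in MAGIC.items():
        if data.startswith(magic):
            return mime
    return None


def effective_type(label: str, data: bytes,
                   misconfig_likelihood: float,
                   threshold: float = 0.9) -> str:
    """Honor the label unless every condition for re-interpretation holds."""
    # Not an expert at interpreting this label: never second-guess it.
    if label not in EXPERT_LABELS:
        return label
    # The label works as given (only text/plain is modeled here).
    if plausible_text(data):
        return label
    # The label does not work; is there an alternative that does?
    candidate = looks_like(data)
    if candidate is None:
        return label
    # Re-interpret only if a configuration error is likely.
    if misconfig_likelihood < threshold:
        return label
    return candidate


png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
jpeg = b"\xff\xd8\xff\xe0\x00\x10JFIF"

# Example 1: an unfamiliar label is honored even though the data looks like PNG.
print(effective_type("application/vnd.company.specialpng", png, 0.99))
# Example 2: text/plain fails, JPEG matches, misconfiguration is likely.
print(effective_type("text/plain", jpeg, 0.95))
```

The default in every branch is to keep the sender's label; sniffing is the exception that has to be justified by all of the conditions at once.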
(quoting myself)
> I've postponed ACTION-386 (which was to do a more thorough
> in-depth review of the "sniffing" document), but I wonder if
> it might be possible to have a discussion about a very small
> piece of it.
>
> The mime sniff document, many W3C recommendations, and
> many discussions, including the recent traffic in
> public-html@w3.org around re-registration of the
> text/html MIME type all seem to take the form of
>
> "Can I serve an X document as Y"
>
> "How can I 'sniff' that an X document served as Y
> really is an X."
>
> These discussions seem to assume that the notion of
> "an X document" (an HTML 5 document, an XHTML2 document)
> is meaningful and well-formed and decidable without
> any additional contextual information.
>
> But in the case of "polyglot" documents, we have something
> that is simultaneously "an X document" and "a Y document",
> or is either one or the other.
>
> I'd like to see if we could get some agreement on
> a way to rephrase those statements and questions.
Received on Sunday, 14 March 2010 06:09:54 UTC