- From: Noah Mendelsohn <nrm@arcanedomain.com>
- Date: Mon, 04 Mar 2013 09:38:24 -0500
- To: Anne van Kesteren <annevk@annevk.nl>
- CC: Bjoern Hoehrmann <derhoermi@gmx.net>, Robin Berjon <robin@w3.org>, "www-tag@w3.org List" <www-tag@w3.org>
On 3/4/2013 5:32 AM, Anne van Kesteren wrote:

> For new formats though such as WebVTT sniffing
> for a file identifier seems to become the norm as a) it's much easier
> to develop for and b) it's at least as robust as Content-Type.

Why is this an either/or? I think the right way to approach this is:

* For certain families of content such as the ones you discuss, agreement can be reached on disjoint in-band markers, typically at the start of the streams, that make the format self-identifying within the family.
* Even for these formats, it's appropriate to have an authoritative Content-type >identifying the family<. XML is a good example of this: application/xml identifies the family; you can determine the particular XML document type from the root node (application/blah+xml is possible but optional).
* For other sorts of data formats, such in-band marking is either impossible or a bad tradeoff. Few of us would wish to put "CPRG" or some such at the start of each of our C source files, and it would be incoherent to do it in comma-separated values (CSV), etc. This is not just a legacy issue. There are formats for which in-band markers are a bad tradeoff.
* For these formats Content-type is >necessary< to reliably convey the intended interpretation.
* Postel's Law is to be conservative in what you send, as well as liberal in what you consume. Sending a jpeg as text/plain is a bug. Period. Rendering it as an image in the browser may be the lesser of the evils given widespread buggy content, but doing so should be viewed as an accommodation. Given that sniffing is to be done, I have no problem with the efforts of the HTML5 community to standardize the rules.

Given all of the above, I believe that:

I. Where practical, it may be desirable to coordinate disjoint in-band format labeling across a wide range of content. However, we should not assume that this will always be practical, or that different "families" will not have conflicting uses of the same markers.
II. Content-type should remain authoritative, and should be used as described above to signal the correct interpretation of content. In cases where families of content share disjoint format markers in-band, the Content-type can identify the family or the particular format.
III. Serving content with an incorrect Content-type should be viewed as a significant violation of the specifications of the Web. Where the type is not known to the server, Content-type should not be specified.
IV. Interpretation of content in a manner contrary to the authoritative Content-type should be avoided where possible. When necessary to accommodate legacy content, as is the case with text/plain today, such "sniffing" should be viewed as an ugly work-around to meet practical needs. To the extent practical, we should move away from such usage.

I therefore strongly disagree with Robin's proposal, which is to deprecate the notion of authoritative Content-types. I have no problem endorsing (I.), which I think is in the spirit of where he wants to go.

Noah

BTW: I wonder whether the time has come for a text/yes-its-really-plain-text media type?

Noah
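
A minimal sketch of how a consumer might apply points II to IV, offered only as an illustration. The function names `sniff` and `interpret`, the small marker table, and the fallback to application/octet-stream are all assumptions made for this example; they are not taken from any specification or from the HTML5 sniffing rules.

```python
from typing import Optional, Tuple
import xml.etree.ElementTree as ET

# In-band markers used for illustration only (a tiny, assumed table).
JPEG_MAGIC = b"\xff\xd8\xff"   # leading bytes of a JPEG stream
WEBVTT_MAGIC = b"WEBVTT"       # WebVTT file signature


def sniff(body: bytes) -> Optional[str]:
    """Guess a media type from leading bytes; None if no marker matches."""
    if body.startswith(JPEG_MAGIC):
        return "image/jpeg"
    if body.startswith(WEBVTT_MAGIC):
        return "text/vtt"
    return None


def interpret(content_type: Optional[str], body: bytes) -> Tuple[str, str]:
    """Return (media type to use, reason), treating Content-type as authoritative."""
    if content_type is None:
        # Point III: the server did not know the type, so it sent none.
        # Falling back to a generic type here is an assumption of this sketch.
        return (sniff(body) or "application/octet-stream",
                "sniffed: no Content-type sent")
    if content_type == "application/xml":
        # Point II: the header names the family; the root node refines it.
        root = ET.fromstring(body).tag
        return (content_type, f"authoritative family; root element is {root}")
    if content_type == "text/plain" and sniff(body) == "image/jpeg":
        # Point IV: an accommodation for mislabeled legacy content.
        return ("image/jpeg", "legacy accommodation for mislabeled content")
    # Default: the declared type is authoritative and is not second-guessed.
    return (content_type, "authoritative")
```

For example, `interpret("text/csv", b"a,b\n1,2\n")` keeps the declared type, because CSV carries no in-band marker and the header is the only reliable signal.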
Received on Monday, 4 March 2013 14:39:01 UTC