RE: Revisiting Authoritative Metadata (was: The failure of Appendix C as a transition technique) from Larry Masinter on 2013-03-04 (www-tag@w3.org from March 2013)

From: Larry Masinter <masinter@adobe.com>
Date: Mon, 4 Mar 2013 02:06:28 -0800
To: Bjoern Hoehrmann <derhoermi@gmx.net>, Robin Berjon <robin@w3.org>
CC: "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <C68CB012D9182D408CED7B884F441D4D1E880CA3D7@nambxv01a.corp.adobe.com>
I think this is a really important discussion -- about sniffing and MIME types and reliability and authoritative metadata.

To be specific, *PLEASE* review  
   http://mimesniff.spec.whatwg.org/
and also the bug reports on it
https://www.w3.org/Bugs/Public/buglist.cgi?product=WHATWG&component=MIME&resolution=---


including the reference to the IETF bugs (many reported by me)
   http://trac.tools.ietf.org/wg/websec/trac/query?component=mime-sniff


on the IETF draft 
   http://tools.ietf.org/html/draft-ietf-websec-mime-sniff-03

(before it was abandoned)

To repeat: I think it is possible to substantially reduce sniffing and the many problems associated with it, but it does require some will and effort.

So far Mr. Berjon has not acknowledged that the difficulties with sniffing are significant enough that it should not be the preferred approach.
This seems like it could be reduced to a question of fact.


> -----Original Message-----
> From: Bjoern Hoehrmann [mailto:derhoermi@gmx.net]
> Sent: Sunday, March 03, 2013 4:51 PM
> To: Robin Berjon
> Cc: www-tag@w3.org List
> Subject: Re: Revisiting Authoritative Metadata (was: The failure of Appendix C
> as a transition technique)
> 
> * Robin Berjon wrote:
> >The first relies on the infamous "sending text/html with the intent of
> >having it render as text/plain" example. I've been hearing that example
> >for a decade now. Apart from being devoid of technical motivation (since
> >you can use <plaintext>) is there a second example? Notably, are there
> >examples involving non-text media types?
> >
> >It seems to proceed on the assumption that a sender indicating multiple
> >interpretations for the same representation is a key architectural
> >feature. I think this begs the question: why? And assuming someone does
> >have a use case, is it worth the cost of requiring a content type on
> >every response and of introducing the sort of frailty that leads to
> >sniffing? I've been doing web hacking for something like 18 years by
> >now, including some stuff that I'm pretty sure would be considered
> >rather exotic, but using a different media type for the same
> >representation simply has never come up. Not even in a freak prototype.
> 
> As far as I can tell, the concern is that, when multiple interpretations
> are possible, the author can clarify which interpretation is intended.
> Plain text is a simple example because being "plain" it lacks other
> identifying features, like the character encoding used for the document.
> 
> When I bought my first computer it was the norm that text files you made
> on one computer did not work on the next one if you did anything unusual,
> like typing my name into them, because different systems had different
> local conventions (DOS and Windows on the same computer, for instance).
> 
> I would not want to write "#encoding=windows-1252" every time I make a
> new text file, nor would I want to be bothered by a text editor adding that
> for me every time (and possibly hide it so I end up forgetting it when I
> run the file through `grep` or other tools later); with that historical
> perspective the Content-Type header is a considerable improvement.
> 
> (These problems suggest to me that you would rather put "identifying"
> information in the filesystem, and as a consequence convey it through
> network protocols at a higher level than "file contents", meaning you'd
> end up with something like Content-Type even under ideal circumstances.)
> 
> Examples for binary types should be easy to come up with. Many formats
> re-use the PKZIP format; you would not want applications to treat .jar,
> .docx, .xpi, and any number of other formats as ordinary .zip files. And
> you would want zip-compatible applications to treat self-extracting .exe
> files as zip files (because you can make files that are .exe and .zip at
> the same time, you can enable "novice" users to extract the archive, and
> allow more experienced users to avoid running the .exe code).
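
A minimal sketch of the ZIP point above, in Python. It builds three tiny in-memory archives and shows that a check of the leading bytes cannot tell them apart, while a declared media type can. The helper names (make_archive, sniff_magic, effective_type) and the member file names are hypothetical, chosen only for illustration; this is not the algorithm from the mimesniff draft.

```python
import io
import zipfile
from typing import Optional

# Signature of a ZIP local file header, shared by every PKZIP-based format.
ZIP_LOCAL_HEADER_MAGIC = b"PK\x03\x04"


def make_archive(member_name: str, payload: bytes) -> bytes:
    """Build a one-member ZIP archive in memory (illustrative helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(member_name, payload)
    return buf.getvalue()


def sniff_magic(data: bytes) -> str:
    """Guess a media type from the leading bytes only."""
    if data.startswith(ZIP_LOCAL_HEADER_MAGIC):
        return "application/zip"
    return "application/octet-stream"


def effective_type(data: bytes, declared: Optional[str]) -> str:
    """Prefer the sender's declared type; sniff only when none was sent."""
    return declared if declared else sniff_magic(data)


# Stand-ins for a plain archive, a .jar and a .docx (not valid documents).
plain_zip = make_archive("notes.txt", b"hello")
jar_like = make_archive("META-INF/MANIFEST.MF", b"Manifest-Version: 1.0\n")
docx_like = make_archive("[Content_Types].xml", b"<Types/>")

for label, data in [("zip", plain_zip), ("jar", jar_like), ("docx", docx_like)]:
    print(label, data[:4], sniff_magic(data))
```

All three payloads start with the same four bytes, so sniffing the prefix alone reports application/zip for each of them. Calling effective_type(jar_like, "application/java-archive") returns the sender's label instead, which is the distinction the paragraph above is asking for.
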
> 
> >The second paragraph is simply untrue. Looking at the first bytes of a
> >payload to read a magic number or some such is not more expensive than
> >reading the media type. It is certainly less expensive than having to
> >read both the media type and the first few bytes because you know that
> >the media type will be broken.
> 
> How do you read the first bytes when the data is compressed and you do
> not understand the compression scheme? An intermediary might want to
> understand what kind of data is being transmitted, but you do not want
> to have to upgrade the intermediary every time a new compression scheme
> comes along. Does that happen often? How important is this feature? We,
> especially the "we" from over 20 years ago, do not quite know...
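
The compression question above can be illustrated with a small Python sketch. It assumes gzip as a stand-in for "a compression scheme the intermediary may not understand"; the looks_like_png helper and the padded body are hypothetical and the body is not a valid image.

```python
import gzip

# The fixed 8-byte signature that begins every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(first_bytes: bytes) -> bool:
    """Magic-number check on the leading bytes of a payload."""
    return first_bytes.startswith(PNG_SIGNATURE)


# Stand-in body: the PNG signature followed by padding (not a real image).
body = PNG_SIGNATURE + b"\x00" * 32

# What an intermediary sees on the wire when the body is compressed.
encoded = gzip.compress(body)

print("raw body sniffs as PNG:    ", looks_like_png(body))
print("encoded body sniffs as PNG:", looks_like_png(encoded))
print("encoded body starts with:  ", encoded[:2])
```

The encoded bytes begin with the gzip magic number rather than the PNG signature, so a prefix check only works for a party that can first undo the encoding (gzip.decompress here). A party that cannot do so is left with whatever type the sender declared.
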
> --
> Björn Höhrmann · mailto:bjoern@hoehrmann.de ·
> http://bjoern.hoehrmann.de
> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Received on Monday, 4 March 2013 10:07:01 UTC