- From: Mark Baker <distobj@acm.org>
- Date: Tue, 26 Feb 2013 00:25:14 -0500
- To: Robin Berjon <robin@w3.org>
- Cc: Larry Masinter <masinter@adobe.com>, Henri Sivonen <hsivonen@iki.fi>, "www-tag@w3.org List" <www-tag@w3.org>
On Mon, Feb 25, 2013 at 7:42 AM, Robin Berjon <robin@w3.org> wrote:
>> perhaps by pointing
>> out the problems with the current "Why embedded metadata is less
>> authoritative" section?
>>
>> http://www.w3.org/2001/tag/doc/mime-respect#embedded
>
>
> Easily. That section contains two paragraphs and both are built atop
> assumptions that are at best unsubstantiated.
>
> The first relies on the infamous "sending text/html with the intent of
> having it render as text/plain" example. I've been hearing that example for
> a decade now. Apart from being devoid of technical motivation (since you can
> use <plaintext>) is there a second example? Notably, are there examples
> involving non-text media types?

There's a handful in the text/* space (I forget the details), but two
other, more recent examples I know of are XSLT and RDF/XML:

http://www.markbaker.ca/Talks/2004-media-types-and-compdocs/slide4-0.html
http://www.markbaker.ca/Talks/2004-xmlself/slide24-0.html

A primary reason these edge cases are important, from my POV, is
security. If not all parties in an exchange agree on the meaning of the
message, that opens a hole: an attacker can play one party off against
another, e.g. tunnelling Javascript past an active-content-blocking
proxy through to a browser by exploiting implementation differences.

Some argue that standardizing sniffing algorithms would prevent this,
but that is in itself a form of *external* metadata, and one which can't
solve that problem unless it's deployed simultaneously to all HTTP actors
that care about it... which is obviously not going to happen.

(Aside: this shares some similarities with the situation underlying
http://publicsuffix.org/ )

> It seems to proceed on the assumption that a sender indicating multiple
> interpretations for the same representation is a key architectural feature.

Don't get ahead of yourself there :) We agree that multiple
interpretations are bad.

> I think this begs the question: why? And assuming someone does have a use
> case, is it worth the cost of requiring a content type on every response and
> of introducing the sort of frailty that leads to sniffing? I've been doing
> web hacking for something like 18 years by now, including some stuff that
> I'm pretty sure would be considered rather exotic, but using a different
> media type for the same representation simply has never come up. Not even in
> a freak prototype.

*shrug* I'm sure I've done far fewer projects than you, but I've used it
to get the intended behaviour when wanting to force a save-to-file
action (yes, yes, not with Content-Disposition :), and to do the
equivalent of "view source" on XML content.

> The second paragraph is simply untrue. Looking at the first bytes of a
> payload to read a magic number or some such is not more expensive than
> reading the media type.

It is when it's encoded, which is usually the case.

> Authoritative metadata only prevents that during message transmission. But
> most of that metadata is volatile. Media types make it easier, not harder,
> to introduce format masquerading. For instance, this:
>
>
> http://w3c-test.org/webapps/Workers/tests/submissions/Opera/constructors/Worker/AbstractWorker.onerror.html
>
> can be interpreted as HTML or JS just by switching the media type. This
> means that you could get it past some checks by labelling it text/html, and
> then cause it to run.

I can't speak to that easily, since I don't know the details of how the
JS spec identifies what is executable JS and what is not. But the
security problem isn't a consequence of masquerading being attempted by
mislabelling content with a media type, as that obviously cannot be
prevented. It's a consequence of different actors in the message/content
processing pipeline failing to agree on the meaning of the message (as
above) and sending it down an incorrect processing path.

Mark.
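
For illustration, the point about magic numbers and encoded payloads can be sketched as follows. This is a minimal sketch under stated assumptions, not anything from the TAG document or the thread: the function names, the header dictionary, and the choice of PNG and gzip are all hypothetical, picked only to make the example concrete.

```python
import gzip

# PNG's 8-byte signature, used here only as a convenient magic number.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def type_from_header(headers):
    """Read the declared media type; no body bytes are touched."""
    value = headers.get("Content-Type", "")
    return value.split(";", 1)[0].strip().lower()


def type_from_magic(headers, body):
    """Sniff for a PNG magic number (hypothetical, PNG-only sniffer).

    If the body is content-encoded, the encoding has to be undone
    before the magic number is visible at all.
    """
    if headers.get("Content-Encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return "image/png" if body.startswith(PNG_MAGIC) else None


# A fake "PNG" payload sent gzip-encoded, as it might appear on the wire.
payload = PNG_MAGIC + b"\x00" * 16
headers = {"Content-Type": "image/png", "Content-Encoding": "gzip"}
wire_body = gzip.compress(payload)

declared = type_from_header(headers)
sniffed = type_from_magic(headers, wire_body)
```

The first bytes on the wire are the gzip header, not the PNG signature, so an intermediary that wants to sniff has to decode at least the start of the body, whereas the declared type is available from the headers alone. A real implementation could decode incrementally rather than decompressing the whole body as this sketch does.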
Received on Tuesday, 26 February 2013 05:25:43 UTC