Re: Revisiting Authoritative Metadata (was: The failure of Appendix C as a transition technique) from Robin Berjon on 2013-02-25 (www-tag@w3.org from February 2013)

From: Robin Berjon <robin@w3.org>
Date: Mon, 25 Feb 2013 13:42:23 +0100
To: Mark Baker <distobj@acm.org>
CC: Larry Masinter <masinter@adobe.com>, Henri Sivonen <hsivonen@iki.fi>, "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <512B5C2F.4040405@w3.org>

On 22/02/2013 17:52 , Mark Baker wrote:
> On Fri, Feb 22, 2013 at 4:22 AM, Robin Berjon <robin@w3.org> wrote:
>> I would support the TAG revisiting the topic of Authoritative Metadata, but
>> with a view on pointing out that it is an architectural antipattern.
>> Information that is essential and authoritative about the processing of a
>> payload should be part of the payload and not external to it. Anything else
>> is brittle and leads to breakage.
>
> Robin, could you please back up those bold claims

Sure. See in particular the two bullet points at the bottom of:

     http://lists.w3.org/Archives/Public/www-tag/2013Feb/0129.html

I also believe that Ruby's Postulate applies:

"""
The accuracy of metadata is inversely proportional to the square of the 
distance between the data and the metadata.
"""

> perhaps by pointing
> out the problems with the current "Why embedded metadata is less
> authoritative" section?
>
> http://www.w3.org/2001/tag/doc/mime-respect#embedded

Easily. That section contains two paragraphs and both are built atop 
assumptions that are at best unsubstantiated.

The first relies on the infamous "sending text/html with the intent of 
having it render as text/plain" example. I've been hearing that example 
for a decade now. Apart from its being devoid of technical motivation 
(since you can use <plaintext>), is there a second example? Notably, are 
there examples involving non-text media types?

That paragraph seems to proceed on the assumption that a sender 
indicating multiple interpretations for the same representation is a key 
architectural feature. I think this raises the question: why? And 
assuming someone does have a use case, is it worth the cost of requiring 
a content type on every response and of introducing the sort of frailty 
that leads to sniffing? I've been doing web hacking for something like 18 
years now, including some stuff that I'm pretty sure would be considered 
rather exotic, but using a different media type for the same 
representation simply has never come up. Not even in a freak prototype.

The second paragraph is simply untrue. Looking at the first bytes of a 
payload to read a magic number or some such is not more expensive than 
reading the media type. It is certainly less expensive than having to 
read both the media type and the first few bytes because you know that 
the media type will be broken.
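
To make the cost comparison concrete, here is a minimal sketch in Python. 
The table of magic numbers is deliberately tiny, and the names 
(MAGIC_NUMBERS, sniff_type, resolve_type) are made up for illustration; 
this is not how any particular browser does it.

```python
# Illustrative only: a handful of well-known magic numbers. A real
# sniffer would cover many more formats.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff_type(payload: bytes):
    """Guess a media type from the leading bytes; None if nothing matches."""
    for magic, media_type in MAGIC_NUMBERS:
        if payload.startswith(magic):
            return media_type
    return None


def resolve_type(declared_type: str, payload: bytes) -> str:
    """What a defensive client ends up doing when it cannot trust the
    label: read the declared type AND the first bytes, then let the
    bytes win when they are recognised (a hypothetical policy)."""
    sniffed = sniff_type(payload)
    return sniffed if sniffed is not None else declared_type
```

sniff_type is a few prefix comparisons on bytes the client has to read 
anyway. resolve_type does that work plus the header lookup, which is the 
"read both" case above.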

> From my POV, that section doesn't go far enough in explaining the
> problems with embedded metadata. In particular it fails to point out
> the security problems with format masquerading.

Authoritative metadata only prevents that during message transmission. 
But most of that metadata is volatile: it does not stay attached to the 
payload once the message is gone. Media types make it easier, not harder, 
to introduce format masquerading. For instance, this:

http://w3c-test.org/webapps/Workers/tests/submissions/Opera/constructors/Worker/AbstractWorker.onerror.html

can be interpreted as HTML or JS just by switching the media type. This 
means that you could get it past some checks by labelling it text/html, 
and then cause it to run.
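
A minimal sketch of the kind of check I have in mind, in Python. The 
filter policy, the passes_filter name and the payload bytes are all 
hypothetical; the point is only that the decision hangs off the label 
rather than the bytes.

```python
# Media types the hypothetical filter treats as script.
SCRIPT_TYPES = {"application/javascript", "text/javascript"}


def passes_filter(declared_type: str, payload: bytes) -> bool:
    """Hypothetical gateway check: only payloads labelled as script are
    scanned for a forbidden call; everything else is waved through."""
    if declared_type in SCRIPT_TYPES:
        return b"importScripts" not in payload
    return True


# Made-up payload: the bytes are identical in both cases below.
payload = b"importScripts('other.js');"

allowed_as_html = passes_filter("text/html", payload)
allowed_as_script = passes_filter("text/javascript", payload)
```

With the text/html label the payload gets through; with the 
text/javascript label the very same bytes are rejected. Nothing about the 
payload changed, only the metadata next to it.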

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Received on Monday, 25 February 2013 12:42:36 UTC