Re: Revisiting Authoritative Metadata (was: The failure of Appendix C as a transition technique) from Alex Russell on 2013-02-27 (www-tag@w3.org from February 2013)

From: Alex Russell <slightlyoff@google.com>
Date: Wed, 27 Feb 2013 09:13:48 +0000
To: Robin Berjon <robin@w3.org>
Cc: Henri Sivonen <hsivonen@iki.fi>, "www-tag@w3.org List" <www-tag@w3.org>, Larry Masinter <masinter@adobe.com>, Mark Baker <distobj@acm.org>
Message-ID: <CANr5HFV_iq9Ze9gCGCozVaCy=dnd5a5U49QR1Q-cM3NDcbvg=g@mail.gmail.com>

Apologies for the late replies.
On Feb 25, 2013 12:43 PM, "Robin Berjon" <robin@w3.org> wrote:
>
> On 22/02/2013 17:52 , Mark Baker wrote:
>>
>> On Fri, Feb 22, 2013 at 4:22 AM, Robin Berjon <robin@w3.org> wrote:
>>>
>>> I would support the TAG revisiting the topic of Authoritative Metadata, but
>>> with a view on pointing out that it is an architectural antipattern.
>>> Information that is essential and authoritative about the processing of a
>>> payload should be part of the payload and not external to it. Anything else
>>> is brittle and leads to breakage.
>>
>>
>> Robin, could you please back up those bold claims
>
>
> Sure, please notably see the two bullet points at the bottom of:
>
>     http://lists.w3.org/Archives/Public/www-tag/2013Feb/0129.html
>
> I also believe that Ruby's Postulate applies:
>
> """
> The accuracy of metadata is inversely proportional to the square of the
> distance between the data and the metadata.
>
> """
>
>> perhaps by pointing
>> out the problems with the current "Why embedded metadata is less
>> authoritative" section?
>>
>> http://www.w3.org/2001/tag/doc/mime-respect#embedded
>
>
> Easily. That section contains two paragraphs and both are built atop
> assumptions that are at best unsubstantiated.
>
> The first relies on the infamous "sending text/html with the intent of
> having it render as text/plain" example. I've been hearing that example for
> a decade now. Apart from being devoid of technical motivation (since you
> can use <plaintext>) is there a second example? Notably, are there examples
> involving non-text media types?
>
> It seems to proceed on the assumption that a sender indicating multiple
> interpretations for the same representation is a key architectural feature.
> I think this begs the question: why? And assuming someone does have a use
> case, is it worth the cost of requiring a content type on every response
> and of introducing the sort of frailty that leads to sniffing? I've been
> doing web hacking for something like 18 years by now, including some stuff
> that I'm pretty sure would be considered rather exotic, but using a
> different media type for the same representation simply has never come up.
> Not even in a freak prototype.
>
> The second paragraph is simply untrue. Looking at the first bytes of a
> payload to read a magic number or some such is not more expensive than
> reading the media type. It is certainly less expensive than having to read
> both the media type and the first few bytes because you know that the media
> type will be broken.

Reading the first bytes is also what engines do in practice anyway, for
other reasons (doctype, base URL config, etc.).
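
To make the cost argument concrete, here is a minimal sketch of a
magic-number check, assuming a tiny hand-picked signature table. The names
`MAGIC_NUMBERS` and `sniff_media_type` are hypothetical and this is not how
any particular engine implements sniffing; real sniffers use much larger
tables and more rules.

```python
from typing import Optional

# Hypothetical, deliberately small table of well-known file signatures.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff_media_type(payload: bytes, declared: Optional[str] = None) -> Optional[str]:
    """Guess a media type from the leading bytes of the payload.

    Falls back to the declared (header) type only when no signature matches.
    """
    for magic, media_type in MAGIC_NUMBERS:
        # A prefix comparison on a handful of bytes; no full parse needed.
        if payload.startswith(magic):
            return media_type
    return declared


# A PNG body mislabelled as text/plain is still recognised from its bytes.
png_body = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16
guessed = sniff_media_type(png_body, declared="text/plain")
```

The check is a few prefix comparisons on bytes the receiver has to read
anyway, which is the sense in which it is no more expensive than reading the
header.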

I'd like to go further than you have and say that, as an economic matter,
the producer/transmitter split between the authority granted to devs over
the content they build and the authority over the systems that serve that
content all but sinks any principle around header metadata in practice. It
is a farce to act as though this is not the case, and the TAG should only
recommend the possible.

>> From my POV, that section doesn't go far enough in explaining the
>> problems with embedded metadata. In particular it fails to point out
>> the security problems with format masquerading.
>
>
> Authoritative metadata only prevents that during message transmission.
> But most of that metadata is volatile. Media types make it easier, not
> harder, to introduce format masquerading. For instance, this:
>
>     http://w3c-test.org/webapps/Workers/tests/submissions/Opera/constructors/Worker/AbstractWorker.onerror.html
>
> can be interpreted as HTML or JS just by switching the media type. This
> means that you could get it past some checks by labelling it text/html, and
> then cause it to run.
>
>
> --
> Robin Berjon - http://berjon.com/ - @robinberjon
>

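The masquerading point quoted above can be sketched roughly as follows,
assuming a hypothetical gatekeeper (`label_based_check`) that only inspects
payloads labelled as JavaScript. The payload and the deny rule are
illustrative assumptions, not taken from the linked test file.

```python
def label_based_check(payload: bytes, content_type: str) -> bool:
    """Hypothetical filter: return True if the payload is allowed through.

    It trusts the label, so only payloads declared as script are scanned.
    """
    if content_type in ("application/javascript", "text/javascript"):
        # Toy deny rule, standing in for whatever a real scanner looks for.
        return b"importScripts" not in payload
    # Anything labelled otherwise is waved through unexamined.
    return True


# Illustrative bytes that a script engine would happily run,
# regardless of what the label says.
PAYLOAD = b"/* <!doctype html> */ importScripts('other.js');"

passed_as_html = label_based_check(PAYLOAD, "text/html")
passed_as_js = label_based_check(PAYLOAD, "application/javascript")
```

The same bytes get through when labelled text/html and are stopped when
labelled as script, so the label, not the content, decides the outcome.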
Received on Wednesday, 27 February 2013 09:14:16 UTC