Re: Revisiting Authoritative Metadata (was: The failure of Appendix C as a transition technique)

On 22/02/2013 12:36, David Sheets wrote:
> On Fri, Feb 22, 2013 at 1:22 AM, Robin Berjon <robin@w3.org> wrote:
>> I would support the TAG revisiting the topic of Authoritative Metadata, but
>> with a view on pointing out that it is an architectural antipattern.
>> Information that is essential and authoritative about the processing of a
>> payload should be part of the payload and not external to it. Anything else
>> is brittle and leads to breakage.
>
> HTTP is a text protocol. HTTP messages are of type application/http or
> message/http. HTTP messages include metadata regarding the properties
> of the representation of the served resource including the media type
> of the envelope's contents, the version of the representation, and
> information about when the message expires.

Right. But that information gets lost when you save the payload. People 
don't go about saving HTTP messages to disk, they save the payload. What 
one cares about transmitting most of the time is the payload, not the 
message. The message is just a by-product of the fact that you want to 
use this protocol.
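
A minimal sketch of that loss, assuming a hand-written response and a
hypothetical file name ("download") chosen only for illustration. The
media type lives in the header block; once only the body is written to
disk, nothing in the file or its name carries it any more.

```python
import mimetypes
import os
import tempfile

# Hand-written HTTP response used only for illustration.
RAW = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: application/json\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b'{"ok": true}\n')


def split_message(raw):
    """Split a raw HTTP message into (headers dict, payload bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:  # lines[0] is the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body


def save_payload(body, directory, name="download"):
    """Write only the payload to disk, as a typical 'Save as' would."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        f.write(body)
    return path


if __name__ == "__main__":
    headers, body = split_message(RAW)
    print("in the message:", headers["content-type"])
    with tempfile.TemporaryDirectory() as tmp:
        path = save_payload(body, tmp)
        # The file system only has the name to go on.
        print("on disk:", mimetypes.guess_type(os.path.basename(path))[0])
```

The bytes survive the round trip; the Content-Type does not.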

> How is telling the client everything it needs to know about processing
> and storage external to the message? It's in the message.

But the message isn't the interesting unit of analysis, the payload is. 
Introducing the ability for the message and payload to be out of sync is 
to introduce fragility and an attack vector.

> How is this an antipattern? It's very standard and very unambiguous.

Something can be standard and unambiguous and yet still a bad idea.

>> The sniffing behaviour is a consequence of media types as an architectural
>> construct, not an alternative to it.
>
> Sniffing is brittle and leads to breakage as included metadata
> regarding how to interpret the payload is ignored.
>
> The sniffing behaviour is a consequence of an attitude of Big Browser
> Knows Best regarding media types.

No it's not. Sniffing is a direct consequence of authoritative metadata. 
It's certainly not something that's limited to browsers. I have written 
plenty of tools over the years that ignore the media type just because 
you have to: it's wrong. RSS shipped under a bewildering array of wrong 
media types and JSON shipped (typically) as text/html are just the more 
prominent examples.

No one uses sniffing because they find it fun. Sniffing is there because 
media types, as a technical and social construct, are inherently brittle.

> The alternative to this behaviour is respecting the media type as transmitted.

And how does that help anyone when it's wrong?

> How is sniffing a consequence of following the protocol?

Two primary aspects contribute to this:

• The information essential to the processing of the payload is made to 
be volatile, such that in many if not most cases it exists only during 
transmission but not before or after. In some cases, it can in fact be 
difficult to keep it (the typical case being maintaining content coding 
and media type while storing in a regular file system). This volatility 
leads to information loss and errors. When two pieces of information can 
easily go out of sync, they will.

• The cost of error is borne by the receiver, not the sender. In any such 
system you are guaranteed to see receivers perform error correction, and 
those will dominate over time (simply by virtue of being better for 
their users).
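
The receiver-side error correction described in the second point can be
sketched roughly as follows. This is an illustration only, not the
algorithm any particular browser or tool uses: the signature table is
deliberately tiny, and the function names (sniff, effective_type) are
hypothetical.

```python
# A few well-known leading byte signatures; real sniffers know many more.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"\x1f\x8b", "application/gzip"),
]


def sniff(payload):
    """Return a media type guessed from the payload's first bytes, or None."""
    for magic, media_type in SIGNATURES:
        if payload.startswith(magic):
            return media_type
    return None


def effective_type(declared, payload):
    """Prefer what the payload says about itself over the declared type."""
    sniffed = sniff(payload)
    return sniffed if sniffed is not None else declared


if __name__ == "__main__":
    # A PNG mislabelled by the sender: the receiver corrects the error.
    print(effective_type("text/html", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
    # No recognisable signature: the declared type is all there is.
    print(effective_type("text/plain", b"hello"))
```

A receiver written this way keeps working when the label is wrong, which
is why such receivers tend to win out with users.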

>> Further, I think that the TAG should take this occasion to issue a
>> recommendation to people building formats that they include format
>> identifying information as essential, typically with a magic number, first
>> non-blank line, etc.
>
> What occasion would that be?

The aforementioned revisiting of this issue.
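
On the producer side, self-identification via a first non-blank line can
be as cheap as the sketch below. The marker "%EXAMPLE-FORMAT" and the
helper names are invented for illustration and do not correspond to any
real format.

```python
# Hypothetical marker; not the signature of any real format.
MARKER_NAME = "%EXAMPLE-FORMAT"
MARKER_VERSION = "1.0"


def wrap(body):
    """Prefix a payload with its own format-identifying line."""
    return MARKER_NAME + " " + MARKER_VERSION + "\n" + body


def identify(text):
    """Return the version if the first non-blank line is the marker, else None."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        parts = line.strip().split()
        if len(parts) == 2 and parts[0] == MARKER_NAME:
            return parts[1]
        return None  # first non-blank line is something else
    return None  # empty or all-blank input


if __name__ == "__main__":
    print(identify(wrap("some data")))
    print(identify("just some text"))
```

Because the marker travels inside the payload, it is still there after
the payload has been saved, copied, or mailed around.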

> Here's how to can tell you are receiving an HTTP message:
> <http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html#sec6.1>

Yeah, but transmission is just a small part of the data lifecycle.

-- 
Robin Berjon - http://berjon.com/ - @robinberjon

Received on Monday, 25 February 2013 12:25:13 UTC