- From: David Sheets <kosmo.zb@gmail.com>
- Date: Mon, 25 Feb 2013 20:14:58 -0800
- To: Robin Berjon <robin@w3.org>
- Cc: Larry Masinter <masinter@adobe.com>, Henri Sivonen <hsivonen@iki.fi>, "www-tag@w3.org List" <www-tag@w3.org>
On Mon, Feb 25, 2013 at 4:25 AM, Robin Berjon <robin@w3.org> wrote:
> On 22/02/2013 12:36 , David Sheets wrote:
>>
>> On Fri, Feb 22, 2013 at 1:22 AM, Robin Berjon <robin@w3.org> wrote:
>>>
>>> I would support the TAG revisiting the topic of Authoritative Metadata,
>>> but with a view on pointing out that it is an architectural antipattern.
>>> Information that is essential and authoritative about the processing of
>>> a payload should be part of the payload and not external to it. Anything
>>> else is brittle and leads to breakage.
>>
>> HTTP is a text protocol. HTTP messages are of type application/http or
>> message/http. HTTP messages include metadata regarding the properties
>> of the representation of the served resource including the media type
>> of the envelope's contents, the version of the representation, and
>> information about when the message expires.
>
> Right. But that information gets lost when you save the payload. People
> don't go about saving HTTP messages to disk, they save the payload.

I believe you mean end user agents save the payload. It is quite common
for automated consumers to persist transactional metadata.

> What one cares about transmitting most of the time is the payload, not
> the message.

The message is the standard way to deliver the payload. If you just want
to send the payload, use netcat. Or use HTTP without a Content-type header.

> The message is just a by-product of the fact that you want to use this
> protocol.

The information doesn't *have* to be destroyed when you *save*.

>> How is telling the client everything it needs to know about processing
>> and storage external to the message? It's in the message.
>
> But the message isn't the interesting unit of analysis, the payload is.

The message is the fundamental unit of analysis of the protocol. The
payload contains only the requested content but not necessarily data
about the publication of the content.
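To make the point concrete, here is a minimal HTTP response (all values
invented for illustration): the media type, modification date, and expiry
travel in the message envelope, while the payload sits below the blank line.

```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Last-Modified: Mon, 25 Feb 2013 10:00:00 GMT
Expires: Tue, 26 Feb 2013 10:00:00 GMT
Content-Length: 27

<!DOCTYPE html><p>hello</p>
```

Save only the bytes after the blank line to a plain file and every header
above it is gone.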
> Introducing the ability for the message and payload to be out of sync
> is to introduce fragility and an attack vector.

Throwing away important information introduces fragility and an attack
vector. There are many, many cases where protocol and payload can
disagree (e.g. the Total Length field of IP headers). Furthermore, there
are known mitigations to both "fragility" and "attack vector" issues. Is
deprecating useful components of HTTP really the right solution?

>> How is this an antipattern? It's very standard and very unambiguous.
>
> Something can be standard and unambiguous and yet still a bad idea.

Between "Here's how you describe your content if you want." and
"Sometimes people lie, so don't bother telling the truth. Telling the
truth is deprecated.", the second seems, to me, the bad idea. It is
strictly worse than the present situation, which at least gives
publishers the facility to unambiguously indicate their intent.

>>> The sniffing behaviour is a consequence of media types as an
>>> architectural construct, not an alternative to it.
>>
>> Sniffing is brittle and leads to breakage as included metadata
>> regarding how to interpret the payload is ignored.
>>
>> The sniffing behaviour is a consequence of an attitude of Big Browser
>> Knows Best regarding media types.
>
> No it's not. Sniffing is a direct consequence of authoritative metadata.

No, it's not. Sniffing ability is a direct consequence of the optional
nature of Content-type. General-purpose user agents have to have the
ability to sniff because they may not be presented with ANY metadata.
That this heuristic is also useful when publishers lie to you is not a
reason to silently disregard the sender's intent, "sniff", and present
to an ignorant user. Fundamental interpretation errors and subsequent
heuristic correction should be surfaced. That this is NOT the present
behavior indicates, to me, an attitude of Browser Knows Best.

> It's certainly not something that's limited to browsers. I have written
> plenty of tools over the years that ignore the media type just because
> you have to: it's wrong. RSS shipped as a bewildering array of wrong
> media types or JSON shipped (typically) as text/html are just the more
> prominent examples.

That's fine. Sometimes people lie. When we develop applications that
speak HTTP, we must be aware that the other side may lie. As developers,
we can specifically handle these cases for our application, as we are in
full control.

The problem arises with the combination of "I, Browser, know better than
BOTH my user and the transmitting party." and "Because I clearly know
best, neither producer nor consumer should *attempt* to indicate any
intent."

Yes, there are problems with the present system. Your suggestion of
"everyone should just plan for ambiguous sniffing and we should deprecate
declarative intent" appears to remove publisher choice without any
proposed replacement. How does it then follow that the recommendation of
the W3C should change from "don't lie" to "it doesn't matter if you lie
or tell the truth"?

> No one uses sniffing because they find it fun. Sniffing is there
> because media types, as a technical and social construct, are
> inherently brittle.

Once again, correcting incorrect media types is only one application of
these heuristics. Synthesizing a media type is another application. Just
because you have a function sniff : blob -> media_type doesn't mean that
media_type is worthless and should be globally replaced by blob. By
advocating for global replacement, you are advocating for the
institutionalization of the ambiguous status quo instead of allowing
multiple methods to co-exist with a clear indication of which method is
unambiguous.

>> The alternative to this behaviour is respecting the media type as
>> transmitted.
>
> And how does that help anyone when it's wrong?
You can respect the media type as transmitted by telling the user that
you have taken the liberty of interpreting the content in a way that was
not indicated by the producer. This helps both the user and the publisher
understand what has occurred. If the publisher is transmitting messages
with extremely wrong types, this is quite suspicious activity and
everyone involved should know.

>> How is sniffing a consequence of following the protocol?
>
> Two primary aspects contribute to this:
>
> • The information essential to the processing of the payload is made to
> be volatile, such that in many if not most cases it exists only during
> transmission but not before or after. In some cases, it can in fact be
> difficult to keep it (the typical case being maintaining content coding
> and media type while storing in a regular file system). This volatility
> leads to information loss and errors. When two pieces of information
> can easily go out of sync, they will.

I concur. HTTP semantics are a superset of filesystem semantics. This is
a difficult tension to resolve but necessary due to HTTP's more general
use cases. In situations like this it seems prudent to try to achieve
harmony while maintaining semantics rather than cutting HTTP down to fit
into a world of desktop file systems. At least two mitigations exist:

1. Browsers have access to persistence for metadata.
2. Saving a payload that would be sniffed as a type other than the one
   it was interpreted as should trigger user notification. I believe
   some operating systems include warnings about changing file
   extensions.

> • The cost of error is borne by the receiver, not the sender. In any
> such system you are guaranteed to see receivers perform error
> correction, and those will dominate over time (simply by virtue of
> being better for their users).

That's fine. Error correction is very important. It doesn't follow that
declaring intent should be deemed an antipattern.
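The two mitigations above can be sketched together. This is a minimal
illustration of my own, not any browser's actual behaviour: the sidecar
".meta.json" convention and the tiny magic-number table are invented for
the example. The declared Content-type is persisted next to the saved
payload, and a sniffing pass surfaces a warning when the leading bytes
disagree with the declaration, instead of silently overriding it.

```python
# Sketch of the two mitigations: (1) persist the declared media type in a
# sidecar file so a plain file system does not destroy it; (2) surface a
# warning, rather than silently "correcting", when sniffing disagrees.
import json
from pathlib import Path

# Illustrative magic-number table; real sniffing tables are far larger.
MAGIC_PREFIXES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF89a", "image/gif"),
]

def sniff(payload: bytes) -> str:
    """Synthesize a media type from leading bytes (heuristic fallback)."""
    for prefix, media_type in MAGIC_PREFIXES:
        if payload.startswith(prefix):
            return media_type
    return "application/octet-stream"

def save_payload(path: Path, payload: bytes, declared_type: str) -> list:
    """Save a payload, keep its declared type, and report discrepancies.

    Returns a list of warnings a user agent could surface to the user
    instead of silently reinterpreting the content."""
    warnings = []
    sniffed = sniff(payload)
    if sniffed != "application/octet-stream" and sniffed != declared_type:
        warnings.append(
            f"declared {declared_type} but content looks like {sniffed}")
    path.write_bytes(payload)
    path.with_name(path.name + ".meta.json").write_text(
        json.dumps({"Content-Type": declared_type}))
    return warnings
```

A matching load routine that reads the sidecar back recovers the
Content-type that a bare save-the-payload workflow would have thrown away.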
It certainly doesn't follow that error correction should be *assumed*. I
believe this argument suffers from the fallacy of appeal to the market.

>>> Further, I think that the TAG should take this occasion to issue a
>>> recommendation to people building formats that they include
>>> format-identifying information as essential, typically with a magic
>>> number, first non-blank line, etc.
>>
>> What occasion would that be?
>
> The aforementioned revisiting of this issue.

"On the occasion of my raising the issue, the issue should be settled."?

I add, here, that application/json does not guarantee dictionary ordering
nor supply any higher-level namespace mechanism. How do you suggest JSON
messages be transmitted in this New World? Specifier A uses
application/a+json with a top-level dictionary with magic key, "a", and
an open namespace. Specifier B uses application/b+json with a top-level
dictionary with magic key, "b", and an open namespace. Should
{"a": "1.0.0", "b": "1.0.0", "execute": "..."} be interpreted as
application/a+json or application/b+json?

>> Here's how you can tell you are receiving an HTTP message:
>> <http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html#sec6.1>
>
> Yeah, but transmission is just a small part of the data lifecycle.

Yeah, but receipt from a remote publishing authority is the most salient
indicator of intended interpretation. Instead of (unsuccessfully)
convincing the world to adopt consistent and useful magic numbers for
every content type, why not standardize this in a common protocol that
carries any type of content? It doesn't have to be either/or.

Regards,

David
Received on Tuesday, 26 February 2013 04:15:27 UTC