
Re: Revisiting Authoritative Metadata (was: The failure of Appendix C as a transition technique)

From: David Sheets <kosmo.zb@gmail.com>
Date: Fri, 1 Mar 2013 18:31:59 -0800
Message-ID: <CAAWM5Tz==NXo6MT9_Dmh2ew=BpMCSFPnd9+ZpMi9GzMCyag3_Q@mail.gmail.com>
To: Robin Berjon <robin@w3.org>
Cc: "www-tag@w3.org List" <www-tag@w3.org>

On Tue, Feb 26, 2013 at 3:27 AM, Robin Berjon <robin@w3.org> wrote:
> On 26/02/2013 05:14 , David Sheets wrote:
>>
>> On Mon, Feb 25, 2013 at 4:25 AM, Robin Berjon <robin@w3.org> wrote:
>>> What one cares about transmitting most of the time is the payload, not
>>> the message.
>>
>> The message is the standard way to deliver the payload. If you just
>> want to send the payload, use netcat. Or use HTTP without a
>> Content-type header.
>
> "If you want foo, use bar" is hardly a solution to the real world problem
> we're looking at.

Your suggested solution of "If you want unambiguity, use magic
numbers" suffers from all the same problems, and *more*, because it
is an explicit subset of the present system. A perfectly fine solution
to the "real world problem" is to suggest that publishers (and their
server devs) refrain from publishing Content-Type when they are not
certain of its veracity.

> Iff you use the protocol properly and iff you only speak to people who do so
> as well then you're right: everything is fine and dandy. But that's not an
> accurate description of reality.

Perfect global compliance is *not* a requirement of either the
specifications or my argument. Nowhere do I claim that everyone
complies perfectly, and the fact that they do not has no bearing on
what the published position of the W3C should be.

>>>> How is this an antipattern? It's very standard and very unambiguous.
>>>
>>> Something can be standard and unambiguous and yet still a bad idea.
>>
>> Between
>>
>> "Here's how you describe your content if you want."
>>
>> and
>>
>> "Sometimes people lie so don't bother telling the truth. Telling the
>> truth is deprecated."
>>
>> It seems, to me, that the second is a bad idea.
>
> Possibly, but then the dichotomy you describe above is a complete strawman
> and doesn't in the least relate to what I'm proposing.

Really? You said "I would support the TAG revisiting the topic of
Authoritative Metadata, but with a view on pointing out that it is an
architectural antipattern."

If it is an architectural antipattern, then it should not be relied on
in future and new formats should not consider it as it is deprecated.
Future users should be advised against it because it is "brittle and
leads to breakage."

You claim this is the case because "media types, as a technical and
social construct, [are] inherently brittle."

This sounds a lot like "Sometimes people lie (about their media types)
so don't bother declaring your intended interpretation (it's
'brittle') because consumers won't heed your wishes." The alternative
is what we have now: sometimes people lie but those who wish to
participate in good-faith have an unambiguous signal to do so. If
people lie, you should distrust their authority and use other signals.
This is the protocol specification stating "Here's how you describe
your content if you want."

From this, I can only conclude that my assertion is neither a
"complete strawman" nor "doesn't in the least relate" to what you are
proposing. From the vigor of your denial, I am inclined to believe
that reducing your proposal to simple, straightforward terms can be
quite clarifying.

Please suggest an alternative short phrasing of the two sides of this debate.

> What I'm stating is:
>
>  Splitting away information that is required for integrity serves no
> identified purpose and rather obviously introduces a weakness into the
> architecture; and

You have shown neither of the following:

1. That HTTP Content-Type, as the primary authoritative media typing
signal, "serves no identified purpose"
2. That HTTP Content-Type "obviously introduces a weakness into the
architecture"

To the contrary, evidence has been presented that demonstrates:

1. HTTP Content-type has useful purposes unmatched by alternative proposals
2. HTTP Content-type reduces ambiguity and provides a single,
consistent signaling mechanism

Your proposed alternative of using magic numbers for everything suffers from:

1. Impracticality in the face of major deployed serializations
2. An overly prescriptive, byzantine, centralized specification that
is inflexible and does not scale: it must specify a frequently
updated global list of media type precedence and fingerprints, and
everyone must comply or suffer nonsense results.

Given these facts and the fact that we already have a global system
which uses BOTH Content-type and magic numbers, it would appear that
your proposal "introduces a weakness into the architecture" by
discouraging one simple signal in favor of a complex and varying
heuristic.
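
As a rough illustration of a receiver that uses both signals, here is
a minimal Python sketch. The function names, the tiny fingerprint
table, and the policy on conflict are assumptions made for
illustration only; they are not taken from any specification or
deployed user agent. The declared type is honored unless a magic
number contradicts it, and any correction is reported rather than
applied silently.

```python
# A few well-known file signatures; a real table would be far larger.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]

def sniff(body):
    """Return the media type suggested by a magic number, or None."""
    for prefix, media_type in MAGIC:
        if body.startswith(prefix):
            return media_type
    return None

def choose_type(declared, body):
    """Return (media_type, note); note is None when nothing was corrected."""
    sniffed = sniff(body)
    if declared is None:
        # No publisher statement at all: the heuristic is all we have.
        return sniffed, "no declared type; used magic number"
    if sniffed is not None and sniffed != declared:
        # The publisher's claim is contradicted by the bytes: surface it.
        return sniffed, f"declared {declared} contradicts magic number for {sniffed}"
    # Declared type is uncontradicted, so it stands as the single signal.
    return declared, None
```

Under these assumptions, a body with no recognizable fingerprint (JSON,
for instance) is interpreted purely by its declared type, which is the
case a magic-number-only scheme cannot handle.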

>  Our documentation of the architecture is lying about the state of the
> world, let's make it tell the truth.

Where does it lie about the state of the world? In fact, where does it
mention state at all?

>> That this heuristic is also useful when publishers lie to you is not a
>> reason to silently disregard the sender's intent, "sniff", and present
>> to an ignorant user. Fundamental interpretation errors and subsequent
>> heuristic correction should be surfaced. That this is NOT the present
>> behavior indicates, to me, an attitude of Browser Knows Best.
>
> Well, as shocking as it may sound users seem to prefer tools that produce
> the results they want (accessing the content) over tools that condescend to
> them as "ignorant users".

I'm glad that you agree that the present browser behavior is
condescending due to its enforcement of end-user ignorance.

Nothing about Authoritative Metadata prevents UAs from producing the
results that end-users want.

UAs are even free to be condescending and keep their users ignorant.
As you note, there is no Web Police.

>> Yes, there are problems with the present system. Your suggestion of
>> "everyone should just plan for ambiguous sniffing and we should
>> deprecate declarative intent" appears to be removing publisher choice
>> without any proposed replacement.
>
> Give me a use case for sender intent. Then we can look at the merits of a
> technical solution that implements it.

Suppose I offer resources that sometimes use extensions to JSON
<http://mjambon.com/yojson.html>.

How do I indicate to you that the served representations aren't JSON?
Even when they happen to be valid JSON?

In general, how do I indicate that some "payload" looks astonishingly
like another format but isn't?
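
To make the question concrete, here is a small Python sketch. The
media type name application/x-extended-json and both function names
are hypothetical, chosen only for this example. A content-only check
cannot tell a plain JSON document from an extended-format document
that happens to be valid JSON; only the declaration distinguishes
them.

```python
import json

def looks_like_json(body):
    """Content-only check: does the body parse as JSON?"""
    try:
        json.loads(body)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False

def interpretation(body, content_type=None):
    """Use the declared type when present; otherwise guess from content."""
    if content_type is not None:
        return content_type
    if looks_like_json(body):
        return "application/json"
    return "application/octet-stream"

# The same bytes, served by two publishers with different intent.
body = '{"n": 1}'
print(interpretation(body))                                 # guessed
print(interpretation(body, "application/x-extended-json"))  # declared
```

Without the declaration, the second publisher has no way to say "this
is not JSON, even though it parses as JSON".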

Give me a use case for a "standardized" sniffing system. Then we can
look at the merits of a technical ecosystem that attempts to implement
it. I believe the onus is on you to justify architectural deprecation.

> http://w3cmemes.tumblr.com/post/34633601085/grumpy-old-maciej-has-a-question-about-your-spec

Rly? "What are the use cases for HTTP? Justify it to me!"

>>>  The cost of error is borne by the receiver, not the sender. In any such
>>> system you are guaranteed to see receivers perform error correction, and
>>> those will dominate over time (simply by virtue of being better for their
>>> users).
>>
>> That's fine. Error correction is very important.
>>
>> It doesn't follow that declaring intent should be deemed an
>> antipattern.
>
> The antipattern isn't declaring intent. It's declaring any information
> required for the processing of a payload separately from that payload, *and*
> requiring it to be authoritative. Down that path lies breakage.

From the server's perspective, the information required for processing
is *in the message*. The *message* is the network payload.

Take a look at new security-focused HTTP headers
(Access-Control-Allow-Origin, X-Frame-Options, etc). Were these
mistakes? Should they be ignored?

If you remove authority from senders and consistently design as if
your users are stupid, the UA developer is the ultimate and only
authority. Down that path lies centralized control, corrupted
standards, and a feeble World Wide Web ecosystem.

>>>> What occasion would that be?
>>>
>>> The aforementioned revisiting of this issue.
>>
>> "On the occasion of my raising the issue, the issue should be settled." ?
>
> Well, it's not the first time and I'm not the first person to point out that
> Authoritative Metadata is an anti-finding. Note that I'm not the one who
> brought up the notion of revisiting it, I simply took the opportunity to
> remind this community that we have a bug there.

And yet you still fail to provide an argument based on normative principles.

You know that the present system will carry on regardless of what the
W3C publishes. What fundamental principles do you think the W3C should
adopt that would justify deprecating Authoritative Metadata? Do you
believe that deprecation is already justifiable? Why?

>> I add, here, that application/json does not guarantee dictionary
>> ordering nor supply any higher-level namespace mechanism. How do you
>> suggest JSON messages be transmitted in this New World?
>
> As I've now said about twenty times in this thread, I'm not proposing that
> we remove content types, simply because we can't (if we could, I would).
> Further mentions of this strawman won't get a response.

Ironically, you failed to quote me in context and thus erroneously
believe I have asserted a "strawman" when you, yourself, have
committed the strawman fallacy. Here is the rest of my problem, which
remains unaddressed and unresolved:

Specifier A uses application/a+json with a top-level dictionary with magic key,
"a", and an open namespace. Specifier B uses application/b+json with a
top-level dictionary with magic key, "b", and an open namespace.
Should {"a": "1.0.0", "b": "1.0.0", "execute": "..."} be interpreted
as application/a+json or application/b+json?
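
The ambiguity can be sketched in a few lines of Python. The
fingerprint table and the function names are hypothetical and stand
in for whatever a content-based scheme would have to specify. Sniffing
by magic key yields two candidates for the document above, while a
declared media type settles the matter immediately.

```python
import json

# Hypothetical "magic key" fingerprints for the two formats above.
MAGIC_KEYS = {
    "application/a+json": "a",
    "application/b+json": "b",
}

def sniff(body):
    """Return every media type whose magic key appears at the top level."""
    doc = json.loads(body)
    if not isinstance(doc, dict):
        return []
    return sorted(t for t, key in MAGIC_KEYS.items() if key in doc)

def resolve(body, declared=None):
    """Prefer the declared type; sniff only when nothing was declared."""
    if declared is not None:
        return declared
    candidates = sniff(body)
    if len(candidates) == 1:
        return candidates[0]
    # Zero or several matches: content alone cannot decide.
    raise ValueError(f"ambiguous or unknown payload: {candidates}")

body = '{"a": "1.0.0", "b": "1.0.0", "execute": "..."}'
print(sniff(body))
print(resolve(body, declared="application/b+json"))
```

Calling resolve(body) with no declared type raises an error here. A
sniffing-only scheme would instead need a global precedence rule
between the two specifiers.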

Three things strike me about your response:

1. You casually indicate that you think Content-Type was a mistake and
the architecture would be better without it if you were in charge.
2. When presented with a legitimate question, you remove it, dismiss
it, and declare that you will not further address it.
3. You continue to erroneously assert elsewhere that "no use cases
exist" despite being presented with many.

This behavior troubles me a great deal. Demonstrate rationalism.

> We can't fix (most of) the data formats we have that don't include built-in
> typing. What we *can* fix is our advice that that's actually a feature, and
> rather recommend that future file formats do not repeat this mistake. It's
> too late to save JSON, at least until such a time as a v2 happens, if it
> ever does. If it *does* happen, you can bet that a builtin indicator will be
> at the top of my list.

Except you don't control the world to the extent that your advice to
"include built-in typing" will be heeded. JSON is extremely widespread
and new formats appear constantly with a similar lack of
self-description (TOML? YAML?).

Just as there is nothing to be done about Content-type abuse, there is
nothing to be done about new formats lacking magic numbers or other
trivial identification heuristics. Of *course* both are bad ideas and
should be discouraged.

What now? Should we tell people who fix their future problems with
Content-Type that they're doing it wrong?

>> Yeah, but receipt from a remote publishing authority is the most
>> salient indicator of intended interpretation. Instead of
>> (unsuccessfully) convincing the world to adopt consistent and useful
>> magic numbers for every content type, why not standardize this in a
>> common protocol that carries any type of content?
>
> Because we did that and failed?

I wasn't aware HTTP was a failure.

Or perhaps you mean Content-type was a failure? Nope, Content-type is
a useful signal of publisher intent. Even though it's sometimes wrong.

Or perhaps you mean that not everyone follows the protocol? Hmm...
that happens a lot. In fact, it happens more and more as the protocol
becomes more successful. Should we make failure the standard?

Or perhaps you mean the standardization itself failed? Many
interoperable implementations beg to differ.

So, what do you mean? What, specifically, failed?

Also, what, specifically, are you proposing? What would be your ideal
outcome from this discussion?

Sincerely,

David
Received on Saturday, 2 March 2013 02:32:28 GMT
