Re: Revisiting Authoritative Metadata (was: The failure of Appendix C as a transition technique)

On Mon, Mar 4, 2013 at 2:38 PM, Noah Mendelsohn wrote:
> * For certain families of content such as the ones you discuss, agreement
> can be reached on disjoint in-band markers, typically at the start of the
> streams, that make the format self-identifying within the family.

Which family? Text?

> * Even for these formats, it's appropriate to have an authoritative
> Content-type >identifying the family<. XML is a good example of this:
> application/xml identifies the family; you can determine from the root node
> to figure out the particular XML document type (application/blah+xml is
> possible but optional).

I think we should try to come up with an alternative example. Web
developers are only marginally concerned with XML. Apart from RSS/Atom,
which are now largely defunct, it never gained much traction.

Maybe fonts are a good example. Fonts never had a media type and
apparently never needed one. Some are introduced now, but fonts will forever
remain sniffed. I made some effort to push font/{subtypes} through,
but it was too much work. Maybe we should have pushed for yet more
application/{subtypes} bogosity instead and that would've been easier,
but I think the current situation is a lot easier for everyone
involved. User agents just pipe a response from a @font-face fetch to
the font decoder and web developers just upload fonts with whatever
server configuration they desire.
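
To make "sniffed" concrete, here is a rough sketch of what identifying
a font from its leading bytes can look like. It is only an
illustration: the function name and the table are mine, not taken from
any user agent, and a real font decoder validates far more than the
first four bytes.

```python
# Hypothetical sketch: identify a font container from its first four bytes.
# The signatures are the well-known sfnt/WOFF tags; the labels are informal.
FONT_SIGNATURES = [
    (b"wOFF", "WOFF"),
    (b"wOF2", "WOFF2"),
    (b"OTTO", "OpenType (CFF outlines)"),
    (b"ttcf", "TrueType/OpenType collection"),
    (b"true", "TrueType (Apple)"),
    (b"\x00\x01\x00\x00", "TrueType"),
]


def sniff_font(data: bytes):
    """Return an informal format label, or None if nothing matches."""
    for magic, label in FONT_SIGNATURES:
        if data.startswith(magic):
            return label
    return None
```

Note that the Content-Type header never enters into it, which is the
point: the bytes alone are enough to pick the decoder.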

> * For other sorts of data formats, such in-band marking is either impossible
> or a bad tradeoff. Few of us would wish to put at the start of each of our C
> source files "CPRG" or some such, and it would be incoherent to do it in
> comma separated variables (CSV), etc. This is not just a legacy issue. There
> are formats for which in-band markers are a bad tradeoff.

These are also formats we largely transport as blobs over the wire, and
we don't really care how web user agents interpret them, so I think
this is a separate category.

> * For these formats Content-type is >necessary< to reliably convey the
> intended interpretation.

Not convinced. I'm sure CSV was in wide use before 2005, yet until then
it did not have an official type.

> * Postel's Law is to be conservative in what you send, as well as liberal in
> what you consume. Sending a jpeg as text/plain is a bug. Period. Rendering
> it as an image in the browser may be the lesser of the evils given
> widespread buggy content, but doing so should be viewed as an accommodation.
> Given that sniffing is to be done, I have no problem with the efforts of the
> HTML5 community to standardize the rules.

I used to believe this, and in fact configured my server to serve the
/favicon.ico resource as PNG (because that's what it was), but I gave
up on that as it's just make-work and things will work fine anyway even
if the server thinks it's serving image/x-icon (or whatever it defaults
to today).
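
A minimal sketch of why the label does not matter in the favicon case,
assuming a consumer that checks the leading bytes before trusting the
declared type. The function name is hypothetical and this is not the
algorithm any particular browser implements; it only shows that PNG
and ICO are distinguishable from their signatures.

```python
# Hypothetical sketch: choose an image decoder from the bytes, falling
# back to the declared Content-Type only when no signature matches.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG file signature
ICO_SIGNATURE = b"\x00\x00\x01\x00"   # reserved = 0, type = 1 (icon)


def sniff_icon(data: bytes, declared_type: str) -> str:
    """Return the type implied by the bytes, else the declared type."""
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(ICO_SIGNATURE):
        return "image/x-icon"
    return declared_type
```

With this, a PNG served as image/x-icon still ends up in the PNG
decoder, so getting the server configuration "right" changes nothing
the user can observe.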

> I. Where practical, it may be desirable to coordinate disjoint in-band
> format labeling across a wide range of content. However, we should not
> assume this will always be practical, or that different "families" may not
> have conflicting uses of the same markers.
> II. Content-type should remain authoritative, and should be used as
> described above to signal the correct interpretation of content. In cases
> where families of content share disjoint format markers in-band, the
> Content-type can identify the family or the particular format.
> III. Serving content with an incorrect Content-type should be viewed as a
> significant violation of the specifications of the Web. Where the type is
> not known to the server, Content-type should not be specified.
> IV. Interpretation of content in a manner contrary to the authoritative
> Content-type should be avoided where possible. When necessary to accommodate
> legacy content, as is the case with text/plain today, such "sniffing" should
> be viewed as an ugly work-around to meet practical needs. To the extent
> practical, we should move away from such usage.
> I therefore strongly disagree with Robin's proposal, which is to deprecate
> the notion of authoritative Content-types. I have no problem endorsing (I.),
> which I think is in the spirit of where he wants to go.

I think Robin is much more acutely aware of how the web platform
actually works and has evolved since its initial deployment.

> BTW: I wonder whether the time has come for a text/yes-its-really-plain-text
> media type?

That will only lead to text/i-fricking-mean-it-this-time-plain-text
(as we told Microsoft when they came up with their no-sniffing header
that does not in fact remove all "sniffing").


Received on Monday, 4 March 2013 17:22:53 UTC