RE: Feedback on Internet Media Types and the Web from Larry Masinter on 2010-11-08 (www-tag@w3.org from November 2010)

From: Larry Masinter <masinter@adobe.com>
Date: Mon, 8 Nov 2010 12:46:31 -0800
To: "julian.reschke@gmx.de" <julian.reschke@gmx.de>, Henri Sivonen <hsivonen@iki.fi>
CC: "www-tag@w3.org WG" <www-tag@w3.org>, Alexey Melnikov <alexey.melnikov@isode.com>
Message-ID: <C68CB012D9182D408CED7B884F441D4D0476C15AF4@nambxv01a.corp.adobe.com>
re comments on draft-masinter-mime-web-info-03
http://lists.w3.org/Archives/Public/www-tag/2010Nov/0047.html

(I suggested follow-up on www-tag@w3.org; I suppose a pointer to the
discussion could go to apps-discuss and/or public-html?)


" The document doesn't recount how dysfunctional the MIME type registry has been. "


I was hoping to be more explicit about the nature of the problems, as well as
about what the desired state is, rather than just saying it is "dysfunctional".
MIME has functioned very well for many applications and types; a few cases where
things have gone awry don't merit the term.

What were the problems with image/svg+xml, image/jp2 and/or video/mp4?

Should these be registered even if the requirements for MIME type registration
weren't met? Or did they meet the requirements but the process dropped the ball?


As for image/svg+xml not being used for 'XML' format: I think this is a 3023bis issue?
Is this different from mime-for-EMail vs mime-for-Web?

===================
> section "3.1.  Differences between email and web delivery" 
> doesn't elaborate on the CRLF issue.


As for restrictions on text/* types: for things like signatures for text types,
it did seem useful to maintain that the "canonical form" of a text/* type used
CRLF, but that HTTP allowed transport of non-canonical forms, even when email
didn't.

At least that's how we decided to "cut the knot" when we looked at this problem
in the HTTP working group many years ago. Is this not a feasible direction? Does
the restriction on using CRLF need to be removed in other contexts too?
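
A minimal sketch of the canonical-form idea, under stated assumptions: the
function names are hypothetical, a bare CR is treated as a line break, and a
SHA-256 digest stands in for whatever signature scheme is actually in use.
The body may travel over HTTP with any line endings; it is converted to CRLF
only at the point where the digest is computed.

```python
import hashlib


def to_canonical_text(body: bytes) -> bytes:
    """Return the body with every line break written as CRLF."""
    # Collapse CRLF and bare CR to LF first so existing CRLF pairs
    # are not doubled, then expand every LF to CRLF.
    unified = body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return unified.replace(b"\n", b"\r\n")


def text_digest(body: bytes) -> str:
    """Digest computed over the canonical form, not the bytes on the wire."""
    return hashlib.sha256(to_canonical_text(body)).hexdigest()
```

With this approach an LF-only copy fetched over HTTP and a CRLF copy received
by email yield the same digest, so the transport form does not need to be
canonical for a signature to verify.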

================
> Section "4.1.  There are related problems with charsets" doesn't 
> sufficiently rebuke the IETF for the supposed US-ASCII default
> for text/* types.

It would be foolish to "rebuke" the IETF for a restriction that made perfect
sense at the time it was made, and for which there is no compelling case made
for introducing a backward incompatibility.

I don't see the reason *not* to use application/* for nearly everything.
The MIME top level types have really pretty limited utility. Why is it important
that people use text/* types instead? If the text/ types you're defining don't
really match the requirements set out for text/ ... why not just use application/?
=============================

> The document doesn't sufficiently acknowledge that for most binary file formats 
> (particularly image files), the "magic number" of the file format is a much more
> reliable indicator of the format than an out-of-band MIME type,

First: I'm not sure this is true. I know there are circumstances where the
content-type label is wrong and sniffing gives the right answer, but there
are also circumstances where the label is right and sniffing gives the
wrong answer. So which is more prevalent, really? Do we have more data
than scattered anecdotes?
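
Both failure modes can be shown with a small sketch. The signature table is
deliberately tiny and the names MAGIC_NUMBERS, sniff and compare are
hypothetical; this is an illustration under those assumptions, not any
browser's sniffing algorithm.

```python
# Leading-byte signatures for a few formats; a real sniffer covers far more.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff(data: bytes):
    """Return the media type suggested by the leading bytes, or None."""
    for signature, media_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return media_type
    return None


def compare(declared: str, data: bytes) -> str:
    """Classify a labelled body as 'agree', 'disagree' or 'unknown'."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    label = declared.split(";", 1)[0].strip().lower()
    sniffed = sniff(data)
    if sniffed is None:
        return "unknown"
    return "agree" if sniffed == label else "disagree"
```

A PNG served as text/plain comes out as "disagree" because the label is
wrong. A genuine text/plain note whose first line happens to be
"%PDF-1.4 is the header of a PDF file" also comes out as "disagree", but here
the label is right and the sniffer is wrong. The result alone does not say
which side erred, which is why tallies over real traffic would be more
convincing than anecdotes.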

Secondly, even if it is true now, I'm not convinced that the right thing
to do is to give up on trying to get explicit MIME type indicators to work.

> " an architecture that insists on using out-of-band type data and on the
> out-of-band type data being authoritative has largely been unproductive"

In what way has it been "unproductive"?

Logically: those who are closer to the source of the data are more likely
to know authoritatively about the nature of the data than those who are
further down the pipeline. 

I know this topic has been discussed at length under the "authoritative metadata"
TAG finding. Doesn't it depend at least a bit on whether the MIME labels
are being applied by HTTP servers (e.g., Apache / IIS) vs. email clients
sending attachments?

Part of the problem is that sniffing is implemented inconsistently, and
when different parties do it differently, there are security and reliability
problems. But there's little hope of getting convergence on any position
*other* than "content-type is authoritative", when looking at the broader
range of Internet applications.


===============

> Section "4.5.  Content Negotiation" doesn't properly acknowledge that
> content negotiation on axes other than lossless compression (gzip) is
> mostly a failure on the Web.

But "user-agent" content negotiation is widespread, common, 
and quite functional. And certainly there are circumstances where
"Accept:" headers are used and are useful.... otherwise people wouldn't
send them, right?

So "mostly a failure" seems to be unsubstantiated.
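
For reference, a minimal sketch of how a server might act on an "Accept:"
header. The function names are hypothetical, media-type parameters other
than q are ignored, and ties go to the first type the server offers; real
servers apply more rules than this.

```python
def parse_accept(header):
    """Parse an Accept header into (type, subtype, q) tuples."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if "/" not in parts[0]:
            continue  # skip empty or malformed entries
        main, sub = parts[0].lower().split("/", 1)
        q = 1.0  # q defaults to 1 when absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        ranges.append((main, sub, q))
    return ranges


def quality(media_type, ranges):
    """Return the q-value of the most specific range matching media_type."""
    main, sub = media_type.lower().split("/", 1)
    best_specificity, best_q = -1, 0.0
    for r_main, r_sub, q in ranges:
        if r_main == main and r_sub == sub:
            specificity = 2  # exact match, e.g. text/html
        elif r_main == main and r_sub == "*":
            specificity = 1  # subtype wildcard, e.g. text/*
        elif r_main == "*" and r_sub == "*":
            specificity = 0  # full wildcard
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept_header, offered):
    """Pick the offered type with the highest q, or None if none is acceptable."""
    ranges = parse_accept(accept_header)
    best_type, best_q = None, 0.0
    for media_type in offered:
        q = quality(media_type, ranges)
        if q > best_q:
            best_type, best_q = media_type, q
    return best_type
```

For example, given the header
"text/html;q=0.8, application/xhtml+xml, */*;q=0.1" and a server offering
text/html and application/xhtml+xml, negotiate returns
application/xhtml+xml, since its q defaults to 1.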

> Negotiating the file format e.g. HTML vs. Word vs. PDF doesn't
> really happen. People want to make an explicit choice of downloading
> an MS Office or PDF depending on the goals they have that moment
> instead of letting software pick a format for them.

This topic was explored in HTTPbis and I wrote something about it for the
HTTP protocol update... I'll add a reference.

> Negotiation of HTML vs. XHTML happens but is rare in the
>  big picture and rarely offers true value to users.


I'm not sure "rarely offers true value" is a motivation for
making changes, since the "true value" we're looking for is more
indirect: it is an advantage for the web to be consistent and
reliable. Consistency and reliability are not features that
can be evaluated on a bit-by-bit basis, where one kind of content
negotiation which adds "true value" is allowed and another
kind isn't. We need to come to a general agreement about authoritative
metadata and content negotiation, and get it implemented consistently
in browsers and web servers and email and news senders, clients,
operating systems, IMAP, and instant messaging. Making decisions
on a case-by-case basis may optimize some things in the small,
but interfere with overall consistency.

Larry
--
http://larry.masinter.net
Received on Monday, 8 November 2010 20:47:05 UTC