Re: Feedback on Internet Media Types and the Web from Eric J. Bowman on 2010-11-09 (www-tag@w3.org from November 2010)

From: Eric J. Bowman <eric@bisonsystems.net>
Date: Tue, 9 Nov 2010 13:39:55 -0700
To: Henri Sivonen <hsivonen@iki.fi>
Cc: Larry Masinter <masinter@adobe.com>, "julian.reschke@gmx.de" <julian.reschke@gmx.de>, "www-tag@w3.org WG" <www-tag@w3.org>, Alexey Melnikov <alexey.melnikov@isode.com>
Message-Id: <20101109133955.79ed0b06.eric@bisonsystems.net>
Henri Sivonen wrote:
> 
> The problem with image/svg+xml is that after a decade of deployment
> and W3C REC status, the type still isn't in the registry. Even if the
> IETF experts found something wrong with the type, it would be way too
> late to stop its deployment, so there's really no point in subjecting
> it to expert review at this point.
> 

The same situation exists with application/rss+xml, which also defines
multiple, incompatible processing models.  But that's exactly the sort
of situation the IANA registry is meant to avoid, in the standards tree
anyway.  I believe the standards tree serves a valuable purpose, and
that it would be a bad thing to let anarchy reign by declaring that it
doesn't matter whether the rules for media types are followed (g'head
and deploy whatever), letting popular consent override technical
concerns.

> 
> Yet another failure of the registry is that text/xsl isn't registered
> for XSLT.
> 

In none of these cases do I blame the registry for such failures -- the
rules for registration, and the definition of media types, had been
around long enough that the failure lies with the WGs who ignored them.

>
> > Should these be registered even if the requirements for MIME type
> > registration weren't met? Or did they meet the requirements but the
> > process dropped the ball?
> 
> I don't know what exactly has happened with the registration for each
> of these types. I'm just observing that the outcome was that the
> system didn't work in the sense that the registry wasn't the place
> where a Web author, a Web server administrator or a Web client
> software developer could go and find what the right MIME type for a
> given format is.
> 

I go to the registry to find media types that have been vetted by
experts and are known to meet some basic requirements.  If the registry
becomes a list of everything anyone wants to do, whether it meets those
requirements or not, well, I'd consider that a failure of the registry.
The rules are quite clear: pending approval, prefix the subtype with
"x.", e.g. image/x.svg+xml or application/x.rss+xml.  Refusing to follow
the appropriate syntax *and* ignoring what media types are supposed to
do shouldn't be rewarded with registration.  The IANA registry isn't
perfect, but destroying its credibility, so that nobody has any faith
in any media types, sounds counterproductive to me, and would be a much
larger problem than the handful of strings out there which *look* like
standards-tree media types, but aren't.
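A minimal sketch of the syntactic distinction being defended here,
assuming the subtype-facet conventions of RFC 4288, section 3 (no
facet for the standards tree, "vnd." vendor, "prs." personal, "x."
unregistered).  The function name is hypothetical, and the check is
syntax only: it cannot tell whether a standards-tree-looking string
was ever actually registered with IANA.

```python
def media_type_tree(media_type: str) -> str:
    """Return the registration tree a media type string falls in, by syntax.

    Follows the subtype facets of RFC 4288, section 3.  It cannot tell
    whether a standards-tree-looking string was ever actually registered.
    """
    # Drop parameters such as "; charset=utf-8"; type names are case-insensitive.
    essence = media_type.split(";", 1)[0].strip().lower()
    top, sep, subtype = essence.partition("/")
    if not sep or not top or not subtype:
        raise ValueError(f"not a media type: {media_type!r}")
    if subtype.startswith(("x.", "x-")):
        return "unregistered"  # "x." tree, or the older "x-" convention
    if subtype.startswith("vnd."):
        return "vendor"
    if subtype.startswith("prs."):
        return "personal"
    # No facet: standards-tree syntax, whether registered or not.
    return "standards"
```

image/svg+xml and image/x.svg+xml differ only in the prefix, yet only
the second announces that it hasn't been through review; without the
prefix, the string alone can't tell the two situations apart.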

>
> > As for image/svg+xml not being used for 'XML' format. I think this
> > is a 3023bis issue?
> 
> Do you mean sending gzipped data as image/svg+xml without
> Content-Encoding: gzip?
> 

RFC 3023 (and 3023bis) says nothing about gzipped files.  The media type
is supposed to tell me the sender's intent, so I know how to process the
payload.  I don't know how anyone expects feeding a gzip file (because
this is an issue of pre-compressing the file, not of Content-Encoding
compressing it on-the-fly) into an XML parser to work, but that's exactly
the intent being conveyed, unless the intent being conveyed is SVG and
the file isn't compressed (or is compressed on-the-fly).  The registry
is correct in insisting that a media type identify one, and only one,
processing model.  Otherwise, intermediaries have to introspect the
payload to determine whether it's gzip or XML, defeating the entire
point of exposing this *outside* the payload, in a header.
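A minimal sketch of what a recipient of image/svg+xml has to do under
the current situation, assuming header names are already lower-cased
and considering gzip only.  The function name and the step labels are
hypothetical; the gzip signature (bytes 0x1f 0x8b) comes from RFC 1952.

```python
from typing import Dict, List

GZIP_MAGIC = b"\x1f\x8b"  # ID1, ID2 bytes that open every gzip member (RFC 1952)


def svg_processing_steps(headers: Dict[str, str], body: bytes) -> List[str]:
    """Work out how to process a body labeled Content-Type: image/svg+xml.

    Assumes header names are already lower-cased; only gzip is handled.
    """
    steps: List[str] = []
    encoding = headers.get("content-encoding", "").strip().lower()
    if encoding == "gzip":
        # Out-of-band: the headers alone say "decompress, then parse XML".
        steps.append("gunzip (from Content-Encoding)")
    elif body[:2] == GZIP_MAGIC:
        # Pre-compressed file served without Content-Encoding: the media
        # type says XML, and only introspecting the payload reveals gzip.
        steps.append("gunzip (from sniffing the body)")
    steps.append("parse as XML")
    return steps
```

Only the first branch can be decided from headers; the second forces
anything that cares about the payload to buffer and inspect its bytes.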

Has this blatantly obvious mistake really gone uncorrected for a decade?
Is the remedy proposed by the experts (registering two media types)
outrageous and non-implementable?  The failure here is *not* the
registry.  Having recently concluded a year-long crusade on rest-discuss
advocating proper use of media types, I'm aware of the problems with
the registry and the registration process, but this is not one of them.

There's a Simpsons analogy here -- remember Homer teaching Bart how to
putt?  "Keep your head down... follow through..."  Bart misses the putt,
so Homer sets him up again.  "OK, that didn't work, so this time, lift
your head and don't follow through!"  The obstinate refusal by some to
adhere to the fundamentals of the architecture is not a valid reason
to abandon those fundamentals and start registering anything seen in
the wild in a syntactically-identical fashion to those types which did
follow the rules and have been vetted.  I *like* being able to tell the
difference between the two.  That some folks yip their putts is not a
reason to discard the fundamentals of putting for everyone.

> 
> It seems rather implausible that there'd be more files that
> accidentally have the magic number for an image file format, a video
> file format, zip, gzip or PDF than there are mislabeled files in
> these formats, but I don't have data based on Web crawls followed by
> manual inspection. It's well known, though, that browsers, in order
> to be Web-compatible, ignore the image subtype for binary formats and
> sniff the magic number instead.
> 

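As I understand it, the problem with requiring magic-number sniffing to
identify the sender's intent is that it doesn't work at wire speed for
intermediaries, since they'd have to buffer and inspect the start of
every payload instead of just reading a header.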
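A minimal sketch contrasting the two dispatch styles.  The function
names are hypothetical, and the signature table lists well-known
leading bytes for some of the formats mentioned above; the media type
labels attached to them are the usual names, not taken from this
thread.  Real browser sniffing is considerably more involved.

```python
from typing import Optional

# Well-known leading byte signatures for some of the formats mentioned above.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
    (b"%PDF-", "application/pdf"),
]


def dispatch_by_header(content_type: str) -> str:
    # Header dispatch: a string operation on metadata that arrives
    # before the body, so the payload can stream straight through.
    return content_type.split(";", 1)[0].strip().lower()


def dispatch_by_sniffing(body_prefix: bytes) -> Optional[str]:
    # Sniffing: the first bytes of the body must be received and
    # buffered before the message can be routed.
    for magic, media_type in SIGNATURES:
        if body_prefix.startswith(magic):
            return media_type
    # No signature: HTML, XML, CSS, JavaScript, etc. can't be told apart here.
    return None
```

Note that the sniffing path returns nothing for the textual types,
which is the gap Henri points out below.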

>
> > Secondly, I'm not convinced that even if it is true now that the
> > right thing to do is to give up on trying to get explicit MIME type
> > indicators to work. 
> 
> I agree that it's now too late to give up on MIME entirely, since we
> now have types that don't have reliable magic numbers (in particular
> HTML, XML, CSS and JavaScript). However, if the purpose of the
> document is to document what went wrong or what could have gone
> better, I think specifying magic numbers as the step forward from
> HTTP 0.9 so that textual types would have been forced to have
> reliable magic numbers could have led to a more robust outcome than
> the one we got.
> 

More robust, perhaps, but less scalable.  I don't think the document
should speculate that another solution would have been better, because
we simply can't know that's the case.

>
> >> "an architecture that insists on using out-of-band type data and
> >> on the out-of-band type data being authoritative has largely been
> >> unproductive"
> > 
> > in what way has it been "unproductive"? 
> 
> All the time wasted due to MIME labeling failures could have been
> avoided when formats have reliable magic numbers.
> 

That would have resulted in a different architecture, with unknown
problems (we can't know, since that wasn't what was deployed), the
solutions to which may or may not have wasted even more time -- and
perhaps led us to adopt media types as the way forward.  There's no way
to know.

> 
> >> Section "4.5.  Content Negotiation" doesn't properly acknowledge
> >> that content negotiation on axes other than lossless compression
> >> (gzip) is mostly a failure on the Web.
> > 
> > But "user-agent" content negotiation is widespread, common, 
> > and quite functional.
> 
> "Negotiating" based on the User-Agent header isn't part of the
> Accept* content negotiation design. As for it being functional, I
> think it's dangerous for the adoption of standards. To give an
> example that touches on what I've been working on lately, right now a
> practice of sites sniffing Firefox and Opera and assuming certain
> script execution behavior is threatening the convergence of all
> implementations on one standardized behavior.
> 

My experience over twelve years of implementing conneg is that there
will never be convergence on one standardized behavior.  Eliminating
conneg may or may not result in such convergence.  If it doesn't, then
there's no mechanism to account for the lack of convergence -- resulting
in a failure with greater consequences than result from a lack of uptake
for conneg (beyond compression).

> 
> Also note that the Accept header of IE8 doesn't really allow
> negotiation on any other practical axis except progressive JPEG vs.
> not progressive, which no one cares about anymore.
> 

Sure it does.  My client-side XSLT implementation *could* just send
application/xml, except that my intent is best described by
application/xhtml+xml, which is what I send -- except for IE, to which
I must send application/xml.  How do I detect IE?  By looking for
application/x-microsoft in the Accept header.  Granted, that usage isn't
what anyone expects, but it works for me.  But mostly, I see conneg used
by systems which aren't meant for consumption by browsers, so I don't
think it's a broken mechanism.
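A minimal sketch of the Accept-based selection described above.  The
marker string is copied from the message and has not been checked
against IE8's actual Accept header, so it is a parameter; the function
name is hypothetical.  The Vary: Accept header is the standard HTTP way
to tell caches that the response depends on Accept (RFC 2616, section
14.44).

```python
from typing import Dict, Tuple

# Marker as given in the message above; not verified against IE8's
# actual Accept header, hence overridable.
IE_ACCEPT_MARKER = "application/x-microsoft"


def choose_content_type(
    accept: str, ie_marker: str = IE_ACCEPT_MARKER
) -> Tuple[str, Dict[str, str]]:
    """Pick the media type for a document that uses client-side XSLT."""
    # Plain substring test on the raw Accept value, mirroring the approach
    # described, rather than full q-value negotiation.
    if ie_marker in accept:
        media_type = "application/xml"
    else:
        media_type = "application/xhtml+xml"
    # The representation depends on Accept, so tell caches to key on it.
    headers = {"Content-Type": media_type, "Vary": "Accept"}
    return media_type, headers
```

A substring match is crude, but it only has to separate one client
from everyone else, which is all this use requires.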

-Eric
Received on Tuesday, 9 November 2010 20:40:18 UTC