Re: Content Sniffing impact on HTTPbis - #155 from Bjoern Hoehrmann on 2009-06-05 (ietf-http-wg@w3.org from April to June 2009)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 05 Jun 2009 23:50:35 +0200
To: Adam Barth <w3c@adambarth.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <po1j25debhvqun37kdadebhtvhhgd0274j@hive.bjoern.hoehrmann.de>

* Adam Barth wrote:
>For better or for worse, this is the way browsers work.  I believe
>this is related to historical behaviors of Apache and other HTTP
>servers.  Removing these rules causes binary spew to fill up the
>content area.

What the draft should say is why you have these values and not others,
in particular why not canonically equivalent values. Otherwise one
might naively implement it like this:

  # Compares the parsed type and charset, not the exact header string
  if response.content_type == 'text/plain':
      if response.charset in ('ISO-8859-1', 'iso-8859-1', 'utf-8'):
          ...

That would result in different behavior. Clearly pointing out why
you have these values and no others would make implementers think
twice. This is my point with respect to the other sections as well.
I'll just go through some examples:
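For the text/plain rule, a minimal Python sketch of why the two readings diverge. It assumes the trigger list is exactly the three charset values from the pseudocode above (the draft's own list may differ); `exact_match` and `parsed_match` are hypothetical helpers, not anything defined in the draft.

```python
# Hypothetical trigger list mirroring the pseudocode above (an assumption,
# not quoted from the draft).
TRIGGER_VALUES = {
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=utf-8",
}
TRIGGER_CHARSETS = ("ISO-8859-1", "iso-8859-1", "utf-8")


def exact_match(header_value: str) -> bool:
    """Compare the raw Content-Type header value character for character."""
    return header_value in TRIGGER_VALUES


def parsed_match(header_value: str) -> bool:
    """Naive variant: parse type and charset, then compare the pieces."""
    mime_type, _, params = header_value.partition(";")
    charset = None
    for param in params.split(";"):
        name, _, value = param.strip().partition("=")
        if name.lower() == "charset":
            charset = value.strip()
    return (mime_type.strip().lower() == "text/plain"
            and charset in TRIGGER_CHARSETS)


for value in ("text/plain; charset=ISO-8859-1",
              "text/plain;charset=ISO-8859-1",
              "TEXT/PLAIN; charset=utf-8",
              "text/plain; charset=Iso-8859-1"):
    print(repr(value), exact_match(value), parsed_match(value))
```

Canonically equivalent headers such as `text/plain;charset=ISO-8859-1` (no space) or `TEXT/PLAIN; charset=utf-8` are rejected by `exact_match` but accepted by `parsed_match`, while `text/plain; charset=Iso-8859-1` is rejected by both.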

>[If /official type/ is an image type supported by the user agent]
>
>Would you prefer this step applied to all media types that begin with
>"image/"?  That might let us remove step 6 as well.

What I notice is that if I create a PNG image and serve it as
image/vnd.adobe.photoshop, browsers with PSD support will treat it
as PNG, while browsers without PSD support will treat it as PSD. I
find that obscure and would like the reasoning documented in the
draft.
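A minimal Python sketch of the behavior described above, assuming the image step only re-checks magic numbers when the declared type is in the user agent's set of supported image types. The `sniff_image` function, the signature table, and both supported-type sets are illustrative assumptions, not taken from the draft.

```python
# Illustrative magic numbers for a few common image formats.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image(official_type: str, data: bytes, supported: set[str]) -> str:
    """Re-check an image's declared type only if the UA supports that type."""
    if official_type not in supported:
        # Unsupported image type: no sniffing, the declared type wins.
        return official_type
    if data.startswith(PNG_SIGNATURE):
        return "image/png"
    if data.startswith(GIF_SIGNATURES):
        return "image/gif"
    if data.startswith(JPEG_SIGNATURE):
        return "image/jpeg"
    return official_type


psd = "image/vnd.adobe.photoshop"
png_bytes = PNG_SIGNATURE + b"...rest of the PNG file..."
with_psd = {"image/png", "image/gif", "image/jpeg", psd}
without_psd = {"image/png", "image/gif", "image/jpeg"}
print(sniff_image(psd, png_bytes, with_psd))     # image/png
print(sniff_image(psd, png_bytes, without_psd))  # image/vnd.adobe.photoshop
```

The same bytes with the same header yield different types depending only on whether the browser happens to support PSD.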

>> why step 3 only applies when you have at least three
>> bytes and then only compares two bytes,
>
>We could change this to be slightly tighter, but it's a bit pedantic.

When I read the text, I suspect there is an error in the
specification and would then implement what I think my application
ought to do; but your goal is that people implement it exactly as
written.
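A minimal sketch of the mismatch, assuming for illustration that the two bytes compared in step 3 are the UTF-16 byte order marks (FE FF and FF FE); both function names are hypothetical.

```python
# Assumed: step 3 compares the first two bytes against the UTF-16 BOMs.
UTF16_BOMS = (b"\xfe\xff", b"\xff\xfe")


def step3_as_worded(data: bytes) -> bool:
    """Literal reading: require three bytes, then compare only two."""
    return len(data) >= 3 and data.startswith(UTF16_BOMS)


def step3_tightened(data: bytes) -> bool:
    """Tighter reading: two bytes are enough to hold a UTF-16 BOM."""
    return data.startswith(UTF16_BOMS)


bom_only = b"\xff\xfe"  # an empty UTF-16LE document: just the BOM
print(step3_as_worded(bom_only), step3_tightened(bom_only))
```

A body consisting of only a UTF-16 BOM is rejected by the literal reading but accepted by the tighter one; this is the kind of edge case where an implementer has to guess which was intended.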

>> why the UTF-32 BOM is not being detected,
>
>We measured and determined that it was not needed for compatibility.
>In general, we tried to minimize the amount of sniffing.

As I read the draft, UTF-32LE-encoded text/plain documents will be
sniffed as text/plain because their BOM (FF FE 00 00) begins with the
UTF-16LE BOM (FF FE); UTF-32BE-encoded text/plain documents (BOM
00 00 FE FF) will be sniffed as application/octet-stream. This is
inconsistent and confusing: there is suddenly some doubt whether you
treat the document as UTF-16 or UTF-32, and while browsers might not
support UTF-32, other applications will.
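A minimal Python sketch of that outcome, assuming the text-or-binary rules check only the UTF-16 and UTF-8 BOMs and otherwise treat certain control bytes as binary. The binary-byte set below is borrowed from the later WHATWG MIME Sniffing Standard and is an assumption here; `text_or_binary` is a hypothetical name.

```python
# Only UTF-16BE, UTF-16LE and UTF-8 BOMs are recognized; no UTF-32 BOMs.
BOMS = (b"\xfe\xff", b"\xff\xfe", b"\xef\xbb\xbf")

# Assumed "binary data bytes": 0x00-0x08, 0x0B, 0x0E-0x1A, 0x1C-0x1F.
BINARY_BYTES = (set(range(0x00, 0x09)) | {0x0B}
                | set(range(0x0E, 0x1B)) | set(range(0x1C, 0x20)))


def text_or_binary(data: bytes) -> str:
    if data.startswith(BOMS):
        return "text/plain"
    if any(byte in BINARY_BYTES for byte in data):
        return "application/octet-stream"
    return "text/plain"


utf32le = b"\xff\xfe\x00\x00" + "hi".encode("utf-32-le")
utf32be = b"\x00\x00\xfe\xff" + "hi".encode("utf-32-be")
print(text_or_binary(utf32le))  # text/plain (matches the UTF-16LE BOM)
print(text_or_binary(utf32be))  # application/octet-stream (NUL bytes)
```

The UTF-32LE document only survives because its BOM happens to start with the UTF-16LE BOM, so a consumer that trusts the BOM match may then decode it as UTF-16.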

>> in section 6 the special handling of image/svg+xml;
>
>This is to avoid mistakenly changing the type of an image/svg+xml
>resource that happens to begin with a magic number, e.g., BM.  Again,
>this is a place where we are able to minimize the amount of sniffing.

As I read the draft, the "official type" cannot be the SVG type at
that point, so the special handling would never apply.
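A hypothetical sketch of that ordering, assuming an earlier step returns XML types (including image/svg+xml) unchanged, as the later WHATWG MIME Sniffing Standard does for XML MIME types. Only the BMP signature is checked, for brevity; the `svg_guard` flag exists only to show that the special case makes no difference under this assumption.

```python
def is_xml_type(mime_type: str) -> bool:
    """Assumed XML test: subtype ends in +xml, or text/xml, application/xml."""
    return mime_type.endswith("+xml") or mime_type in ("text/xml",
                                                       "application/xml")


def sniff(official_type: str, data: bytes, svg_guard: bool = True) -> str:
    if is_xml_type(official_type):
        # Assumed earlier step: XML types are never sniffed.
        return official_type
    if official_type.startswith("image/"):
        if svg_guard and official_type == "image/svg+xml":
            # The special case; unreachable because of the early exit above.
            return official_type
        if data.startswith(b"BM"):
            return "image/bmp"
    return official_type


svg_starting_with_bm = b"BM<svg xmlns='http://www.w3.org/2000/svg'/>"
print(sniff("image/svg+xml", svg_starting_with_bm, svg_guard=True))
print(sniff("image/svg+xml", svg_starting_with_bm, svg_guard=False))
```

Both calls return `image/svg+xml`: if an earlier step already exempts XML types, the SVG check in the image step is dead code.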

Again, the point is that people less familiar with the subject, and
perhaps with specifications in general, should have access to this
information; posting it here does not help them.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Friday, 5 June 2009 21:51:11 UTC