Re: Content Sniffing impact on HTTPbis - #155

On Fri, Jun 5, 2009 at 9:50 PM, Bjoern Hoehrmann <derhoermi@gmx.net> wrote:
> * Adam Barth wrote:
>>For better or for worse, this is the way browsers work.  I believe
>>this is related to historical behaviors of Apache and other HTTP
>>servers.  Removing these rules causes binary spew to fill up the
>>content area.
>
> The draft should say why you have these and not others, in
> particular, not canonically equivalent values. Otherwise one might
> naively implement it like
>
>  if (response.content_type == 'text/plain')
>       if (response.charset == 'ISO-8859-1')
>       or (response.charset == 'iso-8859-1')
>       or (response.charset == 'utf-8')
>            ...
>
> Which would result in different behavior. Clearly pointing out why
> you have these and no others would make implementers think twice.

The spec text is clear that you should apply those steps only when the
octet sequence matches exactly what's in the table.  I've emphasized
that further in the new revision.
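
As a rough illustration (not text from the draft), here is how an exact
octet match differs from the normalized comparison quoted above.  The
table contents and both function names are assumptions made for this
example; consult the draft for the real table.

```python
# Hypothetical table of exact Content-Type octet sequences.
# The draft's actual table may differ; these values are assumed.
EXACT_TEXT_PLAIN_VALUES = {
    b"text/plain",
    b"text/plain; charset=ISO-8859-1",
    b"text/plain; charset=iso-8859-1",
    b"text/plain; charset=UTF-8",
}


def matches_table(content_type_header: bytes) -> bool:
    """Return True only for a byte-for-byte match against the table."""
    return content_type_header in EXACT_TEXT_PLAIN_VALUES


def matches_normalized(content_type_header: bytes) -> bool:
    """The naive variant: split the header and compare case-insensitively."""
    media_type, _, params = content_type_header.decode("latin-1").partition(";")
    if media_type.strip().lower() != "text/plain":
        return False
    return params.strip().lower() in ("", "charset=iso-8859-1", "charset=utf-8")


# The two approaches disagree on canonically equivalent values.
for header in (b"text/plain; charset=UTF-8", b"text/plain; charset=utf-8"):
    print(header, matches_table(header), matches_normalized(header))
```

The second header passes the normalized check but not the exact one,
which is the difference in behavior at issue.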

In general, I don't think it makes sense to explain the reasoning
behind every detail of the algorithm in the spec.  Implementations
should just do what the spec says.  The introduction alludes to the
scope of information considered in creating the spec.  If we explained
the reasoning behind every detail, the spec would easily be over 100
pages long.

> This is my point with respect to the other sections as well. I'll
> just go through some examples:
>
>>[If /official type/ is an image type supported by the user agent]
>>
>>Would you prefer this step applied to all media types that begin with
>>"image/"?  That might let us remove step 6 as well.
>
> I just see that if I make a PNG image and deliver it as
> image/vnd.adobe.photoshop, then browsers with PSD support will treat
> it as PNG, and browsers without PSD support will treat it as PSD. I
> find that obscure and would like the reasoning documented in the
> draft.

That's correct.  The reason has to do with the historical behavior of
user agents, especially Internet Explorer and Firefox.  I don't think
it will add clarity to the specification to include the reasoning for
this detail.

>>> why step 3 only applies when you have at least three
>>> bytes and then only compares two bytes,
>>
>>We could change this to be slightly tighter, but it's a bit pedantic.
>
> When I read the text I suspect there is an error in the
> specification, and would then implement what I think my application
> ought to do; but your goal is that people implement it exactly as
> written.

If you're going to second-guess the requirements in the document, then
there's not much I can do to help you.  In this case, the impact of
altering this detail is minuscule.

>>> why the UTF-32 BOM is not being detected,
>>
>>We measured and determined that it was not needed for compatibility.
>>In general, we tried to minimize the amount of sniffing.
>
> As I read the draft, UTF-32LE encoded text/plain documents will be
> sniffed as text/plain because they have a UTF-16LE BOM; UTF-32BE
> encoded text/plain documents will be sniffed as
> application/octet-stream. This is inconsistent and confusing (there
> is suddenly some doubt whether you treat the document as UTF-16 or
> UTF-32, and while browsers might not support UTF-32, other
> applications will).

Looking for consistency in the sniffing algorithm is a fool's errand.
The behavior described in the spec is the way life is.
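
For reference, the byte-level reason for the asymmetry you describe can
be sketched in Python.  This assumes the text/plain check looks only
for the UTF-16 and UTF-8 BOMs; the names SNIFFED_BOMS and matched_bom
are made up for the example and do not come from the draft.

```python
import codecs

# BOMs the check is assumed to look for (no UTF-32 entries).
SNIFFED_BOMS = {
    "UTF-16BE": codecs.BOM_UTF16_BE,  # FE FF
    "UTF-16LE": codecs.BOM_UTF16_LE,  # FF FE
    "UTF-8": codecs.BOM_UTF8,         # EF BB BF
}


def matched_bom(data: bytes):
    """Return the name of the first sniffed BOM that prefixes data, or None."""
    for name, bom in SNIFFED_BOMS.items():
        if data.startswith(bom):
            return name
    return None


# The UTF-32LE BOM is FF FE 00 00, so it begins with the UTF-16LE BOM.
utf32le = codecs.BOM_UTF32_LE + "hi".encode("utf-32-le")
# The UTF-32BE BOM is 00 00 FE FF, which matches none of the entries.
utf32be = codecs.BOM_UTF32_BE + "hi".encode("utf-32-be")

print(matched_bom(utf32le))
print(matched_bom(utf32be))
```

The little-endian document is caught by the UTF-16LE prefix, while the
big-endian one falls through to whatever the later steps decide.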

>>> in section 6 the special handling of image/svg+xml;
>>
>>This is to avoid mistakenly changing the type of an image/svg+xml
>>resource that happens to begin with a magic number, e.g., BM.  Again,
>>this is a place where we are able to minimize the amount of sniffing.
>
> As I read the draft, the "official type" cannot be the SVG type at
> that point.

You are correct that the official-type cannot be image/svg+xml if you
enter the algorithm from the main entry point.  However, the
specification is designed to have its various subsections invoked
independently by other specs.  In this case, the correction for
image/svg+xml in the Images section is correct.
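
A minimal sketch of why the guard matters when the Images section is
invoked on its own, again in Python.  The signature list and the
function name sniff_image are illustrative assumptions, not the
draft's table or terminology.

```python
# Illustrative magic numbers only; the draft's table may differ.
IMAGE_SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"BM", "image/bmp"),
]


def sniff_image(official_type: str, data: bytes) -> str:
    """Sniff an image resource, leaving image/svg+xml untouched."""
    # Guard: a caller entering here directly may pass the SVG type.
    if official_type == "image/svg+xml":
        return official_type
    for signature, sniffed_type in IMAGE_SIGNATURES:
        if data.startswith(signature):
            return sniffed_type
    # No signature matched: keep the official type.
    return official_type


# A resource labeled SVG that happens to begin with "BM" keeps its type.
print(sniff_image("image/svg+xml", b"BM rest of the resource"))
# A PNG delivered under another image type is sniffed as PNG.
print(sniff_image("image/vnd.adobe.photoshop", b"\x89PNG\r\n\x1a\n"))
```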

> Again, the point is that people less familiar with the subject, and
> perhaps specifications in general, have access to this information;
> posting it here does not help them.

Indeed.  The subject of content sniffing is quite complicated.  The
draft says as much.

Adam

Received on Tuesday, 26 January 2010 20:19:02 UTC