[whatwg] MIME Sniffing spec - http://mimesniff.spec.whatwg.org/ from Boris Zbarsky on 2011-10-22 (public-whatwg-archive@w3.org from October 2011)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Sat, 22 Oct 2011 10:53:11 -0400
Message-ID: <4EA2D8D7.6000201@mit.edu>

On 10/22/11 6:09 AM, Daniel Glazman wrote:
>>> text/plain; charset=iso-8859-1
>>>
>>> This is wrong. Nothing in the MIME or the HTTP specs says such a
>>> whitespace is mandatory. Whitespace is explicitly forbidden between
>>> type and subtype, between parameter-name and parameter-value, but that's
>>> all. AFAIC, |text/plain;charset=iso-8859-1| is perfectly valid and
>>> |text/plain ; charset=iso-8859-1| is perfectly valid too.
>>
>> We do not want to sniff text/plain more than strictly necessary.
>
> Sorry, I don't understand that answer, what do you mean exactly ?

Normally, when a browser receives a header of the form "text/plain ..."
where "..." is anything, it should treat the page as text/plain.

However, there is a known bug in old Apache installations where Apache 
defaulted to sending a type of "text/plain" or "text/plain; 
charset=iso-8859-1" or "text/plain; charset=ISO-8859-1" or "text/plain; 
charset=UTF8" (depending on the installation) any time it didn't know 
what type of data the file was.

Therefore, it is fairly common for random binary files to be served with 
those 4 exact header values.  Thus, if those _exact_ strings are 
encountered the UA needs to sniff to make sure it's not actually binary.

> If I read the document correctly, UAs are going to fall back to complex
> type detection with perf and time cost just because the content-type
> detection did not honour the potential presence of whitespace ???
> Really ?

You read it wrong.  If the whitespace doesn't match the exact values in 
the table, the UA will just treat the page as text/plain.  It's only 
when the header value is exactly one of the 4 in the table that the UA 
will go into http://mimesniff.spec.whatwg.org/#text-or-binary
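
As a rough illustration of the exact-match rule, here is a minimal Python
sketch. The four strings are the ones quoted earlier in this message; the
table in the spec is the authoritative list. The names APACHE_DEFAULT_VALUES
and needs_text_or_binary_check are hypothetical and do not come from the spec
or from any browser's code.

```python
# Hypothetical sketch: the values below are copied from this message,
# not from the spec table, which remains the authoritative list.
APACHE_DEFAULT_VALUES = frozenset({
    "text/plain",
    "text/plain; charset=iso-8859-1",
    "text/plain; charset=ISO-8859-1",
    "text/plain; charset=UTF8",
})


def needs_text_or_binary_check(content_type_header: str) -> bool:
    """Return True only when the header matches a listed value exactly.

    No trimming, case folding, or whitespace normalization is applied on
    purpose: any other spelling is simply treated as text/plain.
    """
    return content_type_header in APACHE_DEFAULT_VALUES


if __name__ == "__main__":
    for header in (
        "text/plain; charset=iso-8859-1",   # exact match
        "text/plain;charset=iso-8859-1",    # no space after ';'
        "text/plain ; charset=iso-8859-1",  # extra space before ';'
    ):
        print(repr(header), needs_text_or_binary_check(header))
```

Under these assumptions only the first header would trigger the
text-or-binary check; the two whitespace variants would be handled as plain
text/plain with no sniffing.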

-Boris

Received on Saturday, 22 October 2011 07:53:11 UTC