Re: Content Sniffing impact on HTTPbis - #155 from Jamie Lokier on 2009-06-13 (ietf-http-wg@w3.org from April to June 2009)

From: Jamie Lokier <jamie@shareable.org>
Date: Sat, 13 Jun 2009 19:15:10 +0100
To: Adam Barth <w3c@adambarth.com>
Cc: David Morris <dwm@xpasc.com>, ietf-http-wg@w3.org
Message-ID: <20090613181510.GH16220@shareable.org>

Adam Barth wrote:
> On Sat, Jun 13, 2009 at 9:56 AM, Jamie Lokier<jamie@shareable.org> wrote:
> > Does the sniffing document not apply to browsers looking at content on
> > a local disk (therefore with no Content-Type), or does this mean it
> > recommends sniffing the content without looking at the filename on the
> > local disk?
> 
> I haven't investigated this question in detail, but I suspect the
> answer will vary by browser.  There is very little interoperability
> between browsers when interacting with the file system.
> 
> > I'm pretty sure Firefox and the like look at the file extension when
> > looking at content found on local disk.  But surely it does sniffing
> > as well, on local disk files?
> 
> Do you have evidence for this belief?  It should be fairly easy to
> determine by looking at the source code.

It's easy to determine by simply trying it.

I've just created a small file with this content (not indented):

    <html><head></head><body>
    Hello, I am <b>HTML</b>
    </body></html>

If it's called test.html, it will display as HTML.
If it's called test.txt, it will display as plain text.
==> If it's called test.foo, it will display as HTML.
==> If it's called just test (no extension), it will display as HTML.

But if we change the file slightly, putting a single character x in
front like this:

    x<html><head></head><body>
    Hello, I am <b>HTML</b>
    </body></html>

If it's called test.html, it will display as HTML.
If it's called test.txt, it will display as plain text.
==> If it's called test.foo, it will display as plain text.
==> If it's called just test (no extension), it will display as plain text.

Therefore Firefox (3.0.10) does sniff a local file to determine how to
display it, and the sniffing algorithm (or whether to apply it) _does_
depend on the file extension.
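
For anyone who wants to repeat the local-file test, here is a minimal sketch that writes both variants of the file under all four names. The directory names "original" and "x-prefixed" and the function name write_test_files are my own choices for illustration, not part of the test described above.

```python
import tempfile
from pathlib import Path

# The file content used in the test above, without indentation.
HTML_BODY = "<html><head></head><body>\nHello, I am <b>HTML</b>\n</body></html>\n"

# The four file names tried in the test.
NAMES = ["test.html", "test.txt", "test.foo", "test"]


def write_test_files(root):
    """Write both variants of every test file under root; return the paths."""
    paths = []
    # Hypothetical layout: one subdirectory per variant.
    for variant, prefix in (("original", ""), ("x-prefixed", "x")):
        directory = Path(root) / variant
        directory.mkdir(parents=True, exist_ok=True)
        for name in NAMES:
            path = directory / name
            path.write_text(prefix + HTML_BODY, encoding="ascii")
            paths.append(path)
    return paths
```

Calling write_test_files(tempfile.mkdtemp()) and opening each returned path in the browser (for example via path.as_uri()) should be enough to repeat the comparison.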

> > Does the sniffing document not apply at all in that case, or is there
> > a different sniffing algorithm used which remains undocumented?
> 
> There is only one sniffing algorithm.  The question is only whether
> it's applied in this case.  More precisely, the question is whether the
> "file" protocol handler assigns a media type using OS-specific
> functionality before handing the response off to the next layer, where
> content sniffing is performed on various media types (e.g., the empty
> media type).

It's clear from trying it that Firefox applies a sniffing algorithm to
local files, and either the algorithm is influenced by the file's
extension, or Firefox decides whether to apply the algorithm at all
depending on the extension.
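
To make the second possibility concrete, here is a small sketch of one model that is consistent with the eight local-file results above: a recognised extension decides the type outright, and anything else falls through to sniffing. This is only a guess that fits the observations, not Firefox's actual code. The extension table, the function names, and the "starts with <html" check are all assumptions standing in for whatever the real algorithm does.

```python
import os

# Assumed extension table: only the two extensions exercised in the test.
KNOWN_EXTENSIONS = {".html": "text/html", ".txt": "text/plain"}


def looks_like_html(data: bytes) -> bool:
    # Stand-in for the real sniffing algorithm: only inspect the leading bytes.
    return data.lower().startswith(b"<html")


def guess_local_type(filename: str, data: bytes) -> str:
    """Hypothetical model of the observed local-file behaviour."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in KNOWN_EXTENSIONS:
        # A recognised extension wins; the content is not examined.
        return KNOWN_EXTENSIONS[ext]
    # Unknown or missing extension: fall back to sniffing the content.
    return "text/html" if looks_like_html(data) else "text/plain"
```

The tests cannot distinguish this model from one where the extension feeds into the sniffing algorithm itself, so treat it as one hypothesis among several.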

I don't know what it does with FTP, but I wouldn't be surprised if
it's the same as local files.

Now, let's get back to HTTP.  I've done the same test as above with
HTTP in the same Firefox.

If the Content-Type is text/plain or text/html, then Firefox honours
the Content-Type, independent of whether the content has "x" at the
start in these two test files.

If the Content-Type is application/octet-stream, then Firefox does
different things depending on the URL's file extension.  If it ends
with .html, Firefox shows an error dialog(!), otherwise it offers to open
the file in an application of your choice.

If the Content-Type is blank (I couldn't persuade Apache to omit the
header completely, so it is present but empty), then Firefox's
behaviour depends on the URL's file extension.

    <html><head></head><body>
    Hello, I am <b>HTML</b>
    </body></html>

If it's called http://.../test.html, it will display as HTML.
If it's called http://.../test.txt, it will display as plain text.
If it's called http://.../test.foo, it will display as plain text.
If it's called http://.../test, it will display as plain text.

    x<html><head></head><body>
    Hello, I am <b>HTML</b>
    </body></html>

If it's called http://.../test.html, it will display as plain text.
If it's called http://.../test.txt, it will display as plain text.
If it's called http://.../test.foo, it will display as plain text.
If it's called http://.../test, it will display as plain text.
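
I used Apache for this, but the blank Content-Type case can also be reproduced with a few lines of Python, sketched here. The handler name, the make_server helper, and the convention that any path under /x/ gets the "x"-prefixed content are hypothetical choices for this sketch, not how my test was set up.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

HTML_BODY = b"<html><head></head><body>\nHello, I am <b>HTML</b>\n</body></html>\n"


class BlankTypeHandler(BaseHTTPRequestHandler):
    """Serve the test content for any URL with a present-but-empty Content-Type."""

    def do_GET(self):
        # Hypothetical URL convention: paths under /x/ get the "x" prefix.
        prefix = b"x" if self.path.startswith("/x/") else b""
        body = prefix + HTML_BODY
        self.send_response(200)
        self.send_header("Content-Type", "")  # header is sent, value is blank
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), BlankTypeHandler)
```

Running make_server(8000).serve_forever() and then loading /test.html, /test.txt, /test.foo and /test (and the same names under /x/) in the browser should cover the eight cases listed above.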

As you see, Firefox applies a similar sniffing test in these examples
to decide whether to treat the resource as HTML or plain text, and it
does use the URL's file extension in making its decision.

However, it doesn't use quite the same algorithm as for local files,
as you can see from the .html and .foo extension differences.
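
For comparison, here is a sketch of a model that fits the eight HTTP results with a blank Content-Type: the resource is treated as HTML only when the URL ends in .html and the content also sniffs as HTML. As before, this is a guess that matches the observations, not Firefox's actual logic, and the function names and the "starts with <html" check are assumptions.

```python
import os


def looks_like_html(data: bytes) -> bool:
    # Stand-in for the real sniffing algorithm: only inspect the leading bytes.
    return data.lower().startswith(b"<html")


def guess_http_type_blank(url_path: str, data: bytes) -> str:
    """Hypothetical model of the observed behaviour for a blank Content-Type."""
    ext = os.path.splitext(url_path)[1].lower()
    # Both conditions appear to be needed: an .html URL and HTML-looking content.
    if ext == ".html" and looks_like_html(data):
        return "text/html"
    return "text/plain"
```

Under this reading, the extension switches sniffing on for HTTP, whereas for local files the results look as though a recognised extension switches it off. The two cases that differ are x-prefixed content under test.html, and unprefixed content under test.foo or no extension.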

In the bigger picture, my point is that sniffing is used in practice,
in a major browser, for local files as well as HTTP (and probably FTP,
though I haven't tested that here), and the decision about _whether_
to use it (at least) does depend on the file extension for HTTP as
well as for local files.

It would be good to document and standardise when the sniffing
algorithm is applied, dependent on file/URL extensions, for the same
reason that it is good to document and standardise what the sniffing
algorithm is.

I don't know from these tests if the sniffing is simply switched on/off
depending on file extensions or if it is influenced in a more
fine-grained way.

-- Jamie
Received on Saturday, 13 June 2009 18:15:44 UTC