Re: "Authoritative Metadata" standard, default mime types and XSS from Roy T. Fielding on 2019-01-11 (www-tag@w3.org from January 2019)

From: Roy T. Fielding <fielding@gbiv.com>
Date: Fri, 11 Jan 2019 13:22:59 -0800
To: Hanno Böck <hanno@hboeck.de>
Cc: W3C TAG <www-tag@w3.org>
Message-Id: <A678221F-FC8B-418C-B8D2-2E59B8E0BBCE@gbiv.com>
> On Jan 10, 2019, at 11:38 PM, Hanno Böck <hanno@hboeck.de> wrote:
> 
> Hello,
> 
> I recently looked into a security issue that arises relatively easily
> from the way web servers and web applications are designed. I have
> multiple practical instances in popular web applications (WordPress,
> Joomla, Mailman).
> 
> Now one way to avoid this issue would be to let web servers send a
> default content-type / mimetype. However, the W3C Authoritative Metadata
> standard explicitly says this shall not happen.

No, on both counts. First, sending a default type doesn't avoid the issue;
it just encourages browsers to sniff whenever they see that default type
(even if it wasn't set by default). We get silly things like

   https://mimesniff.spec.whatwg.org/#ref-for-check-for-apache-bug-flag

Defaults were abandoned specifically because they make this security
issue worse, not better.

Second, the authoritative metadata finding is not a standard -- it is a finding --
and it has nothing to do with a specific server owner's decision to apply a
default type to a specific set of resources. That's just configuration.
It is about the defaults chosen by server software developers (like me).

> The problem is like this:
> * A web application allows uploading any kind of "unusual" file type
>  that is not part of the server's mime.types. (The mime.types is in no
>  way standardized and differs significantly between distros, so there
>  can be practically no expectation on what that exactly means.) Let's
>  use a fictional file format .aaa as an example.

mime.types is not the only way to set media types on the server.

> * The web server will either guess the content type on its own or send
>  it without a content type and then the browser will guess the content
>  type. Both are bad. (Some web browsers - notably Edge+Firefox will
>  even guess the content when the "X-Content-Type-Options: nosniff"
>  header is sent, because that originally was only designed for .js
>  and .css files and thus won't prevent HTML sniffing.)
> * An attacker can upload a file example.aaa that contains html code and
>  javascript.
> * Calling that file will execute the javascript - you have an XSS.

I believe that browser good practice should be to disable such scripting
whenever the browser sniffs a type. Some people disagree. In any case,
that is a self-inflicted wound.

> This is a tricky issue to avoid. A web application like WordPress can
> hardly do anything about it (except maybe not allowing uploads of any
> "unusual" file types, but as written above, this is hard to define).

It can do anything it wants about it, including setting a default type for uploads
that prevents embedded scripts from being executed.  A web application is not
a general-purpose server.
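
As a rough illustration of that point, here is a minimal sketch of how an
application might choose response headers for files it serves from an upload
area. Everything in it is an assumption made for illustration: the function
name upload_response_headers, the SAFE_TYPES allowlist, and the choice of
application/octet-stream as the fallback are hypothetical, not taken from
WordPress or any other application mentioned in this thread.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist: extensions the application is willing to label.
# Note that .html, .svg and similar script-capable types are left out.
SAFE_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".pdf": "application/pdf",
    ".txt": "text/plain; charset=utf-8",
}

# Assumed fallback for anything the application does not recognize.
FALLBACK_TYPE = "application/octet-stream"


def upload_response_headers(filename: str) -> dict[str, str]:
    """Return response headers for serving an uploaded file by name."""
    suffix = PurePosixPath(filename).suffix.lower()
    content_type = SAFE_TYPES.get(suffix, FALLBACK_TYPE)
    headers = {
        "Content-Type": content_type,
        # Asks the browser not to second-guess the declared type.
        "X-Content-Type-Options": "nosniff",
    }
    if content_type == FALLBACK_TYPE:
        # Unrecognized uploads are offered as downloads, not rendered inline.
        headers["Content-Disposition"] = "attachment"
    return headers
```

With this sketch, upload_response_headers("example.aaa") yields
application/octet-stream plus Content-Disposition: attachment, while a known
image extension keeps its own type. The decision is made by the application
for its own upload area, not by the general-purpose server's shipped defaults.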

> Now the safest way for a server to prevent this would be:
> a) don't guess content types.
> b) send a "safe" content type (e.g. application/octet-stream) for each
> file that has no extension that can be assigned via mime.types.
> 
> Now the Authoritative Metadata standard says this SHOULD NOT happen:
> "Good Practice
> Server software designers (implementers) SHOULD NOT specify default
> representation metadata, such as media type, character encoding, or
> content language, within the standard configuration shipped with the
> server.

Apache httpd (or similar) != Wordpress (or similar)

> Instead of specifying a default for metadata, it is better for
> representations to be sent without that metadata. That allows the
> recipient to guess the metadata instead of being forced to either
> accept incorrect metadata or be tempted to violate Web architecture by
> ignoring it."
> 
> In Apache this even led to complete removal of that option, going even
> a step further (not just not doing this by default, but actively
> removing any way for users to do this).

No, we removed the functionality of the old DefaultType configuration because
we can't change configuration files on updates.  A user can easily set the type
by default per location using ForceType:

  https://httpd.apache.org/docs/2.4/mod/core.html#forcetype

> Contrary to that Nginx sends a
> default content type.

Nginx does not comply with HTTP/1.1, let alone TAG findings.

> 
> I don't see any good justification for that option. It says "That
> allows the recipient to guess the metadata instead of being forced to
> either accept incorrect metadata or be tempted to violate Web
> architecture by ignoring it." But that's not a good thing: It's a
> security risk.

No, the security risk is executing untyped content.  It's a well-known risk
and is supposed to be addressed by the mimesniff spec.

> The whole document doesn't mention XSS or Cross Site Scripting, so I
> wonder if this has been considered in any way. I'm writing you to
> hopefully better understand why that decision was made and what the
> reasons were. As far as I see it there's a security flaw and one of the
> most obvious and likely robust fixes is forbidden by a standard without
> any good justification. I think the standard should be changed.
> 
> [1] https://www.w3.org/2001/tag/doc/mime-respect#reducing-inconsistency

Yes, it was considered, as were all of the alternatives.  The security risk is in
sniffing, not in failing to set a default type.  Setting a default doesn't change that
unless the user agent doesn't believe it is a default.

Cheers,

....Roy
Received on Friday, 11 January 2019 21:23:30 UTC