"Authoritative Metadata" standard, default mime types and XSS

Hello,

I recently looked into a security issue that arises quite easily from
the way web servers and web applications are designed. I have multiple
practical instances in popular web applications (WordPress, Joomla,
Mailman).

Now one way to avoid this issue would be to let web servers send a
default content-type / mimetype. However, the W3C Authoritative Metadata
standard explicitly says this should not happen.

The problem is like this:
* A web application allows uploading any kind of "unusual" file type
  that is not part of the server's mime.types. (The mime.types file is
  in no way standardized and differs significantly between distros, so
  there can be practically no expectation of what exactly that means.)
  Let's use a fictional file format .aaa as an example.
* The web server will either guess the content type on its own or send
  the file without a content type, in which case the browser will guess
  the content type. Both are bad. (Some web browsers, notably Edge and
  Firefox, will even guess the content type when the
  "X-Content-Type-Options: nosniff" header is sent, because that header
  was originally designed only for .js and .css files and thus won't
  prevent HTML sniffing.)
* An attacker can upload a file example.aaa that contains HTML and
  JavaScript code.
* Requesting that file in a browser will execute the JavaScript: you
  have an XSS.
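
To make the steps above concrete, here is a rough sketch in Python. The
lookup table, the function names and the sniffing rule are all made up
for illustration. In particular, naive_sniff is a toy stand-in and not
the algorithm of any real browser.

```python
import os

# Hypothetical stand-in for a server's mime.types table (assumption:
# real tables are much larger and differ between distributions).
MIME_TYPES = {
    ".html": "text/html",
    ".png": "image/png",
    ".txt": "text/plain",
}


def response_headers(filename):
    """Build headers like a server that sends no default content type."""
    ext = os.path.splitext(filename)[1].lower()
    headers = {}
    if ext in MIME_TYPES:
        headers["Content-Type"] = MIME_TYPES[ext]
    # Unknown extension: no Content-Type header is sent at all.
    return headers


def naive_sniff(body):
    """Toy content sniffer, NOT any real browser's algorithm."""
    head = body[:512].lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html", b"<script")):
        return "text/html"
    return "application/octet-stream"


# The attacker's upload: HTML with JavaScript, stored as example.aaa.
upload = b"<html><script>alert(document.domain)</script></html>"

headers = response_headers("example.aaa")
# Without a declared type, the recipient falls back to guessing.
effective = headers.get("Content-Type") or naive_sniff(upload)

print(headers)
print(effective)
```

In this sketch the response for example.aaa carries no Content-Type, so
the recipient's guess decides how the upload is treated.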

This is a tricky issue to avoid. A web application like WordPress can
hardly do anything about it (except maybe disallowing uploads of any
"unusual" file types, but as written above, that is hard to define).

Now the safest way for a server to prevent this would be:
a) don't guess content types.
b) send a "safe" content type (e.g. application/octet-stream) for each
file whose extension has no entry in mime.types.
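
As a minimal sketch of a) and b), assuming a hypothetical lookup table
KNOWN_TYPES in place of the server's mime.types (the names are mine and
not taken from any server's code):

```python
import os

# Hypothetical stand-in for the server's mime.types (assumption).
KNOWN_TYPES = {
    ".html": "text/html",
    ".css": "text/css",
    ".js": "text/javascript",
    ".png": "image/png",
}

# The "safe" fallback from b): browsers treat it as an opaque download.
SAFE_DEFAULT = "application/octet-stream"


def content_type_for(filename):
    """Return a content type from the extension only, never by guessing.

    The file body is never inspected (a); any extension without an
    entry in the table gets the safe default (b).
    """
    ext = os.path.splitext(filename)[1].lower()
    return KNOWN_TYPES.get(ext, SAFE_DEFAULT)


for name in ("index.html", "example.aaa", "README"):
    print(name, "->", content_type_for(name))
```

With this, example.aaa is always served as application/octet-stream, no
matter what the file contains.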

Now the Authoritative Metadata standard says this SHOULD NOT happen [1]:
"Good Practice
Server software designers (implementers) SHOULD NOT specify default
representation metadata, such as media type, character encoding, or
content language, within the standard configuration shipped with the
server.

Instead of specifying a default for metadata, it is better for
representations to be sent without that metadata. That allows the
recipient to guess the metadata instead of being forced to either
accept incorrect metadata or be tempted to violate Web architecture by
ignoring it."

In Apache this even led to the complete removal of that option, going
a step further (not just leaving it off by default, but actively
removing any way for users to enable it). In contrast, Nginx sends a
default content type.

I don't see any good justification for that recommendation. It says "That
allows the recipient to guess the metadata instead of being forced to
either accept incorrect metadata or be tempted to violate Web
architecture by ignoring it." But that's not a good thing: It's a
security risk.

The whole document doesn't mention XSS or Cross Site Scripting, so I
wonder whether this has been considered at all. I'm writing to you in
the hope of better understanding why that decision was made and what the
reasons were. As far as I can see, there's a security flaw, and one of
the most obvious and likely robust fixes is forbidden by a standard
without any good justification. I think the standard should be changed.

[1] https://www.w3.org/2001/tag/doc/mime-respect#reducing-inconsistency

-- 
Hanno Böck
https://hboeck.de/

mail/jabber: hanno@hboeck.de
GPG: FE73757FA60E4E21B937579FA5880072BBB51E42

Received on Friday, 11 January 2019 07:46:45 UTC