Re: type parameter of Document.open() (detailed review of the DOM)

Ian Hickson wrote:
>> 1) We added the text/plain support when someone complained about script 
>> injection issues in content they were document.writing as text/plain and 
>> which we at the time treated as HTML.  Given that, we decided that 
>> unknown types should be either treated as text/plain or throw, with 
>> text/plain being marginally more useful.
> 
> That seems like a somewhat overenthusiastic fix -- why not just do what IE 
> does? That would presumably still allow for safe handling of text/plain 
> content.

Indeed.  Note that at the time we didn't do an exhaustive analysis of 
what other browsers did and just went with being secure by default.

> Since no other browser treats anything as text/plain other than 
> text/plain, nobody would presumably send content with other random MIME 
> types and expect a non-scripted handling.

In addition to the above (we didn't test other browsers too much here), 
I don't think it's worth basing security of websites in our browser on 
assumptions about what other browsers do if there is a simple way to 
improve on said security without breaking the web...

> This only seems to be required if you do the above behaviour of treating 
> things as text/plain instead of text/html by default

Indeed.

> I don't understand the security risk. Could you elaborate on what the 
> threat is?

The obvious threat is that someone writes (or wrote a while back) 
something, tests (or tested) it in their browser, where it doesn't render 
as HTML (or didn't back when they tested), and then we render it as HTML.

Obvious examples that come up are image types in IE, or a whole slew of 
stuff in Netscape 4 (think old site that no one has bothered to update, 
and yes such things still exist: we get people complaining that they 
can't document.open('application/postscript') in current Gecko).

Just treating unknown types as text/plain means the failure mode is that 
HTML renders as text in some edge case that we haven't thought of 
(happened once so far with the charset thing).  Doing it the other way 
means the failure mode is that some random content is rendered as HTML 
when the site doesn't expect it to be and our product can be used as an 
attack vector.
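
To make the two failure modes concrete, here is a rough sketch of the 
decision in TypeScript.  This is not Gecko's actual code: the names 
resolveParseMode, ParseMode and UnknownTypePolicy are made up for 
illustration, and stripping parameters such as "; charset=..." before 
comparing is my own assumption, not a description of what any browser 
does.

```typescript
// Hypothetical sketch; none of these names exist in any browser.
type ParseMode = "html" | "text";

// The two candidate policies for types we do not recognize.
type UnknownTypePolicy = "unknown-as-text" | "unknown-as-html";

// Assumption: parameters (anything after ";") are ignored and the
// comparison is case-insensitive.
function baseType(typeArg: string): string {
  const [base = ""] = typeArg.split(";");
  return base.trim().toLowerCase();
}

// Decide how content written after document.open(typeArg) would be parsed.
function resolveParseMode(
  typeArg: string,
  policy: UnknownTypePolicy
): ParseMode {
  const base = baseType(typeArg);
  if (base === "text/html") return "html";
  if (base === "text/plain") return "text";
  // Unknown type: this branch is where the two failure modes differ.
  // "unknown-as-text": worst case, HTML shows up as source text.
  // "unknown-as-html": worst case, markup nobody meant to run is executed.
  return policy === "unknown-as-text" ? "text" : "html";
}
```

With the first policy, resolveParseMode("application/postscript", 
"unknown-as-text") comes back as "text", so markup in that content is 
shown rather than run.  With the second policy the same call comes back 
as "html", which is the case I'm worried about.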

Since both approaches cover at least 99% of cases fine (probably more, 
but I don't have numbers to back this up), the failure modes are highly 
relevant in deciding which one to pick, and the former failure mode is 
vastly preferable from my point of view.

-Boris

Received on Thursday, 14 August 2008 03:42:56 UTC