Re: type parameter of Document.open() (detailed review of the DOM) from Ian Hickson on 2009-01-16 (public-html@w3.org from January 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Fri, 16 Jan 2009 07:09:36 +0000 (UTC)
To: Maciej Stachowiak <mjs@apple.com>
Cc: Boris Zbarsky <bzbarsky@MIT.EDU>, public-html <public-html@w3.org>
Message-ID: <Pine.LNX.4.62.0901160642120.7181@hixie.dreamhostps.com>

On Thu, 14 Aug 2008, Maciej Stachowiak wrote:
> On Aug 14, 2008, at 1:33 PM, Ian Hickson wrote:
> > On Wed, 13 Aug 2008, Boris Zbarsky wrote:
> > > > 
> > > > I don't understand the security risk. Could you elaborate on what 
> > > > the threat is?
> > > 
> > > The obvious threat is that someone writes (or wrote awhile back) 
> > > something, tests (or tested) in their browser, it doesn't render as 
> > > HTML (or didn't back when they tested), then we render it as HTML.
> > > 
> > > Obvious examples that come up are image types in IE, or a whole slew 
> > > of stuff in Netscape 4 (think old site that no one has bothered to 
> > > update, and yes such things still exist: we get people complaining 
> > > that they can't document.open('application/postscript') in current 
> > > Gecko).
> > 
> > Fair enough.
> > 
> > The risk of implementing this as Firefox does, of course, is lack of 
> > compatibility with pages that are expecting HTML handling. To gain 
> > some level of compatibility we have to, at a minimum, strip leading 
> > and trailing space characters, and ignore any content after the first 
> > semicolon.
> > 
> > Now the question is, are other browser vendors willing to change to 
> > this?
> > 
> > I've changed the spec for now, but I would really appreciate 
> > confirmation from WebKit, Opera, and IE representatives that this 
> > change is one that the majority of browser vendors are willing to 
> > implement.
> 
> WebKit doesn't match either Firefox or IE currently (we always use 
> text/html as you said). I would prefer to go with the IE behavior or 
> something close to it. I think the security risk of defaulting unknown 
> types to text/html is very small. There may be sites that have not been 
> updated since the Netscape 4 days, but it's unlikely any have enough 
> regular users to be targeted by security attacks. On the other hand, it 
> seems the compatibility risk is real, since Firefox must do trickier 
> parsing to catch some types that must indeed be treated as text/html.
> 
> Admittedly, this opinion is not informed by extensive testing.

I tried reverse engineering what IE does here but I lost patience with the 
weird behavior I was seeing before I managed to get a coherent picture, so 
I left the spec as is (more or less matching Gecko's behaviour).

As far as I can tell, IE does an ASCII case-insensitive comparison against 
the string "text/plain", without trimming spaces or doing anything with 
semicolons. If it finds a match it does the PLAINTEXT thing. Otherwise it 
does the HTML thing unless the type is a known image/* type, in which case 
it throws an exception.
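
For illustration only, the IE behavior as described above could be sketched 
roughly as follows. This is Python rather than anything a browser runs, and 
it reflects only the partial picture given here. The names ie_open_mode, 
KNOWN_IMAGE_TYPES and UnsupportedTypeError are hypothetical, the set of 
image types is a guess (which image/* types IE treats as "known" is not 
established), and matching the image types case-insensitively is also an 
assumption.

```python
import string

# ASCII-only lowercasing, so non-ASCII characters are left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Hypothetical list: the actual set of "known" image types is not specified.
KNOWN_IMAGE_TYPES = {"image/gif", "image/jpeg", "image/png"}


class UnsupportedTypeError(Exception):
    """Hypothetical stand-in for whatever exception IE actually throws."""


def ie_open_mode(type_arg: str) -> str:
    """Return "plaintext" or "html" for a document.open() type argument."""
    # No trimming and no semicolon handling: the whole string is compared.
    lowered = type_arg.translate(_ASCII_LOWER)
    if lowered == "text/plain":
        return "plaintext"
    # Assumption: image types are matched case-insensitively as well.
    if lowered in KNOWN_IMAGE_TYPES:
        raise UnsupportedTypeError(type_arg)
    return "html"
```

Under this sketch, " text/plain" and "text/plain;charset=utf-8" both fall 
through to the HTML case, because nothing is trimmed or cut off before the 
comparison.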

The spec behavior is to drop anything after a semicolon, trim spaces, and 
do a case-insensitive match against "text/html". If it finds a match, it 
does the HTML thing. Otherwise it does the PLAINTEXT thing.
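
A minimal sketch of the spec steps just described, again in Python purely 
for illustration. The function name spec_open_mode is hypothetical, and the 
set of characters trimmed (space, tab, LF, FF, CR) is an assumption about 
what counts as a space character.

```python
import string

# ASCII-only lowercasing, so non-ASCII characters are left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Assumed set of space characters to trim.
_SPACE_CHARS = " \t\n\f\r"


def spec_open_mode(type_arg: str) -> str:
    """Return "html" or "plaintext" for a document.open() type argument."""
    # 1. Drop the first semicolon and everything after it.
    value = type_arg.split(";", 1)[0]
    # 2. Trim leading and trailing space characters.
    value = value.strip(_SPACE_CHARS)
    # 3. Case-insensitive match against "text/html".
    if value.translate(_ASCII_LOWER) == "text/html":
        return "html"
    # Anything else, including unknown types, is handled as plain text.
    return "plaintext"
```

With these steps, " Text/HTML ; charset=utf-8" is handled as HTML, while 
"application/postscript" and "text/plain" both get the PLAINTEXT handling.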

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 16 January 2009 07:10:13 UTC