- From: Adam Barth <w3c@adambarth.com>
- Date: Mon, 23 Nov 2009 09:12:45 -0800
- To: Joe D Williams <joedwil@earthlink.net>
- Cc: Julian Reschke <julian.reschke@gmx.de>, Boris Zbarsky <bzbarsky@mit.edu>, Gavin Carothers <gavin@carothers.name>, Maciej Stachowiak <mjs@apple.com>, HTMLwg <public-html@w3.org>
You're correct that BOMs are optional when you correctly specify the media type of your content. We're discussing the error recovery behavior when web servers do not correctly specify the media type of their content.

Adam

On Mon, Nov 23, 2009 at 9:04 AM, Joe D Williams <joedwil@earthlink.net> wrote:
>> tolerating BOMs ...
>
> I don't get that. Isn't the utf-8 BOM officially optional if the intent is
> to serve utf-8? no matter what the mime?
> Frankly, I haven't experimented lately, but I thought toleration of the
> utf-8 BOM had left my html interest because browsers I tested eventually got
> over failing when the BOM was sent. Which am I missing?
> Best Regards,
> Joe
>
> ----- Original Message -----
> From: "Adam Barth" <w3c@adambarth.com>
> To: "Julian Reschke" <julian.reschke@gmx.de>
> Cc: "Joe D Williams" <joedwil@earthlink.net>; "Boris Zbarsky"
> <bzbarsky@mit.edu>; "Gavin Carothers" <gavin@carothers.name>; "Maciej
> Stachowiak" <mjs@apple.com>; "HTMLwg" <public-html@w3.org>
> Sent: Monday, November 23, 2009 7:55 AM
> Subject: Re: XMLNS in Inkscape and other editors
>
> On Mon, Nov 23, 2009 at 7:41 AM, Julian Reschke <julian.reschke@gmx.de>
> wrote:
>>
>> Adam Barth wrote:
>>>
>>> On Mon, Nov 23, 2009 at 7:25 AM, Julian Reschke <julian.reschke@gmx.de>
>>> wrote:
>>>>
>>>> Adam Barth wrote:
>>>>>
>>>>> Unfortunately, a browser's content sniffing algorithm is a subtle
>>>>> beast. I would not recommend changing the algorithm because of
>>>>> aesthetics. Instead, I recommend changing the algorithm either (1) to
>>>>> improve security, (2) to improve compatibility with web content, or
>>>>> (3) to improve interoperability with other browsers.
>>>>> ...
>>>>
>>>> (2) and (3) seem to be arguments in favor of handling the UTF-8 BOM.
>>>
>>> Maybe, but maybe not. For (2), we should do a careful measurement
>>> instead of relying on this one anecdote. For (3), there's no way to
>>> chase IE's tail here without giving up on (1). Instead, I've
>>
>> But IE is consistent with Safari and Opera here, isn't it?
>
> Yes. IE and Safari both use an insecure HTML signature. I
> haven't studied Opera's sniffing algorithm in detail.
>
>>> recommended in the past (and continue to recommend) that other
>>> browsers use Firefox's HTML signature (with a handful of changes that
>>> measurably improve compatibility).
>>> ...
>>
>> I do agree that minimizing sniffing is a good thing when the server
>> indicates a media type. However, in this case, the server did not do that,
>> and ignoring a UTF-8 BOM appears to be the wrong thing to do in this case.
>
> Whether we should change the sniffing algorithm in this way is purely
> a quantitative question. Is the compatibility we gain worth the
> security and stability costs? From this one example, we can't
> estimate what the compatibility gain is. As I've said a number of
> times on this thread, we'd have to measure to find out.
>
> To be more concrete, consider the question of whether the algorithm
> should tolerate leading whitespace before the first HTML tag in the
> HTML signature. It turns out that this causes the HTML signature to
> trigger 9% more often, which is a measurable gain in compatibility
> for, IMHO, a small loss in security. I suspect, although I haven't
> measured, that tolerating BOMs will be at least one or two orders of
> magnitude less important.
>
> At a higher level, my numerical goals for the content sniffing
> algorithm are that it is compatible with between 99.99% and 99.999% of
> web pages. Does this issue occur on more than 1 out of every 100,000
> web pages? 1 out of every 1,000,000?
>
> Adam
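The kind of HTML-signature check debated in this thread can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not any browser's real algorithm: the tag list `HTML_TAGS`, the function `looks_like_html`, and its `skip_bom` and `skip_whitespace` flags are hypothetical, and the list does not reproduce the Firefox, IE, or Safari signatures mentioned above. It only shows where the two tolerances under discussion (leading whitespace, and a leading UTF-8 BOM) would sit in such a check.

```python
# Hypothetical sketch of an HTML "signature" check used when sniffing a
# response whose media type is missing or unreliable.

UTF8_BOM = b"\xef\xbb\xbf"
WHITESPACE = b"\t\n\x0c\r "          # bytes skipped when whitespace is tolerated
TAG_TERMINATORS = b" >"              # byte that must follow a matched tag name

# Illustrative subset of tag prefixes (assumption, not a browser's real list).
HTML_TAGS = (b"<!doctype html", b"<html", b"<head", b"<script", b"<body")


def looks_like_html(data: bytes, skip_bom: bool = False,
                    skip_whitespace: bool = True) -> bool:
    """Return True if the start of `data` matches the sketched HTML signature."""
    i = 0
    # Optional tolerance: ignore a leading UTF-8 byte order mark.
    if skip_bom and data.startswith(UTF8_BOM):
        i = len(UTF8_BOM)
    # Optional tolerance: ignore leading whitespace before the first tag.
    if skip_whitespace:
        while i < len(data) and data[i] in WHITESPACE:
            i += 1
    rest = data[i:]
    for tag in HTML_TAGS:
        n = len(tag)
        # Case-insensitive ASCII match, then require a terminating byte so
        # that, e.g., "<htmlx" does not count as "<html".
        if rest[:n].lower() == tag and len(rest) > n and rest[n] in TAG_TERMINATORS:
            return True
    return False


if __name__ == "__main__":
    page = b"\xef\xbb\xbf<!DOCTYPE html><html>"
    print(looks_like_html(page))                 # False: the BOM hides the signature
    print(looks_like_html(page, skip_bom=True))  # True
```

Each tolerance widens the set of responses that get treated as HTML, which is why the thread frames the choice as a measured trade-off between compatibility and security rather than an aesthetic one.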
Received on Monday, 23 November 2009 17:15:40 UTC