- From: Joe D Williams <joedwil@earthlink.net>
- Date: Mon, 23 Nov 2009 09:46:00 -0800
- To: "Adam Barth" <w3c@adambarth.com>
- Cc: "Julian Reschke" <julian.reschke@gmx.de>, "Boris Zbarsky" <bzbarsky@mit.edu>, "Gavin Carothers" <gavin@carothers.name>, "Maciej Stachowiak" <mjs@apple.com>, "HTMLwg" <public-html@w3.org>
> You're correct that BOMs are optional when you correctly specify the
> media type of your content.

Sorry if I'm off track or too limited in the definition, but no, I
thought the BOM was optional if you intended to serve utf-8 regardless
of the media type. Independent of anything else, if you get a utf-8
BOM, then you have utf-8. If you don't get a BOM it is also utf-8
(except for those media types that would never use utf-8?). Either
way, in principle you know you have utf-8, with the same exception.

> We're discussing the error recovery behavior when web servers do not
> correctly specify the media type of their content.

Now you must figure out how much you trust that utf-8 BOM, or lack
thereof, that you will always optionally get. Plus, if you don't trust
or don't know the served media type, then you are looking for a marker
of some kind (maybe <! for text/html, I have to look again) to figure
out the media type.

Thanks and Best,
Joe

> Adam

On Mon, Nov 23, 2009 at 9:04 AM, Joe D Williams <joedwil@earthlink.net> wrote:
>> tolerating BOMs ...
>
> I don't get that. Isn't the utf-8 BOM officially optional if the
> intent is to serve utf-8, no matter what the mime?
> Frankly, I haven't experimented lately, but I thought toleration of
> the utf-8 BOM had left my html interest because browsers I tested
> eventually got over failing when the BOM was sent. What am I missing?
>
> Best Regards,
> Joe
>
>
> ----- Original Message -----
> From: "Adam Barth" <w3c@adambarth.com>
> To: "Julian Reschke" <julian.reschke@gmx.de>
> Cc: "Joe D Williams" <joedwil@earthlink.net>; "Boris Zbarsky"
> <bzbarsky@mit.edu>; "Gavin Carothers" <gavin@carothers.name>;
> "Maciej Stachowiak" <mjs@apple.com>; "HTMLwg" <public-html@w3.org>
> Sent: Monday, November 23, 2009 7:55 AM
> Subject: Re: XMLNS in Inkscape and other editors
>
>
> On Mon, Nov 23, 2009 at 7:41 AM, Julian Reschke
> <julian.reschke@gmx.de> wrote:
>>
>> Adam Barth wrote:
>>>
>>> On Mon, Nov 23, 2009 at 7:25 AM, Julian Reschke
>>> <julian.reschke@gmx.de> wrote:
>>>>
>>>> Adam Barth wrote:
>>>>>
>>>>> Unfortunately, a browser's content sniffing algorithm is a
>>>>> subtle beast. I would not recommend changing the algorithm
>>>>> because of aesthetics. Instead, I recommend changing the
>>>>> algorithm either (1) to improve security, (2) to improve
>>>>> compatibility with web content, or (3) to improve
>>>>> interoperability with other browsers.
>>>>> ...
>>>>
>>>> (2) and (3) seem to be arguments in favor of handling the UTF-8
>>>> BOM.
>>>
>>> Maybe, but maybe not. For (2), we should do a careful measurement
>>> instead of relying on this one anecdote. For (3), there's no way
>>> to chase IE's tail here without giving up on (1). Instead, I've
>>
>> But IE is consistent with Safari and Opera here, isn't it?
>
> Yes. Both IE and Safari use an insecure HTML signature. I haven't
> studied Opera's sniffing algorithm in detail.
>
>>> recommended in the past (and continue to recommend) that other
>>> browsers use Firefox's HTML signature (with a handful of changes
>>> that measurably improve compatibility).
>>> ...
>>
>> I do agree that minimizing sniffing is a good thing when the server
>> indicates a media type. However, in this case, the server did not
>> do that, and ignoring a UTF-8-BOM appears to be the wrong thing to
>> do in this case.
>
> Whether we should change the sniffing algorithm in this way is
> purely a quantitative question. Is the compatibility we gain worth
> the security and stability costs? From this one example, we can't
> estimate what the compatibility gain is. As I've said a number of
> times on this thread, we'd have to measure to find out.
>
> To be more concrete, consider the question of whether the algorithm
> should tolerate leading whitespace before the first HTML tag in the
> HTML signature. It turns out that this causes the HTML signature to
> trigger 9% more often, which is a measurable gain in compatibility
> for, IMHO, a small loss in security. I suspect, although I haven't
> measured, that tolerating BOMs will be at least one or two orders of
> magnitude less important.
>
> At a higher level, my numerical goals for the content sniffing
> algorithm are that it is compatible with between 99.99% and 99.999%
> of web pages. Does this issue occur on more than 1 out of every
> 100,000 web pages? 1 out of every 1,000,000?
>
> Adam
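
For readers following the thread, here is a rough Python sketch of the
two tolerances under discussion: skipping a leading UTF-8 BOM and
skipping leading whitespace before testing for an HTML signature. The
tag list, the 32-byte window, and the function names are assumptions
made up for illustration; this is not Firefox's signature or any
browser's actual sniffing algorithm.

```python
import codecs

# Hypothetical signature prefixes, for illustration only. Real browser
# signatures differ and are chosen with security trade-offs in mind.
HTML_TAG_PREFIXES = (b"<!doctype html", b"<html", b"<head", b"<body", b"<!--")

# Bytes treated as leading whitespace in this sketch (an assumption).
HTML_WHITESPACE = b" \t\r\n\x0c"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def looks_like_html(data: bytes,
                    tolerate_whitespace: bool = True,
                    tolerate_bom: bool = False) -> bool:
    """Check whether the start of data matches the sketch's HTML signature."""
    if tolerate_bom:
        data = strip_utf8_bom(data)
    if tolerate_whitespace:
        data = data.lstrip(HTML_WHITESPACE)
    # Compare case-insensitively against a short window at the start.
    head = data[:32].lower()
    return head.startswith(HTML_TAG_PREFIXES)


# With BOM tolerance off, a BOM-prefixed page does not match;
# switching it on makes the same bytes match.
with_bom = codecs.BOM_UTF8 + b"<html><body>hi</body></html>"
print(looks_like_html(with_bom))
print(looks_like_html(with_bom, tolerate_bom=True))
```

Whether the tolerate_bom flag should default to on is the quantitative
question raised above; the sketch only shows where the choice sits.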
Received on Monday, 23 November 2009 17:46:46 UTC