- From: Joe D Williams <joedwil@earthlink.net>
- Date: Mon, 23 Nov 2009 05:37:04 -0800
- To: "Adam Barth" <w3c@adambarth.com>, "Julian Reschke" <julian.reschke@gmx.de>
- Cc: "Boris Zbarsky" <bzbarsky@mit.edu>, "Gavin Carothers" <gavin@carothers.name>, "Maciej Stachowiak" <mjs@apple.com>, "HTMLwg" <public-html@w3.org>
>>> The algorithm tolerates leading white space, but not leading BOMs.

Didn't all the BOM history say: if there is no BOM, then UTF-8? Then the optional UTF-8 BOM came along. Some HTML browsers were broken for a while when the UTF-8 BOM was served; then it seems they began to tolerate a UTF-8 BOM, as they should.

The only other argument I have heard against honoring a UTF-8 BOM is that allowing it, rather than failing when the first character is wrong (x3d files use #), means the BOM check slows the loader. For text/html I would say this is not a good reason, so of course the 'optional' UTF-8 BOM should be honored without further questioning in that respect.

Thanks and Best Regards,
Joe

----- Original Message -----
From: "Adam Barth" <w3c@adambarth.com>
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "Boris Zbarsky" <bzbarsky@mit.edu>; "Gavin Carothers" <gavin@carothers.name>; "Maciej Stachowiak" <mjs@apple.com>; "HTMLwg" <public-html@w3.org>
Sent: Sunday, November 22, 2009 7:48 AM
Subject: Re: XMLNS in Inkscape and other editors

> On Sun, Nov 22, 2009 at 5:27 AM, Julian Reschke
> <julian.reschke@gmx.de> wrote:
>> Adam Barth wrote:
>>> The algorithm tolerates leading white space, but not leading BOMs.
>>
>> Is there a particular reason why the BOM is not tolerated, given
>> <http://www.w3.org/TR/REC-xml/#sec-guessing>?
>
> The reason is, essentially, that Firefox does not tolerate a BOM in
> this case.
>
>> On the other hand, the UTF-8 BOM serves a very useful purpose
>> (auto-detection of the character set when no out-of-band encoding
>> information is available), and therefore I would expect a content
>> sniffing algorithm to take it into account.
>
> The BOM actually convinces the algorithm that the response is
> text/plain.
>
>> Let's ignore "correctly" for a second -- it *does* render in IE8,
>> Opera and Safari on Win7. So the two latter UAs seem to differ in
>> the way they do content-sniffing from Firefox.
>
> Indeed. Here are the signatures used by a bunch of browser sniffing
> algorithms:
>
> http://webblaze.cs.berkeley.edu/2009/content-sniffing/
>
> In particular, IE and Safari are willing to sniff text/html from a
> resource that has various HTML tags regardless of the leading
> characters. Firefox and Chrome tolerate only leading whitespace.
> Skipping arbitrary characters is dangerous from a security point of
> view because it lets an attacker create chameleon documents that
> appear to be both HTML and some other type, like PDF. The paper on
> that page details some attacks based on this behavior.
>
> The sniffing algorithm is extremely delicate. It's possible we should
> tolerate leading BOMs, but I would want to do some measurement first
> to see exactly what the impact would be.
>
> Adam
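
To make the trade-off discussed above concrete, here is a minimal sketch, not the algorithm from any browser or from the draft. The function name `looks_like_html`, the `tolerate_bom` flag, and the short signature list are all hypothetical and chosen only for illustration. It shows a sniffer that skips leading whitespace only, with an optional switch to also skip a single leading UTF-8 BOM, and that never skips arbitrary leading bytes.

```python
UTF8_BOM = b"\xef\xbb\xbf"
WHITESPACE = b" \t\r\n\x0c"

# Illustrative signatures only (an assumption); real sniffers use longer,
# carefully measured lists.
HTML_SIGNATURES = (b"<!doctype html", b"<html", b"<head", b"<body", b"<script")


def looks_like_html(data: bytes, tolerate_bom: bool = False) -> bool:
    """Return True if the start of `data` matches an HTML signature.

    Only leading whitespace is skipped. A single leading UTF-8 BOM is
    skipped as well when `tolerate_bom` is True. No other leading bytes
    are ever skipped.
    """
    if tolerate_bom and data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    data = data.lstrip(WHITESPACE)
    # Compare case-insensitively against the first few bytes only.
    head = data[:32].lower()
    return head.startswith(HTML_SIGNATURES)


if __name__ == "__main__":
    print(looks_like_html(b"  \n<html>"))                           # whitespace only
    print(looks_like_html(b"\xef\xbb\xbf<html>"))                   # BOM, strict mode
    print(looks_like_html(b"\xef\xbb\xbf<html>", tolerate_bom=True))
    print(looks_like_html(b"%PDF-1.4\n<html>", tolerate_bom=True))  # chameleon attempt
```

In this sketch, tolerating the BOM adds exactly one fixed three-byte prefix to what is skipped, so a document that starts with a PDF header followed by HTML tags is still not sniffed as HTML. Whether that holds up against real traffic is the measurement question raised in the message.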
Received on Monday, 23 November 2009 13:38:01 UTC