
Re: XMLNS in Inkscape and other editors

From: Adam Barth <w3c@adambarth.com>
Date: Mon, 23 Nov 2009 07:10:53 -0800
Message-ID: <7789133a0911230710j2dce9a14refb98c4d34ba315d@mail.gmail.com>
To: Joe D Williams <joedwil@earthlink.net>
Cc: Julian Reschke <julian.reschke@gmx.de>, Boris Zbarsky <bzbarsky@mit.edu>, Gavin Carothers <gavin@carothers.name>, Maciej Stachowiak <mjs@apple.com>, HTMLwg <public-html@w3.org>
Unfortunately, a browser's content sniffing algorithm is a subtle
beast.  I would not recommend changing the algorithm because of
aesthetics.  Instead, I recommend changing the algorithm either (1) to
improve security, (2) to improve compatibility with web content, or
(3) to improve interoperability with other browsers.


On Mon, Nov 23, 2009 at 5:37 AM, Joe D Williams <joedwil@earthlink.net> wrote:
>>>> The algorithm tolerates leading white space, but not leading BOMs.
> Didn't all BOM history say: if there is no BOM, then UTF-8?
> Then came the optional UTF-8 BOM. Some HTML browsers were broken for a
> while when a UTF-8 BOM was served; later they seem to have begun
> tolerating a UTF-8 BOM, as they should. The only other argument I have
> heard against honoring a UTF-8 BOM, rather than failing if the first
> character is wrong (x3d files use #), is that the BOM check slowed
> the loader. For text/html I would say this is not a good reason, so of
> course the 'optional' UTF-8 BOM should be honored without further
> questioning in that respect.
> Thanks and Best Regards,
> Joe
> ----- Original Message ----- From: "Adam Barth" <w3c@adambarth.com>
> To: "Julian Reschke" <julian.reschke@gmx.de>
> Cc: "Boris Zbarsky" <bzbarsky@mit.edu>; "Gavin Carothers"
> <gavin@carothers.name>; "Maciej Stachowiak" <mjs@apple.com>; "HTMLwg"
> <public-html@w3.org>
> Sent: Sunday, November 22, 2009 7:48 AM
> Subject: Re: XMLNS in Inkscape and other editors
>> On Sun, Nov 22, 2009 at 5:27 AM, Julian Reschke
>> <julian.reschke@gmx.de> wrote:
>>> Adam Barth wrote:
>>>> The algorithm tolerates leading white space, but not leading BOMs.
>>> Is there a particular reason why the BOM is not tolerated, given
>>> <http://www.w3.org/TR/REC-xml/#sec-guessing>?
>> The reason is, essentially, because Firefox does not tolerate a BOM
>> in this case.
>>> On the other hand, the UTF-8 BOM serves a very useful purpose
>>> (auto-detection of the character set when no out-of-band encoding
>>> information is available), and therefore I would expect a content
>>> sniffing algorithm to take it into account.
>> The BOM actually convinces the algorithm that the response is
>> text/plain.
>>> Let's ignore "correctly" for a second -- it *does* render in IE8,
>>> Opera and Safari on Win7. So the two latter UAs seem to differ in
>>> the way they do content-sniffing from Firefox.
>> Indeed.  Here are the signatures used by a bunch of browser sniffing
>> algorithms:
>> http://webblaze.cs.berkeley.edu/2009/content-sniffing/
>> In particular, IE and Safari are willing to sniff text/html from a
>> resource that has various HTML tags regardless of the leading
>> characters.  Firefox and Chrome tolerate only leading whitespace.
>> Skipping arbitrary characters is dangerous from a security point of
>> view because it lets an attacker create chameleon documents that
>> appear to be both HTML and some other type, like PDF.  The paper on
>> that page details some attacks based on this behavior.
>> The sniffing algorithm is extremely delicate.  It's possible we
>> should tolerate leading BOMs, but I would want to do some
>> measurement first to see exactly what the impact would be.
>> Adam
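
To make the difference described above concrete, here is a minimal
sketch of the two sniffing styles. It is an illustration only, not any
browser's actual algorithm: the signature list, the 512-byte window, the
function names, and the tolerate_bom switch are all assumptions made for
this example.

```python
UTF8_BOM = b"\xef\xbb\xbf"
WHITESPACE = b" \t\r\n\x0c"

# Hypothetical, heavily abbreviated signature list; real browsers use more.
HTML_SIGNATURES = (b"<!doctype html", b"<html", b"<head", b"<body", b"<script")

# Assumed number of leading bytes a sniffer inspects.
SNIFF_WINDOW = 512


def sniff_strict(data: bytes, tolerate_bom: bool = False) -> bool:
    """Report HTML only if a signature follows leading whitespace.

    With tolerate_bom=True a single leading UTF-8 BOM is skipped first,
    which is the change under discussion in this thread.
    """
    head = data[:SNIFF_WINDOW]
    if tolerate_bom and head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    head = head.lstrip(WHITESPACE).lower()
    return head.startswith(HTML_SIGNATURES)


def sniff_lenient(data: bytes) -> bool:
    """Report HTML if a signature appears anywhere in the sniff window."""
    head = data[:SNIFF_WINDOW].lower()
    return any(sig in head for sig in HTML_SIGNATURES)


page = b"  \n<html><body>hi</body></html>"
bom_page = UTF8_BOM + b"<html><body>hi</body></html>"
# A "chameleon": starts like a PDF but also contains HTML tags.
chameleon = b"%PDF-1.4\n<html><script>alert(1)</script></html>"

for name, doc in (("page", page), ("bom_page", bom_page), ("chameleon", chameleon)):
    print(name, sniff_strict(doc), sniff_strict(doc, tolerate_bom=True), sniff_lenient(doc))
```

In this sketch the strict variant rejects the BOM-prefixed page unless
tolerate_bom is set, while the lenient variant accepts the chameleon
document as HTML. Skipping exactly one known prefix is a much narrower
change than skipping arbitrary leading bytes, but as noted above its
real-world impact would still need to be measured.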
Received on Monday, 23 November 2009 15:12:00 UTC
