Re: XMLNS in Inkscape and other editors from Adam Barth on 2009-11-23 (public-html@w3.org from November 2009)

From: Adam Barth <w3c@adambarth.com>
Date: Mon, 23 Nov 2009 07:10:53 -0800
To: Joe D Williams <joedwil@earthlink.net>
Cc: Julian Reschke <julian.reschke@gmx.de>, Boris Zbarsky <bzbarsky@mit.edu>, Gavin Carothers <gavin@carothers.name>, Maciej Stachowiak <mjs@apple.com>, HTMLwg <public-html@w3.org>
Message-ID: <7789133a0911230710j2dce9a14refb98c4d34ba315d@mail.gmail.com>

Unfortunately, a browser's content sniffing algorithm is a subtle
beast.  I would not recommend changing the algorithm because of
aesthetics.  Instead, I recommend changing the algorithm either (1) to
improve security, (2) to improve compatibility with web content, or
(3) to improve interoperability with other browsers.

Adam
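
For illustration only, here is a minimal Python sketch of the behavior
discussed in the quoted thread below: leading whitespace is skipped, but a
leading UTF-8 BOM is not, so a BOM-prefixed HTML document falls through to
text/plain. The function name `sniff`, the `tolerate_bom` flag, and the
short signature list are hypothetical and chosen for this example; this is
not any browser's actual algorithm or signature table.

```python
# Hypothetical sketch: not the real sniffing algorithm of any browser.
UTF8_BOM = b"\xef\xbb\xbf"
WHITESPACE = b" \t\r\n\x0c"
# Illustrative subset of HTML signatures (an assumption for this example).
HTML_SIGNATURES = (b"<!doctype html", b"<html", b"<head", b"<body", b"<script")


def sniff(data: bytes, tolerate_bom: bool = False) -> str:
    """Guess a media type from the first bytes of a response body."""
    # Optionally skip a leading UTF-8 BOM (the change under discussion).
    if tolerate_bom and data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Skip leading whitespace only; never skip arbitrary leading bytes,
    # since that would allow documents that look like two types at once.
    stripped = data.lstrip(WHITESPACE)
    if stripped[:64].lower().startswith(HTML_SIGNATURES):
        return "text/html"
    return "text/plain"


if __name__ == "__main__":
    print(sniff(b"  \n<html>"))
    print(sniff(b"\xef\xbb\xbf<html>"))
    print(sniff(b"\xef\xbb\xbf<html>", tolerate_bom=True))
```

Keeping BOM tolerance behind an explicit flag mirrors the point above: the
two behaviors can be compared against real content before changing the
default.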


On Mon, Nov 23, 2009 at 5:37 AM, Joe D Williams <joedwil@earthlink.net> wrote:
>>>> The algorithm tolerates leading white space, but not leading BOMs.
>
> Didn't all BOM history say: if no BOM, then UTF-8?
> Then there came the optional UTF-8 BOM. Some HTML browsers broke for a
> while when the UTF-8 BOM was served; then it seems they began to
> tolerate a UTF-8 BOM, as they should. The only other argument I have
> heard against honoring a UTF-8 BOM, rather than failing when the first
> character is wrong (x3d files use #), was that the BOM check slowed the
> loader. For text/html I would say this is not a good reason, so of
> course the 'optional' UTF-8 BOM should be honored without further
> questioning in that respect.
> Thanks and Best Regards,
> Joe
>
>
> ----- Original Message ----- From: "Adam Barth" <w3c@adambarth.com>
> To: "Julian Reschke" <julian.reschke@gmx.de>
> Cc: "Boris Zbarsky" <bzbarsky@mit.edu>; "Gavin Carothers"
> <gavin@carothers.name>; "Maciej Stachowiak" <mjs@apple.com>; "HTMLwg"
> <public-html@w3.org>
> Sent: Sunday, November 22, 2009 7:48 AM
> Subject: Re: XMLNS in Inkscape and other editors
>
>
>> On Sun, Nov 22, 2009 at 5:27 AM, Julian Reschke
>> <julian.reschke@gmx.de> wrote:
>>>
>>> Adam Barth wrote:
>>>>
>>>> The algorithm tolerates leading white space, but not leading BOMs.
>>>
>>> Is there a particular reason why the BOM is not tolerated, given
>>> <http://www.w3.org/TR/REC-xml/#sec-guessing>?
>>
>> The reason is, essentially, because Firefox does not tolerate a BOM
>> in this case.
>>
>>> On the other hand, the UTF-8 BOM serves a very useful purpose
>>> (auto-detection of the character set when no out-of-band encoding
>>> information is available), and therefore I would expect a content
>>> sniffing algorithm to take it into account.
>>
>> The BOM actually convinces the algorithm that the response is
>> text/plain.
>>
>>> Let's ignore "correctly" for a second -- it *does* render in IE8,
>>> Opera and Safari on Win7. So the two latter UAs seem to differ in
>>> the way they do content-sniffing from Firefox.
>>
>> Indeed.  Here are the signatures used by a bunch of browser sniffing
>> algorithms:
>>
>> http://webblaze.cs.berkeley.edu/2009/content-sniffing/
>>
>> In particular, IE and Safari are willing to sniff text/html from a
>> resource that has various HTML tags regardless of the leading
>> characters.  Firefox and Chrome tolerate only leading whitespace.
>> Skipping arbitrary characters is dangerous from a security point of
>> view because it lets an attacker create chameleon documents that
>> appear to be both HTML and some other type, like PDF.  The paper on
>> that page details some attacks based on this behavior.
>>
>> The sniffing algorithm is extremely delicate.  It's possible we
>> should tolerate leading BOMs, but I would want to do some
>> measurement first to see exactly what the impact would be.
>>
>> Adam
>>
>
>

Received on Monday, 23 November 2009 15:12:00 UTC