Re: XMLNS in Inkscape and other editors from Joe D Williams on 2009-11-23 (public-html@w3.org from November 2009)

From: Joe D Williams <joedwil@earthlink.net>
Date: Mon, 23 Nov 2009 05:37:04 -0800
To: "Adam Barth" <w3c@adambarth.com>, "Julian Reschke" <julian.reschke@gmx.de>
Cc: "Boris Zbarsky" <bzbarsky@mit.edu>, "Gavin Carothers" <gavin@carothers.name>, "Maciej Stachowiak" <mjs@apple.com>, "HTMLwg" <public-html@w3.org>
Message-ID: <E98FFAA31B3E4534BB79ADA213083D3A@joe1446a4150a8>

>>> The algorithm tolerates leading white space, but not leading BOMs.
>>

Didn't all BOM history say: if no BOM, then UTF-8?
Then there came the optional UTF-8 BOM. Some HTML browsers were
broken for a while when the UTF-8 BOM was served; then it seems they
began to tolerate a UTF-8 BOM, as they should. The only other
argument I have heard against honoring a UTF-8 BOM, rather than
failing if the first character is wrong (X3D files use #), was that
the BOM check slowed the loader. For text/html I would say this is
not a good reason, so of course the 'optional' UTF-8 BOM should be
honored without further questioning in that respect.
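
To make the "if no BOM then utf-8" rule concrete, here is a minimal
sketch in Python. It is only an illustration of the idea, not what any
browser or the draft algorithm does; the function name detect_encoding
and its default argument are my own assumptions.

```python
import codecs
from typing import Tuple


def detect_encoding(data: bytes, default: str = "utf-8") -> Tuple[str, bytes]:
    """Return (encoding, payload) with any recognized BOM stripped.

    Falls back to `default` when no BOM is present.
    """
    # UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
    # (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
    boms = (
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    )
    for bom, name in boms:
        if data.startswith(bom):
            return name, data[len(bom):]
    return default, data
```

The check is a handful of prefix comparisons on the first four bytes,
which is why I doubt the "slows the loader" argument carries much
weight for text/html.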
Thanks and Best Regards,
Joe


----- Original Message ----- 
From: "Adam Barth" <w3c@adambarth.com>
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "Boris Zbarsky" <bzbarsky@mit.edu>; "Gavin Carothers"
<gavin@carothers.name>; "Maciej Stachowiak" <mjs@apple.com>; "HTMLwg"
<public-html@w3.org>
Sent: Sunday, November 22, 2009 7:48 AM
Subject: Re: XMLNS in Inkscape and other editors


> On Sun, Nov 22, 2009 at 5:27 AM, Julian Reschke
> <julian.reschke@gmx.de> wrote:
>> Adam Barth wrote:
>>> The algorithm tolerates leading white space, but not leading BOMs.
>>
>> Is there a particular reason why the BOM is not tolerated, given
>> <http://www.w3.org/TR/REC-xml/#sec-guessing>?
>
> The reason is, essentially, because Firefox does not tolerate a BOM in
> this case.
>
>> On the other hand, the UTF-8 BOM serves a very useful purpose
>> (auto-detection of the character set when no out-of-band encoding
>> information is available), and therefore I would expect a content
>> sniffing algorithm to take it into account.
>
> The BOM actually convinces the algorithm that the response is
> text/plain.
>
>> Let's ignore "correctly" for a second -- it *does* render in IE8,
>> Opera and Safari on Win7. So the two latter UAs seem to differ in
>> the way they do content-sniffing from Firefox.
>
> Indeed.  Here are the signatures used by a bunch of browser sniffing
> algorithms:
>
> http://webblaze.cs.berkeley.edu/2009/content-sniffing/
>
> In particular, IE and Safari are willing to sniff text/html from a
> resource that has various HTML tags regardless of the leading
> characters.  Firefox and Chrome tolerate only leading whitespace.
> Skipping arbitrary characters is dangerous from a security point of
> view because it lets an attacker create chameleon documents that
> appear to be both HTML and some other type, like PDF.  The paper on
> that page details some attacks based on this behavior.
>
> The sniffing algorithm is extremely delicate.  It's possible we should
> tolerate leading BOMs, but I would want to do some measurement first
> to see exactly what the impact would be.
>
> Adam
>
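
For anyone following along, the difference Adam describes between
"leading whitespace only" and "skip arbitrary characters" can be
sketched roughly as below. This is a toy under stated assumptions: the
signature list is shortened and hypothetical, the names sniff_strict
and sniff_lenient and the 512-byte window are mine, and the functions
only answer "HTML or not" rather than choosing text/plain the way the
real algorithm does.

```python
import codecs

# Hypothetical, shortened signature list; real sniffers use longer tables.
HTML_SIGNATURES = (b"<!doctype html", b"<html", b"<head", b"<body", b"<script")
WHITESPACE = b" \t\r\n\x0c"


def sniff_strict(data, tolerate_bom=False):
    """HTML only if a signature follows optional leading whitespace."""
    if tolerate_bom and data.startswith(codecs.BOM_UTF8):
        # Skip exactly one UTF-8 BOM, nothing else.
        data = data[len(codecs.BOM_UTF8):]
    head = data.lstrip(WHITESPACE)[:32].lower()
    return head.startswith(HTML_SIGNATURES)


def sniff_lenient(data, window=512):
    """HTML if a signature appears anywhere in the first `window` bytes."""
    head = data[:window].lower()
    return any(sig in head for sig in HTML_SIGNATURES)


# A "chameleon": begins like a PDF but also contains HTML tags.
chameleon = b"%PDF-1.4\n<html><script>alert(1)</script></html>"

print(sniff_strict(chameleon))   # False
print(sniff_lenient(chameleon))  # True
print(sniff_strict(codecs.BOM_UTF8 + b"<html>"))                     # False
print(sniff_strict(codecs.BOM_UTF8 + b"<html>", tolerate_bom=True))  # True
```

In this sketch, skipping a single leading UTF-8 BOM still rejects the
chameleon, since only those three bytes are skipped. Whether that holds
up against real content is the measurement question Adam raises.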

Received on Monday, 23 November 2009 13:38:01 UTC