Re: XMLNS in Inkscape and other editors

On Sun, Nov 22, 2009 at 5:27 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Adam Barth wrote:
>> The algorithm tolerates leading white space, but not leading BOMs.
>
> Is there a particular reason why the BOM is not tolerated, given
> <http://www.w3.org/TR/REC-xml/#sec-guessing>?

The reason is, essentially, that Firefox does not tolerate a BOM in
this case.

> On the other hand, the UTF-8 BOM serves a very useful purpose
> (auto-detection of the character set when no out-of-band encoding
> information is available), and therefore I would expect a content sniffing
> algorithm to take it into account.

The BOM actually convinces the algorithm that the response is text/plain.
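
As a rough illustration of that rule, here is a minimal Python sketch.
The function name sniff_bom and the exact list of BOMs are assumptions
made for the example, not the text of the algorithm itself:

```python
# Hypothetical sketch: a leading byte order mark is treated as evidence
# of text/plain. The BOM list and the function name are illustrative.
BOMS = (
    b"\xef\xbb\xbf",  # UTF-8
    b"\xfe\xff",      # UTF-16, big-endian
    b"\xff\xfe",      # UTF-16, little-endian
)


def sniff_bom(data: bytes):
    """Return "text/plain" if data starts with a known BOM, else None."""
    if data.startswith(BOMS):
        return "text/plain"
    return None
```

Under this sketch, a response beginning with EF BB BF followed by "<svg"
is classified as text/plain before any markup signature is consulted.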

> Let's ignore "correctly" for a second -- it *does* render in IE8, Opera and
> Safari on Win7. So the two latter UAs seem to differ in the way they do
> content-sniffing from Firefox.

Indeed.  Here are the signatures used by a bunch of browser sniffing algorithms:

http://webblaze.cs.berkeley.edu/2009/content-sniffing/

In particular, IE and Safari are willing to sniff text/html from a
resource that has various HTML tags regardless of the leading
characters.  Firefox and Chrome tolerate only leading whitespace.
Skipping arbitrary characters is dangerous from a security point of
view because it lets an attacker create chameleon documents that
appear to be both HTML and some other type, like PDF.  The paper on
that page details some attacks based on this behavior.
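
To make the difference concrete, here is a simplified Python sketch of
the two styles of sniffing.  The signature list, the 256-byte window,
and the function names are assumptions for illustration only; real
browsers use longer and more carefully chosen signature tables:

```python
# Hypothetical sketch contrasting two sniffing strategies.
WHITESPACE = b" \t\r\n\x0c"
HTML_SIGNATURES = (b"<!doctype html", b"<html", b"<head", b"<script", b"<body")


def sniff_strict(data: bytes) -> bool:
    """Skip only leading whitespace, then require an HTML signature."""
    body = data.lstrip(WHITESPACE).lower()
    return body.startswith(HTML_SIGNATURES)


def sniff_permissive(data: bytes, window: int = 256) -> bool:
    """Accept an HTML signature anywhere in the first `window` bytes."""
    head = data[:window].lower()
    return any(sig in head for sig in HTML_SIGNATURES)


# A "chameleon": starts like a PDF but also contains HTML tags.
chameleon = b"%PDF-1.4\n<html><script>alert(1)</script></html>"
print(sniff_strict(chameleon))      # False
print(sniff_permissive(chameleon))  # True
```

The strict variant also rejects a document that begins with a BOM,
since a BOM is not whitespace; that is the behavior discussed in this
thread.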

The sniffing algorithm is extremely delicate.  It's possible we should
tolerate leading BOMs, but I would want to do some measurement first
to see exactly what the impact would be.

Adam

Received on Sunday, 22 November 2009 15:49:24 UTC