Re: XMLNS in Inkscape and other editors from Adam Barth on 2009-11-23 (public-html@w3.org from November 2009)

From: Adam Barth <w3c@adambarth.com>
Date: Mon, 23 Nov 2009 07:55:09 -0800
To: Julian Reschke <julian.reschke@gmx.de>
Cc: Joe D Williams <joedwil@earthlink.net>, Boris Zbarsky <bzbarsky@mit.edu>, Gavin Carothers <gavin@carothers.name>, Maciej Stachowiak <mjs@apple.com>, HTMLwg <public-html@w3.org>
Message-ID: <7789133a0911230755p590114b8s13308f561788cbf0@mail.gmail.com>

On Mon, Nov 23, 2009 at 7:41 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
> Adam Barth wrote:
>> On Mon, Nov 23, 2009 at 7:25 AM, Julian Reschke <julian.reschke@gmx.de>
>> wrote:
>>> Adam Barth wrote:
>>>> Unfortunately, a browser's content sniffing algorithm is a subtle
>>>> beast.  I would not recommend changing the algorithm because of
>>>> aesthetics.  Instead, I recommend changing the algorithm either (1) to
>>>> improve security, (2) to improve compatibility with web content, or
>>>> (3) to improve interoperability with other browsers.
>>>> ...
>>>
>>> (2) and (3) seem to be arguments in favor of handling the UTF-8 BOM.
>>
>> Maybe, but maybe not.  For (2), we should do a careful measurement
>> instead of relying on this one anecdote.  For (3), there's no way to
>> chase IE's tail here without giving up on (1).  Instead, I've
>
> But IE is consistent with Safari and Opera here, isn't it?

Yes.  IE and Safari both use an insecure HTML signature.  I
haven't studied Opera's sniffing algorithm in detail.

>> recommended in the past (and continue to recommend) that other
>> browsers use Firefox's HTML signature (with a handful of changes that
>> measurably improve compatibility).
>> ...
>
> I do agree that minimizing sniffing is a good thing when the server
> indicates a media type. However, in this case, the server did not do that,
> and ignoring a UTF-8-BOM appears to be the wrong thing to do in this case.

Whether we should change the sniffing algorithm in this way is purely
a quantitative question.  Is the compatibility we gain worth the
security and stability costs?  From this one example, we can't
estimate what the compatibility gain is.  As I've said a number of
times on this thread, we'd have to measure to find out.

To be more concrete, consider the question of whether the algorithm
should tolerate leading whitespace before the first HTML tag in the
HTML signature.  It turns out that this causes the HTML signature to
trigger 9% more often, which is a measurable gain in compatibility
for, IMHO, a small loss in security.  I suspect, although I haven't
measured, that tolerating BOMs will be at least one or two orders of
magnitude less important.
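
As a rough illustration of the two knobs being discussed, here is a
minimal Python sketch of a signature check with optional tolerance for
leading whitespace and for a UTF-8 BOM.  Everything named here
(looks_like_html, HTML_TAGS, the skip_* flags, the 512-byte window) is
an assumption made for the example.  The tag list is not Firefox's or
any other browser's actual HTML signature, and a real signature would
also check which byte follows the tag name.

```python
UTF8_BOM = b"\xef\xbb\xbf"

# Hypothetical tag list for illustration only; not any browser's real signature.
HTML_TAGS = (b"<!doctype html", b"<html", b"<head", b"<body", b"<script", b"<title")

# Bytes treated as leading whitespace in this sketch.
WHITESPACE = b" \t\r\n\x0c"


def looks_like_html(data: bytes, skip_whitespace: bool = True,
                    skip_bom: bool = False) -> bool:
    """Return True if the start of `data` matches the illustrative signature."""
    prefix = data[:512]  # assumed sniffing window, chosen for the example
    if skip_bom and prefix.startswith(UTF8_BOM):
        prefix = prefix[len(UTF8_BOM):]
    if skip_whitespace:
        prefix = prefix.lstrip(WHITESPACE)
    # Case-insensitive match against any of the listed tag prefixes.
    return prefix.lower().startswith(HTML_TAGS)


if __name__ == "__main__":
    print(looks_like_html(b"  \n<HTML>"))                       # whitespace tolerated
    print(looks_like_html(UTF8_BOM + b"<html>"))                # BOM not tolerated by default
    print(looks_like_html(UTF8_BOM + b"<html>", skip_bom=True)) # BOM tolerated on request
```

Running a check like this over a large crawl with each flag on and off
is one way to get the kind of "triggers N% more often" number above.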

At a higher level, my numerical goals for the content sniffing
algorithm are that it is compatible with between 99.99% and 99.999% of
web pages.  Does this issue occur on more than 1 out of every 100,000
web pages?  1 out of every 1,000,000?
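
For reference, the arithmetic behind those thresholds can be sketched
as follows.  The function names and the sample counts are hypothetical
and only illustrate the calculation; they are not measurements.  Exact
fractions are used so the percentages do not pick up floating-point
rounding error.

```python
from fractions import Fraction


def pages_per_failure(compat_percent: str) -> Fraction:
    """Return N such that the given compatibility means 1 failure per N pages."""
    failure_rate = 1 - Fraction(compat_percent) / 100
    return 1 / failure_rate


def fits_target(affected: int, sampled: int, compat_percent: str) -> bool:
    """True if the observed failure rate stays within the compatibility target."""
    allowed = 1 - Fraction(compat_percent) / 100
    return Fraction(affected, sampled) <= allowed


if __name__ == "__main__":
    print(pages_per_failure("99.99"))    # 1 failure per this many pages
    print(pages_per_failure("99.999"))
    # Hypothetical sample: 1 affected page out of 1,000,000 sampled.
    print(fits_target(1, 1_000_000, "99.999"))
```

So 99.99% corresponds to 1 page in 10,000 and 99.999% to 1 page in
100,000, which is why the 1-in-100,000 figure is the interesting
cutoff.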

Adam

Received on Monday, 23 November 2009 15:56:09 UTC