Re: XMLNS in Inkscape and other editors from Joe D Williams on 2009-11-23 (public-html@w3.org from November 2009)

From: Joe D Williams <joedwil@earthlink.net>
Date: Mon, 23 Nov 2009 09:04:12 -0800
To: "Adam Barth" <w3c@adambarth.com>, "Julian Reschke" <julian.reschke@gmx.de>
Cc: "Boris Zbarsky" <bzbarsky@mit.edu>, "Gavin Carothers" <gavin@carothers.name>, "Maciej Stachowiak" <mjs@apple.com>, "HTMLwg" <public-html@w3.org>
Message-ID: <F905839A975747C19E100EB0C6D51F65@joe1446a4150a8>

> tolerating BOMs ...

I don't get that. Isn't the UTF-8 BOM officially optional if the 
intent is to serve UTF-8, no matter what the MIME type?
Frankly, I haven't experimented lately, but I thought toleration of 
the UTF-8 BOM had left my HTML interest because the browsers I tested 
eventually got over failing when the BOM was sent. What am I missing?
Best Regards,
Joe

----- Original Message ----- 
From: "Adam Barth" <w3c@adambarth.com>
To: "Julian Reschke" <julian.reschke@gmx.de>
Cc: "Joe D Williams" <joedwil@earthlink.net>; "Boris Zbarsky" 
<bzbarsky@mit.edu>; "Gavin Carothers" <gavin@carothers.name>; "Maciej 
Stachowiak" <mjs@apple.com>; "HTMLwg" <public-html@w3.org>
Sent: Monday, November 23, 2009 7:55 AM
Subject: Re: XMLNS in Inkscape and other editors

On Mon, Nov 23, 2009 at 7:41 AM, Julian Reschke 
<julian.reschke@gmx.de> wrote:
> Adam Barth wrote:
>> On Mon, Nov 23, 2009 at 7:25 AM, Julian Reschke 
>> <julian.reschke@gmx.de>
>> wrote:
>>> Adam Barth wrote:
>>>> Unfortunately, a browser's content sniffing algorithm is a subtle
>>>> beast. I would not recommend changing the algorithm because of
>>>> aesthetics. Instead, I recommend changing the algorithm either 
>>>> (1) to
>>>> improve security, (2) to improve compatibility with web content, 
>>>> or
>>>> (3) to improve interoperability with other browsers.
>>>> ...
>>>
>>> (2) and (3) seem to be arguments in favor of handling the UTF-8 
>>> BOM.
>>
>> Maybe, but maybe not. For (2), we should do a careful measurement
>> instead of relying on this one anecdote. For (3), there's no way to
>> chase IE's tail here without giving up on (1). Instead, I've
>
> But IE is consistent with Safari and Opera here, isn't it?

Yes.  IE and Safari both use an insecure HTML signature.  I
haven't studied Opera's sniffing algorithm in detail.

>> recommended in the past (and continue to recommend) that other
>> browsers use Firefox's HTML signature (with a handful of changes 
>> that
>> measurably improve compatibility).
>> ...
>
> I do agree that minimizing sniffing is a good thing when the server
> indicates a media type. However, in this case, the server did not do 
> that,
> and ignoring a UTF-8-BOM appears to be the wrong thing to do in this 
> case.

Whether we should change the sniffing algorithm in this way is purely
a quantitative question.  Is the compatibility we gain worth the
security and stability costs?  From this one example, we can't
estimate what the compatibility gain is.  As I've said a number of
times on this thread, we'd have to measure to find out.

To be more concrete, consider the question of whether the algorithm
should tolerate leading whitespace before the first HTML tag in the
HTML signature.  It turns out that this causes the HTML signature to
trigger 9% more often, which is a measurable gain in compatibility
for, IMHO, a small loss in security.  I suspect, although I haven't
measured, that tolerating BOMs will be at least one or two orders of
magnitude less important.
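
As a rough illustration of the two tolerances being compared here, the
sketch below checks a response prefix against an HTML signature, with
optional skipping of leading whitespace and of a UTF-8 BOM.  It is not
any browser's actual algorithm: the tag list, the function name
looks_like_html, and both flags are assumptions made up for this
example.

```python
UTF8_BOM = b"\xef\xbb\xbf"
HTML_WHITESPACE = b" \t\n\x0c\r"

# Hypothetical signature list, for illustration only; real browsers
# each use their own (and different) sets of tags.
SIGNATURE_TAGS = (
    b"<!doctype html",
    b"<html",
    b"<head",
    b"<body",
    b"<script",
    b"<title",
)


def looks_like_html(prefix: bytes, skip_whitespace: bool = True,
                    skip_bom: bool = False) -> bool:
    """Return True if the start of `prefix` matches the HTML signature."""
    data = prefix
    # Optionally tolerate a UTF-8 byte order mark at the very start.
    if skip_bom and data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Optionally tolerate whitespace before the first tag.
    if skip_whitespace:
        data = data.lstrip(HTML_WHITESPACE)
    # Tag names are matched case-insensitively (ASCII only).
    return data[:64].lower().startswith(SIGNATURE_TAGS)


if __name__ == "__main__":
    print(looks_like_html(b"  \n<HTML><body>"))
    print(looks_like_html(b"\xef\xbb\xbf<html>"))
    print(looks_like_html(b"\xef\xbb\xbf<html>", skip_bom=True))
```

Measuring the compatibility gain of either tolerance would then amount
to running the check with and without the flag over a large sample of
untyped responses and counting how often the result changes.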

At a higher level, my numerical goals for the content sniffing
algorithm are that it is compatible with between 99.99% and 99.999% of
web pages.  Does this issue occur on more than 1 out of every 100,000
web pages?  1 out of every 1,000,000?
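
For the arithmetic behind those figures, a small sketch (the helper
name failures_one_in is hypothetical) converts a compatibility target
into the corresponding "1 out of every N pages" failure budget, using
exact fractions to avoid floating-point rounding:

```python
from fractions import Fraction


def failures_one_in(compat_percent: str) -> Fraction:
    """Convert a compatibility target such as "99.99" (percent) into N,
    where at most 1 out of every N pages may be handled incorrectly."""
    failure_rate = 1 - Fraction(compat_percent) / 100
    return 1 / failure_rate


if __name__ == "__main__":
    for target in ("99.99", "99.999"):
        print(target, "-> 1 in", failures_one_in(target))
```

So a 99.99% target leaves room for 1 failure in 10,000 pages, and a
99.999% target for 1 in 100,000.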

Adam

Received on Monday, 23 November 2009 17:05:01 UTC