Re: XMLNS in Inkscape and other editors from Joe D Williams on 2009-11-23 (public-html@w3.org from November 2009)

From: Joe D Williams <joedwil@earthlink.net>
Date: Mon, 23 Nov 2009 09:46:00 -0800
To: "Adam Barth" <w3c@adambarth.com>
Cc: "Julian Reschke" <julian.reschke@gmx.de>, "Boris Zbarsky" <bzbarsky@mit.edu>, "Gavin Carothers" <gavin@carothers.name>, "Maciej Stachowiak" <mjs@apple.com>, "HTMLwg" <public-html@w3.org>
Message-ID: <4557B951EDAA468BB43712F3036FD215@joe1446a4150a8>

> You're correct that BOMs are optional when you correctly specify the
> media type of your content.

Sorry if I'm off track or too limited in the definition, but no: I
thought the BOM was optional if you intend to serve UTF-8, regardless
of the media type. Independent of anything else, if you get a UTF-8
BOM, then you have UTF-8. If you don't get a BOM, it is also UTF-8.
Either way, in principle you know you have UTF-8 (except perhaps for
media types that would never use UTF-8?).
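A minimal Python sketch of the point above, assuming the bytes are meant to be UTF-8: the optional BOM (EF BB BF) is detected and stripped, and its absence changes nothing about the decoding. The function name `decode_utf8` is hypothetical.

```python
import codecs


def decode_utf8(body: bytes) -> tuple[str, bool]:
    """Decode bytes the author intends as UTF-8 (hypothetical helper).

    Returns the decoded text and whether a UTF-8 BOM was present.
    """
    had_bom = body.startswith(codecs.BOM_UTF8)  # BOM_UTF8 == b"\xef\xbb\xbf"
    # The "utf-8-sig" codec drops a leading BOM if there is one and
    # otherwise decodes exactly like plain "utf-8".
    return body.decode("utf-8-sig"), had_bom
```

For example, `decode_utf8(b"\xef\xbb\xbf<p>hi</p>")` and `decode_utf8(b"<p>hi</p>")` yield the same text; only the second element of the result differs.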

> We're discussing the error recovery behavior when web servers do not
> correctly specify the media type of their content.

Now you must figure out how much to trust that UTF-8 BOM, or its
absence, since sending it is always optional. Plus, if you don't trust
or don't know the served media type, then you are looking for a marker
of some kind (maybe <! for text/html; I have to look again) to figure
out the media type.
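A simplified, hypothetical Python sketch of the marker-based guessing described above. It is not the sniffing algorithm of any browser or of the HTML5 draft: the marker list `HTML_MARKERS` and the flags `tolerate_bom` and `tolerate_whitespace` are illustrative assumptions. The two flags model the tolerances debated in this thread: skipping a leading UTF-8 BOM, and skipping whitespace before the first tag (which Adam reports, further down, makes the HTML signature trigger 9% more often).

```python
import codecs

# Hypothetical marker list; real sniffing tables are longer and differ by browser.
# b"<!" covers both "<!DOCTYPE" and "<!--".
HTML_MARKERS = (b"<!", b"<html", b"<head", b"<body", b"<script")


def looks_like_html(body: bytes, tolerate_bom: bool = True,
                    tolerate_whitespace: bool = True) -> bool:
    """Guess whether an unlabeled response is HTML from its first bytes."""
    data = body
    if tolerate_bom and data.startswith(codecs.BOM_UTF8):
        # Skip the optional UTF-8 BOM (EF BB BF) before looking for a tag.
        data = data[len(codecs.BOM_UTF8):]
    if tolerate_whitespace:
        # Skip leading ASCII whitespace before the first tag.
        data = data.lstrip(b" \t\n\r\x0c")
    # bytes.lower() only folds ASCII letters; startswith accepts a tuple.
    return data.lower().startswith(HTML_MARKERS)
```

With the defaults, `codecs.BOM_UTF8 + b"<!DOCTYPE html>"` is recognized as HTML; with `tolerate_bom=False` the BOM bytes hide the `<!` marker and the check fails, which is the behavior being argued about.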

Thanks and Best,
Joe

> Adam


On Mon, Nov 23, 2009 at 9:04 AM, Joe D Williams 
<joedwil@earthlink.net> wrote:
>> tolerating BOMs ...
>
> I don't get that. Isn't the UTF-8 BOM officially optional if the
> intent is to serve UTF-8, no matter what the MIME type is?
> Frankly, I haven't experimented lately, but I thought tolerating the
> UTF-8 BOM had stopped being an HTML concern for me, because the
> browsers I tested eventually stopped failing when the BOM was sent.
> What am I missing?
> Best Regards,
> Joe
>
>
> ----- Original Message -----
> From: "Adam Barth" <w3c@adambarth.com>
> To: "Julian Reschke" <julian.reschke@gmx.de>
> Cc: "Joe D Williams" <joedwil@earthlink.net>; "Boris Zbarsky"
> <bzbarsky@mit.edu>; "Gavin Carothers" <gavin@carothers.name>;
> "Maciej Stachowiak" <mjs@apple.com>; "HTMLwg" <public-html@w3.org>
> Sent: Monday, November 23, 2009 7:55 AM
> Subject: Re: XMLNS in Inkscape and other editors
>
>
> On Mon, Nov 23, 2009 at 7:41 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
>>
>> Adam Barth wrote:
>>>
>>> On Mon, Nov 23, 2009 at 7:25 AM, Julian Reschke <julian.reschke@gmx.de> wrote:
>>>>
>>>> Adam Barth wrote:
>>>>>
>>>>> Unfortunately, a browser's content sniffing algorithm is a subtle
>>>>> beast. I would not recommend changing the algorithm because of
>>>>> aesthetics. Instead, I recommend changing the algorithm either (1) to
>>>>> improve security, (2) to improve compatibility with web content, or
>>>>> (3) to improve interoperability with other browsers.
>>>>> ...
>>>>
>>>> (2) and (3) seem to be arguments in favor of handling the UTF-8 BOM.
>>>
>>> Maybe, but maybe not. For (2), we should do a careful measurement
>>> instead of relying on this one anecdote. For (3), there's no way to
>>> chase IE's tail here without giving up on (1). Instead, I've
>>
>> But IE is consistent with Safari and Opera here, isn't it?
>
> Yes. IE and Safari both use an insecure HTML signature. I
> haven't studied Opera's sniffing algorithm in detail.
>
>>> recommended in the past (and continue to recommend) that other
>>> browsers use Firefox's HTML signature (with a handful of changes that
>>> measurably improve compatibility).
>>> ...
>>
>> I do agree that minimizing sniffing is a good thing when the server
>> indicates a media type. However, in this case, the server did not do
>> that, and ignoring a UTF-8 BOM appears to be the wrong thing to do.
>
> Whether we should change the sniffing algorithm in this way is purely
> a quantitative question: is the compatibility we gain worth the
> security and stability costs? From this one example, we can't
> estimate what the compatibility gain is. As I've said a number of
> times on this thread, we'd have to measure to find out.
>
> To be more concrete, consider the question of whether the algorithm
> should tolerate leading whitespace before the first HTML tag in the
> HTML signature. It turns out that this causes the HTML signature to
> trigger 9% more often, which is a measurable gain in compatibility
> for, IMHO, a small loss in security. I suspect, although I haven't
> measured, that tolerating BOMs will be at least one or two orders of
> magnitude less important.
>
> At a higher level, my numerical goals for the content sniffing
> algorithm are that it is compatible with between 99.99% and 99.999%
> of web pages. Does this issue occur on more than 1 out of every
> 100,000 web pages? 1 out of every 1,000,000?
>
> Adam
>
>
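The compatibility targets in Adam's last paragraph translate into a budget of failing pages: 99.99% leaves room for 1 failure in every 10,000 pages, and 99.999% for 1 in every 100,000. A small Python sketch of that conversion follows; the function name `pages_per_failure` is hypothetical, and `Fraction` is used only to keep the arithmetic exact.

```python
from fractions import Fraction


def pages_per_failure(compat_percent: str) -> Fraction:
    """For a target such as "99.99" (percent of pages handled correctly),
    return N such that the target allows 1 failing page in every N."""
    failure_rate = 1 - Fraction(compat_percent) / 100
    return 1 / failure_rate


for target in ("99.99", "99.999"):
    print(f"{target}% -> 1 failing page in {pages_per_failure(target)}")
```

On its own, ignoring an issue that affects 1 page in 1,000,000 would fit within even the stricter 99.999% budget, while ignoring one that affects more than 1 page in 100,000 would exceed it.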

Received on Monday, 23 November 2009 17:46:46 UTC