Re: HTML5 discussions regarding charset determination and sniffing from Julian Reschke on 2010-09-30 (www-tag@w3.org from September 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 30 Sep 2010 14:06:30 +0200
To: Noah Mendelsohn <noah@arcanedomain.com>
CC: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <4CA47D46.10906@gmx.de>

On 30.09.2010 09:56, Julian Reschke wrote:
> On 29.09.2010 23:29, Noah Mendelsohn wrote:
>> I notice that there is active discussion in two HTML5-related Bugzilla
>> entries [1,2] of details related to charset detection. I'm not up on the
>> details, but at least the title of [2] suggests that charset sniffing is
>> involved (to my untrained eye, most of the debate seems to be about
>> parsing of charset parameters). Anyway, given the TAG's ongoing interest
>> in adherence to HTTP specifications in general, and sniffing in
>> particular, I thought I'd point these out.
>>
>> Noah
>>
>> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9628
>> [2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=10804
>
> ...and
>
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=10805
>
> The background is that HTML5 specifies an algorithm for extracting the
> charset from content type information, which (1) requires accepting
> invalid forms (single quotes), and (2) requires that escapes in quoted
> strings not be handled properly.
>
> The spec claims it's needed for legacy content, but for both cases there
> are examples of UAs that do not implement this today; so that claim is
> really really weak.

Bugs 10804 and 10805 have been rejected, so I have raised issues 
http://www.w3.org/html/wg/tracker/issues/125 and 
http://www.w3.org/html/wg/tracker/issues/126.
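
To make the two points from the quoted text concrete, here is a minimal
sketch in Python. It is not the HTML5 algorithm text and not any UA's
actual code; the function names charset_strict and charset_lenient are
made up for illustration, and the lenient variant is only a simplified
stand-in for the behaviour described above (either quote character
accepted, backslash escapes left alone).

```python
import re

# RFC 2616 token characters (anything except CTLs and separators).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# RFC 2616 quoted-string: double quotes only; a backslash escapes the next char.
QUOTED = r'"((?:[^"\\]|\\.)*)"'
PARAM = re.compile(r';\s*charset=(?:(' + TOKEN + r')|' + QUOTED + r')',
                   re.IGNORECASE)


def charset_strict(content_type):
    """Charset parameter using RFC 2616 quoting rules, or None."""
    m = PARAM.search(content_type)
    if m is None:
        return None
    if m.group(1) is not None:
        # Plain token. Single quotes are ordinary token characters here,
        # so they stay part of the value.
        return m.group(1)
    # Quoted-string: undo the backslash escapes (quoted-pair).
    return re.sub(r'\\(.)', r'\1', m.group(2))


def charset_lenient(content_type):
    """Hypothetical lenient variant: ' or " delimit the value, no unescaping."""
    m = re.search(r'charset\s*=\s*', content_type, re.IGNORECASE)
    if m is None:
        return None
    rest = content_type[m.end():]
    if rest[:1] in ('"', "'"):
        end = rest.find(rest[0], 1)  # next matching quote, escapes ignored
        return rest[1:end] if end != -1 else None
    return re.split(r'[\s;]', rest, maxsplit=1)[0] or None


for header in ("text/html; charset='utf-8'", 'text/html; charset="utf\\-8"'):
    print(header, charset_strict(header), charset_lenient(header), sep=" | ")
```

For an unquoted value such as charset=utf-8 both functions agree; they
only differ on the single-quoted and the escaped forms, which are
exactly the cases the two issues are about.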

Bug 9628 (which asks the spec to clarify what the incompatibility with
RFC 2616 is) *has* been fixed, which is a good thing. I wish the spec
did the same for all other "willful violations".

Best regards, Julian

Received on Thursday, 30 September 2010 12:13:49 UTC