Re: HTML5 discussions regarding charset determination and sniffing from Julian Reschke on 2010-09-30 (www-tag@w3.org from September 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Thu, 30 Sep 2010 09:56:07 +0200
To: Noah Mendelsohn <noah@arcanedomain.com>
CC: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <4CA44297.5080004@gmx.de>

On 29.09.2010 23:29, Noah Mendelsohn wrote:
> I notice that there is active discussion in two HTML5-related Bugzilla
> entries [1,2] of details related to charset detection. I'm not up on the
> details, but at least the title of [2] suggests that charset sniffing is
> involved (to my untrained eye, most of the debate seems to be about
> parsing of charset parameters). Anyway, given the TAG's ongoing interest
> in adherence to HTTP specifications in general, and sniffing in
> particular, I thought I'd point these out.
>
> Noah
>
> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9628
> [2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=10804

...and

   http://www.w3.org/Bugs/Public/show_bug.cgi?id=10805

The background is that HTML5 specifies an algorithm for extracting the
charset from content type information, which (1) requires accepting
invalid forms (single quotes), and (2) requires that escapes in quoted
strings not be handled properly.
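
To make the difference concrete, here is a rough Python sketch. It is
not the text of either spec: charset_strict is my reading of the HTTP
grammar (token or double-quoted string with backslash escapes), and
charset_lenient is a simplified stand-in for the behaviour described
above (either quote character accepted, no escape processing). Both
function names are made up for this illustration.

```python
import re

# Characters allowed in an HTTP token; note the single quote is one of them.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

def charset_strict(content_type):
    """Sketch of HTTP-style parsing: token, or double-quoted string
    in which a backslash escapes the following character."""
    m = re.search(r';\s*charset=', content_type, re.IGNORECASE)
    if not m:
        return None
    rest = content_type[m.end():]
    if rest.startswith('"'):
        out = []
        i = 1
        while i < len(rest):
            c = rest[i]
            if c == '\\' and i + 1 < len(rest):
                out.append(rest[i + 1])   # unescape: keep the next char
                i += 2
            elif c == '"':
                return ''.join(out)       # closing quote ends the value
            else:
                out.append(c)
                i += 1
        return None                       # unterminated quoted string
    tok = TOKEN.match(rest)
    return tok.group(0) if tok else None

def charset_lenient(content_type):
    """Simplified stand-in for the lenient behaviour: accepts ' or "
    as delimiters and copies the quoted text without unescaping."""
    m = re.search(r'charset\s*=\s*', content_type, re.IGNORECASE)
    if not m:
        return None
    rest = content_type[m.end():]
    if rest[:1] in ('"', "'"):
        end = rest.find(rest[0], 1)       # next matching quote, escapes ignored
        return rest[1:end] if end != -1 else None
    tok = re.match(r'[^\s;]+', rest)
    return tok.group(0) if tok else None

for header in ('text/html; charset="utf-8"',
               "text/html; charset='utf-8'",
               'text/html; charset="utf\\-8"'):
    print(header, '|', charset_strict(header), '|', charset_lenient(header))
```

In this sketch the two functions agree on the plain double-quoted form.
For the single-quoted form the strict reading keeps the quotes as part
of the token, so the value would not match a known charset name; for the
escaped form the lenient reading keeps the backslash in the value.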

The spec claims this is needed for legacy content, but for both cases
there are examples of UAs that do not implement it today, so that claim
is really weak.

Best regards, Julian

Received on Thursday, 30 September 2010 07:56:47 UTC