Re: HTML5 discussions regarding charset determination and sniffing

On 30.09.2010 09:56, Julian Reschke wrote:
> On 29.09.2010 23:29, Noah Mendelsohn wrote:
>> I notice that there is active discussion in two HTML5-related Bugzilla
>> entries [1,2] of details related to charset detection. I'm not up on the
>> details, but at least the title of [2] suggests that charset sniffing is
>> involved (to my untrained eye, most of the debate seems to be about
>> parsing of charset parameters). Anyway, given the TAG's ongoing interest
>> in adherence to HTTP specifications in general, and sniffing in
>> particular, I thought I'd point these out.
>> Noah
>> [1]
>> [2]
> ...and
> The background is that HTML5 specifies an algorithm for extracting the
> charset from content type information, which (1) requires accepting
> invalid forms (single quotes), and (2) requires that escapes in quoted
> strings not be handled properly.
> The spec claims it's needed for legacy content, but for both cases there
> are examples of UAs that do not implement this today; so that claim is
> really really weak.
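
To make the two points above concrete, here is a minimal Python sketch. It is not the spec text and not any UA's code. The function names strict_charset and lenient_charset are hypothetical, and the lenient variant is only a simplified illustration of the behaviour described above (quotes of either kind accepted, backslashes left alone). The strict variant assumes RFC 2616 rules: a parameter value is either a token or a double-quoted string with backslash escapes.

```python
import re

# RFC 2616 token characters: any CHAR except CTLs and separators.
# Note that the single quote is an ordinary token character.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def strict_charset(content_type):
    """RFC 2616-style sketch: token, or quoted-string with backslash escapes."""
    m = re.search(r";\s*charset=", content_type, re.IGNORECASE)
    if not m:
        return None
    rest = content_type[m.end():]
    if rest.startswith('"'):
        out = []
        i = 1
        while i < len(rest):
            ch = rest[i]
            if ch == "\\" and i + 1 < len(rest):
                out.append(rest[i + 1])  # quoted-pair: keep the escaped char
                i += 2
            elif ch == '"':
                return "".join(out)      # closing quote ends the value
            else:
                out.append(ch)
                i += 1
        return None                      # unterminated quoted-string
    m2 = TOKEN.match(rest)
    return m2.group(0) if m2 else None


def lenient_charset(content_type):
    """Simplified sketch of the lenient behaviour: ' or " both delimit,
    and backslashes are not treated as escapes."""
    m = re.search(r"charset\s*=\s*", content_type, re.IGNORECASE)
    if not m:
        return None
    rest = content_type[m.end():]
    if rest[:1] in ('"', "'"):
        end = rest.find(rest[0], 1)      # next matching quote, escapes ignored
        return rest[1:end] if end != -1 else None
    m2 = re.match(r"[^\s;]+", rest)
    return m2.group(0) if m2 else None
```

With these two functions, the header text/html; charset='utf-8' yields the token 'utf-8' (quotes included) from the strict sketch but utf-8 from the lenient one, and a backslash inside a double-quoted value is unescaped only by the strict sketch.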

Bugs 10804 and 10805 have been rejected, so I have raised issues and

Bug 9628 (which asks for clarification of what the incompatibility with
RFC 2616 is) *has* been fixed, which is a good thing. I wish the spec did
the same for all other "willful violations".

Best regards, Julian

Received on Thursday, 30 September 2010 12:13:49 UTC