- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Thu, 30 Sep 2010 09:56:07 +0200
- To: Noah Mendelsohn <noah@arcanedomain.com>
- CC: "www-tag@w3.org" <www-tag@w3.org>
On 29.09.2010 23:29, Noah Mendelsohn wrote:
> I notice that there is active discussion in two HTML5-related Bugzilla
> entries [1,2] of details related to charset detection. I'm not up on the
> details, but at least the title of [2] suggests that charset sniffing is
> involved (to my untrained eye, most of the debate seems to be about
> parsing of charset parameters). Anyway, given the TAG's ongoing interest
> in adherence to HTTP specifications in general, and sniffing in
> particular, I thought I'd point these out.
>
> Noah
>
> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9628
> [2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=10804

...and http://www.w3.org/Bugs/Public/show_bug.cgi?id=10805

The background is that HTML5 specifies an algorithm for extracting the charset from content type information, which (1) requires accepting invalid forms (single quotes), and (2) requires that escapes in quoted strings not be handled properly.

The spec claims this is needed for legacy content, but for both cases there are examples of UAs that do not implement it today, so that claim is really weak.

Best regards, Julian
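To make the two points concrete, here is a minimal Python sketch contrasting the two ways of reading a `charset` parameter value. It is an illustration only: `strict_value` and `lenient_value` are hypothetical names, and `lenient_value` merely mimics the two behaviours described above (accepting single quotes, ignoring backslash escapes). It is not a transcription of the HTML5 algorithm. `strict_value` follows the HTTP token / quoted-string grammar, in which a single quote is an ordinary token character.

```python
from typing import Optional


def strict_value(raw: str) -> Optional[str]:
    """Read a parameter value as an HTTP token or quoted-string."""
    raw = raw.strip()
    if not raw.startswith('"'):
        # Token form: a single quote is an ordinary token character here,
        # so it stays part of the value.
        return raw.split(";", 1)[0].strip()
    out = []
    i = 1
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            out.append(raw[i + 1])  # quoted-pair: keep the escaped character
            i += 2
        elif ch == '"':
            return "".join(out)  # closing quote ends the value
        else:
            out.append(ch)
            i += 1
    return None  # unterminated quoted-string


def lenient_value(raw: str) -> Optional[str]:
    """Hypothetical lenient reader: either quote style, no escape handling."""
    raw = raw.strip()
    if raw[:1] in ('"', "'"):
        # Take everything up to the next matching quote, backslashes included.
        end = raw.find(raw[0], 1)
        return raw[1:end] if end != -1 else None
    return raw.split(";", 1)[0].strip()


if __name__ == "__main__":
    for sample in ['"utf-8"', "'utf-8'", r'"UTF\-8"']:
        print(sample, strict_value(sample), lenient_value(sample))
```

The two readers agree on `"utf-8"` but diverge on the other inputs: for `'utf-8'` the strict reader keeps the quotes as part of the token while the lenient one strips them, and for `"UTF\-8"` only the strict reader resolves the escape.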
Received on Thursday, 30 September 2010 07:56:47 UTC