Re: HTML5 discussions regarding charset determination and sniffing

Julian Reschke writes:

 > The background is that HTML5 specifies an algorithm for extracting the
 > charset from content type information, which (1) requires accepting invalid
 > forms (single quotes), and (2) requires not to properly handle escapes in
 > quoted strings.

Thank you for the very helpful clarification.  I agree that these "willful 
violations" are significant, and should be minimized to the extent 
practical.  There is a big grey area between "sniffing" and silently 
recovering from syntactic or other errors in headers.  This seems more 
toward the latter:  allowing single quotes where double quotes are required is a 
different sort of "being liberal" than looking at something labeled 
text/plain and determining "aha, you meant image/jpeg".  Thanks!
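
To make the distinction concrete, here is a rough sketch of the two behaviours
Julian describes. It is an illustration only, not the algorithm from the HTML5
draft: the function names strict_charset and lenient_charset are made up for
this message, and the grammar is my reading of the HTTP token and quoted-string
rules. The strict version accepts a token or a double-quoted string and undoes
backslash escapes; the lenient version also strips single quotes and leaves
backslashes alone.

```python
import re

# HTTP "token" characters (assumed from the RFC 2616 grammar). Note that the
# single quote is a token character, so 'utf-8' is a token that *includes*
# the quote marks.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# HTTP "quoted-string": double quotes, with backslash escapes (quoted-pair).
QUOTED = r'"((?:[^"\\]|\\.)*)"'

STRICT_RE = re.compile(
    r";\s*charset=(?:(" + TOKEN + r")|" + QUOTED + r")\s*(?:;|$)",
    re.IGNORECASE,
)

# Hypothetical lenient pattern: double quotes, single quotes, or a bare value.
LENIENT_RE = re.compile(
    r"""charset\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s;"']+))""",
    re.IGNORECASE,
)


def strict_charset(content_type):
    """Return the charset parameter under HTTP-style rules, or None."""
    m = STRICT_RE.search(content_type)
    if m is None:
        return None
    if m.group(1) is not None:
        return m.group(1)  # token form, returned as-is
    # quoted-string form: undo backslash escapes (\x becomes x)
    return re.sub(r"\\(.)", r"\1", m.group(2))


def lenient_charset(content_type):
    """Return the charset, stripping either quote style, ignoring escapes."""
    m = LENIENT_RE.search(content_type)
    if m is None:
        return None
    # Exactly one of the three groups matched; return it unchanged.
    return next(g for g in m.groups() if g is not None)
```

Under these assumptions, strict_charset("text/html; charset='utf-8'") keeps the
quote marks as part of the value, which names no known charset, while
lenient_charset returns utf-8. In the other direction, for a value written as
"utf\-8" the strict version yields utf-8 and the lenient one keeps the
backslash.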

Noah

On 9/30/2010 3:56 AM, Julian Reschke wrote:
> On 29.09.2010 23:29, Noah Mendelsohn wrote:
>> I notice that there is active discussion in two HTML5-related Bugzilla
>> entries [1,2] of details related to charset detection. I'm not up on the
>> details, but at least the title of [2] suggests that charset sniffing is
>> involved (to my untrained eye, most of the debate seems to be about
>> parsing of charset parameters). Anyway, given the TAG's ongoing interest
>> in adherence to HTTP specifications in general, and sniffing in
>> particular, I thought I'd point these out.
>>
>> Noah
>>
>> [1] http://www.w3.org/Bugs/Public/show_bug.cgi?id=9628
>> [2] http://www.w3.org/Bugs/Public/show_bug.cgi?id=10804
>
> ...and
>
> http://www.w3.org/Bugs/Public/show_bug.cgi?id=10805
>
> The background is that HTML5 specifies an algorithm for extracting the
> charset from content type information, which (1) requires accepting invalid
> forms (single quotes), and (2) requires not to properly handle escapes in
> quoted strings.
>
> The spec claims it's needed for legacy content, but for both cases there
> are examples of UAs that do not implement this today; so that claim is
> really really weak.
>
> Best regards, Julian
>
>

Received on Thursday, 30 September 2010 14:01:39 UTC