Re: Change Proposal for ISSUE-125

On Nov 14, 2010, at 3:43 AM, Julian Reschke wrote:

> On 14.11.2010 05:15, Jonas Sicking wrote:
>> ...
>> The best solution to a whole group of problems here is IMHO to define
>> that <meta http-equiv> has no relation to HTTP headers at all. Any and
>> all similarities with http and http headers is a historical artifact.
>> ...
> 
> If that's what we think, we should clearly say that.

This particular algorithm is even more limited in scope than most http-equiv values, since (according to the backwards cross-reference tool) it is only used by the parser to determine the document's character set encoding. That code is historically quite different from actual Content-Type parsing code.

> 
> That would mean clarifying that the section is *only* about meta/@http-equiv, and clearly state that *because* it's not about the HTTP header field the parsing rules can vary.

If such a statement were added, would you consider that sufficient to resolve this issue and ISSUE-126 by amicable resolution?

> 
> That being said: even if we do that it would be good to reduce *unnecessary* deviations. For instance, it's totally not clear why "foocharset" is parsed as "charset", while "charsetfoo" is not (<http://www.w3.org/Bugs/Public/show_bug.cgi?id=9628#c3>).

It seems like that is a separate concern from the two issues currently under discussion. That being said, I believe older (pre-HTML5 parser) browsers generally work that way. When detecting the encoding, once they see "<meta", pre-HTML5 browsers just scan forward to find "charset=" before hitting ">". That's somewhat oversimplified, but a decent first-order approximation. From that model, you can see why foocharset would be detected and charsetfoo would not. This same looseness is what makes HTML5's simplified charset syntax (<meta charset=utf8>) work in current browsers.
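
To make that model concrete, here is a rough sketch in Python. It is not taken from any browser's source; the function name sniff_charset and the rule for where the value ends are my own assumptions, chosen only to illustrate the "scan for charset= before >" behavior described above.

```python
from typing import Optional


def sniff_charset(html: str) -> Optional[str]:
    """Hypothetical model of legacy sniffing: after "<meta", scan forward
    for "charset=" before the next ">". Not any browser's real code."""
    lower = html.lower()
    start = lower.find("<meta")
    while start != -1:
        end = lower.find(">", start)
        if end == -1:
            end = len(lower)
        tag = lower[start:end]
        pos = tag.find("charset=")
        if pos != -1:
            # Take whatever follows "charset=", dropping a leading quote.
            value = tag[pos + len("charset="):].lstrip("\"' ")
            # Assumed terminators: quote, whitespace, ";" or "/".
            for i, ch in enumerate(value):
                if ch in "\"' ;\t\n/":
                    value = value[:i]
                    break
            return value or None
        # No "charset=" in this tag; try the next "<meta".
        start = lower.find("<meta", end)
    return None
```

Under this model, "foocharset=utf-8" still contains the substring "charset=" and so is detected, while "charsetfoo=utf-8" does not and is skipped. The same scan also picks up <meta charset=utf8> without any special handling.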

In any case, if we want to take up this detail further, it should be via a separate bug/issue.

Regards,
Maciej

Received on Sunday, 14 November 2010 16:21:07 UTC