Re: ISSUE-125 CCP -- change the "willful violation" note -- rev 1 from Leif Halvard Silli on 2011-01-27 (public-html@w3.org from January 2011)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Thu, 27 Jan 2011 19:27:46 +0100
To: Anne van Kesteren <annevk@opera.com>
Cc: Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org" <public-html@w3.org>
Message-ID: <20110127192746765166.08a02d80@xn--mlform-iua.no>

Anne van Kesteren, Thu, 27 Jan 2011 17:29:14 +0100:
> On Thu, 27 Jan 2011 15:04:36 +0100, Leif Halvard Silli wrote:
>> Anne van Kesteren, Thu, 27 Jan 2011 13:30:08 +0100:
>>> HTTP and the Media Type Sniffing specification define that.
>> 
>> But where does HTML5 point to those?
>> 
>> W.r.t. MIMESNIFF, then the section that we discuss, section '2.7.3
>> Determining the type of a resource', is the one which points to it.
>> This section is also not only about 'text/html' but about any
>> 'resource'. Which, again, means that the Content-Type can only come
>> from HTTP.
> 
> Yes, if it comes from HTTP and has an encoding declared there the 
> algorithm under discussion will not be used.

I see that you are right: the encoding sniffing algorithm happens 
during the pre-parsing ... I misunderstood when it happens ...
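
The precedence described above can be sketched roughly as follows. This 
is a simplified illustration, not the spec's algorithm: the function 
name pick_encoding and both regular expressions are assumptions made 
for the example, and the real pre-scan is a byte-level tokenizer rather 
than a regex. The only points taken from the thread are that an 
encoding declared in HTTP wins, and that the pre-scan otherwise looks 
at the first 512 bytes.

```python
import re
from typing import Optional

PRESCAN_LIMIT = 512  # byte count mentioned in the thread

# Hypothetical, simplified patterns; the real pre-scan is not regex-based.
HTTP_CHARSET = re.compile(r"charset\s*=\s*\"?([A-Za-z0-9_.:-]+)", re.IGNORECASE)
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def pick_encoding(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Return a lower-cased encoding label, or None if nothing was declared."""
    # 1. An encoding declared in the HTTP Content-Type header takes precedence.
    if content_type:
        match = HTTP_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # 2. Otherwise look for a meta declaration in the first 512 bytes only.
    match = META_CHARSET.search(body[:PRESCAN_LIMIT])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Nothing found: a real user agent would fall back to other heuristics.
    return None
```

With a header of "text/html; charset=ISO-8859-1" the meta element is 
never consulted in this sketch; with a bare "text/html" header the 
pre-scan result is used instead.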

The encoding sniffing algorithm is, as I mentioned, difficult to read - 
e.g. it makes it seem as if there is a distinct step 3 - but it also 
says that the wait-for-512-bytes in step 3 can happen "in this step or 
at any later step in this algorithm". Now I realize that the rest of 
that algorithm - namely step 4 and onwards - describes how to parse the 
result of that wait. (These are, of course, only excuses for my failure 
to understand ...)

>> We agree that the algorithm is used twice in the encoding sniffing
>> algorithm. Then can you tell me when, according to you, the first of
>> those times are?
> 
> The first time is during the pre-parser-scan of the resource and the 
> second time is while parsing in case the encoding is still not 
> definitive.

OK, I see. Perhaps there could have been a link from the words ]] Then, 
the real parser is started,[[ to section '8.2.3 Parse state' ...

>> ]] [...] meta element with a charset attribute or a meta element 
>> with an http-equiv attribute in the Encoding declaration state. [[

> You are drawing the wrong conclusion. It is perfectly fine to have 
> both HTTP Content-Type and a <meta charset>. What you quoted makes 
> limitations on the encoding if there is no Content-Type metadata, it 
> does not say anything else.

Here you misread my intention - I only meant that the author must make 
a choice between using meta@charset and using 
meta@http-equiv=content-type - quoting 4.2.5.5: ]]There can only be one 
character encoding declaration in the document.[[ Validator.nu doesn't 
give a damn whether you have multiple meta@charset or whatever - so one 
can't trust Validator.nu in this case.
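
A check for the "only one character encoding declaration" rule could 
look roughly like the sketch below. It uses Python's standard 
html.parser module; the names DeclarationCounter and count_declarations 
are made up for this example, and it only counts meta@charset and 
meta@http-equiv=content-type elements. It says nothing about what 
Validator.nu does internally.

```python
from html.parser import HTMLParser


class DeclarationCounter(HTMLParser):
    """Collects meta elements that act as character encoding declarations."""

    def __init__(self):
        super().__init__()
        self.declarations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "meta":
            return
        attr = dict(attrs)
        if "charset" in attr:
            self.declarations.append("meta@charset")
        elif (attr.get("http-equiv") or "").lower() == "content-type":
            self.declarations.append("meta@http-equiv")


def count_declarations(markup: str) -> int:
    """Return how many encoding declarations the markup contains."""
    counter = DeclarationCounter()
    counter.feed(markup)
    counter.close()
    return len(counter.declarations)
```

A document that carries both forms would yield a count of 2 here, which 
is the case the quoted sentence from 4.2.5.5 rules out.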

The point I wanted to make, in that regard, was: why does HTML5 stress 
that meta@http-equiv=content-type is an encoding declaration, while 
section 2.7.3 - and several other places that I quoted - speaks about 
its content (if - again - you are right) as 'Content-Type metadata'? 
Either it is Content-Type metadata or it is an encoding declaration.

>> Note that out-of-band can also be info from the file system - says
>> MIMESNIFF.
>> 
>> It is clear, to me, that HTML5's encoding sniffing algorithm overlaps
>> with things said in MIMESNIFF. Or would you say that those 512 bytes in
>> step 3 of HTML5's encoding sniffing algorithm refers to another stream
>> than the 512 bytes in MIMESNIFF? In that regard, MIMESNIFF states that
>> 
>> ]] For efficiency reasons, implementations might wish to implement this
>>    algorithm and the algorithm for detecting the character encoding of
>>    HTML documents in parallel. [[
> 
> What Media Type Sniffing does with those first 512 bytes is not 
> extracting the encoding [ snip ]

You are misunderstanding me if you think I thought otherwise.

>> In a summary: Can't see that you have proven that I have read the
>> spec(s) wrong.
> 
> I give up.

Well, thanks for trying. It had a positive effect. I can now see that 
you seem to be right in claiming that the algorithm in section 2.7.3 
talks about how to detect the pragma. Thus, your change proposal makes 
sense from that angle.

I believe it was HTML 2.0 which declared http-equiv to be an HTTP 
equivalent. [1] Thus it is in the hands of the HTML WG to change that. 
But I must ask: why have meta@charset if you also want to make 
meta@http-equiv=Content-Type as similar to meta@charset as possible?

[1] http://tools.ietf.org/html/rfc1866#section-5.2.5

-- 
leif halvard silli

Received on Thursday, 27 January 2011 18:28:24 UTC