- From: Anne van Kesteren <annevk@opera.com>
- Date: Thu, 27 Jan 2011 17:29:14 +0100
- To: "Leif Halvard Silli" <xn--mlform-iua@xn--mlform-iua.no>
- Cc: "Julian Reschke" <julian.reschke@gmx.de>, "public-html@w3.org" <public-html@w3.org>
On Thu, 27 Jan 2011 15:04:36 +0100, Leif Halvard Silli
<xn--mlform-iua@målform.no> wrote:
> Anne van Kesteren, Thu, 27 Jan 2011 13:30:08 +0100:
>> HTTP and the Media Type Sniffing specification define that.
>
> But where does HTML5 point to those?
>
> W.r.t. MIMESNIFF, then the section that we discuss, section '2.7.3
> Determining the type of a resource', is the one which points to it.
> This section is also not only about 'text/html' but about any
> 'resource'. Which, again, means that the Content-Type can only come
> from HTTP.

Yes, if it comes from HTTP and has an encoding declared there, the
algorithm under discussion will not be used.

> And, I repeat, that if the UA is configured to 'strictly obey' - as
> MIMESNIFF calls it - the HTTP headers, then there will be no
> sniffing.
>
> We agree that the algorithm is used twice in the encoding sniffing
> algorithm. Then can you tell me when, according to you, the first of
> those times is?

The first time is during the pre-parser scan of the resource, and the
second time is while parsing, in case the encoding is still not
definitive.

> And why would it read the http-equiv twice? According
> to myself, the first time happens *before* the encoding sniffing
> algorithm starts running - the algorithm merely "listens" to what the
> result from Content-Type was: [2]
>
> ]] 2. If the transport layer specifies an encoding, and it is
> supported, return that encoding with the confidence certain, and abort
> these steps. [[
>
> Or as that section also states: ]] This algorithm takes as input any
> out-of-band metadata available to the user agent (e.g. the Content-Type
> metadata of the document) and all the bytes available so far, and
> returns an encoding and a confidence. [[

That is a different algorithm from the one under discussion.

> Note also that the http-equiv pragma, per HTML5, is not 'content-type
> metadata' but an encoding declaration. [3] The encoding declaration
> section states that: [4]
>
> ]] If an HTML document does not start with a BOM, and if its encoding
> is not explicitly given by Content-Type metadata, and the document is
> not an iframe srcdoc document, then the character encoding used must be
> an ASCII-compatible character encoding, and, in addition, if that
> encoding isn't US-ASCII itself, then the encoding must be specified
> using a meta element with a charset attribute or a meta element with an
> http-equiv attribute in the Encoding declaration state. [[
>
> Thus, the encoding declaration - in form of meta@charset or
> meta@http-equiv=content-type - is only used when there isn't a BOM or
> when the Content-Type meta data (which, again, is described in [1]),
> does not provide confident encoding information.

You are drawing the wrong conclusion. It is perfectly fine to have both
HTTP Content-Type and a <meta charset>. What you quoted places
limitations on the encoding if there is no Content-Type metadata; it
does not say anything else.

> Note that out-of-band can also be info from the file system - says
> MIMESNIFF.
>
> It is clear, to me, that HTML5's encoding sniffing algorithm overlaps
> with things said in MIMESNIFF. Or would you say that those 512 bytes in
> step 3 of HTML5's encoding sniffing algorithm refers to another stream
> than the 512 bytes in MIMESNIFF? In that regard, MIMESNIFF states that
>
> ]] For efficiency reasons, implementations might wish to implement this
> algorithm and the algorithm for detecting the character encoding of
> HTML documents in parallel. [[

What Media Type Sniffing does with those first 512 bytes is not
extracting the encoding but determining the type of the resource.
Determining the correct Content-Type header for the resource happens
elsewhere. You are confusing algorithms.

> In a summary: Can't see that you have proven that I have read the
> spec(s) wrong.

I give up.

> [1] http://www.w3.org/TR/html5/fetching-resources.html#content-type
> [2] http://www.w3.org/TR/html5/parsing#encoding-sniffing-algorithm
> [3]
> http://www.w3.org/TR/html5/semantics#attr-meta-http-equiv-content-type
> [4] http://www.w3.org/TR/html5/semantics#character-encoding-declaration

--
Anne van Kesteren
http://annevankesteren.nl/
Received on Thursday, 27 January 2011 16:29:49 UTC
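
The precedence argued over in this message (an encoding declared at the transport layer wins with confidence "certain"; only otherwise are the first bytes of the document scanned for a meta declaration) can be illustrated with a rough Python sketch. It is not the HTML5 algorithm itself: the function name `sniff_encoding`, the regular-expression prescan, the placement of the BOM check, and the `windows-1252` fallback are all assumptions made for illustration. The real prescan is a byte-level state machine and the real default depends on the user agent.

```python
import codecs
import re

# Hypothetical simplification: the real prescan is a byte-level state
# machine, not a regular expression.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE
)

_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def _supported(label):
    """True if this Python build knows the encoding label."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


def sniff_encoding(head, transport_charset=None, default="windows-1252"):
    """Return (encoding, confidence) for the first bytes of a document.

    head              -- bytes received so far
    transport_charset -- charset from e.g. the HTTP Content-Type header
    default           -- assumed fallback; real user agents vary
    """
    # Transport-layer metadata wins outright; no scanning happens.
    if transport_charset and _supported(transport_charset):
        return transport_charset.lower(), "certain"

    # A byte order mark is also treated as definitive in this sketch.
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name, "certain"

    # Otherwise scan (at most) the first 512 bytes for a meta declaration.
    match = _META_CHARSET.search(head[:512])
    if match:
        label = match.group(1).decode("ascii").lower()
        if _supported(label):
            return label, "tentative"

    return default, "tentative"
```

With a transport-layer charset present, a conflicting `<meta charset>` in the document is simply never consulted, which is why having both is harmless; without one, the meta declaration yields only a tentative result that the parser may later revise.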