- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 27 Jan 2011 15:04:36 +0100
- To: Anne van Kesteren <annevk@opera.com>
- Cc: Julian Reschke <julian.reschke@gmx.de>, "public-html@w3.org" <public-html@w3.org>
Anne van Kesteren, Thu, 27 Jan 2011 13:30:08 +0100:
> On Thu, 27 Jan 2011 13:25:03 +0100, Leif Halvard Silli wrote:
>> If you are correct, then where does HTML5 specify how to handle the
>> HTTP Content-Type header?
>
> HTTP and the Media Type Sniffing specification define that.

But where does HTML5 point to those? W.r.t. MIMESNIFF, the section that
we are discussing, section '2.7.3 Determining the type of a resource',
is the one which points to it. That section is also not only about
'text/html' but about any 'resource'. Which, again, means that the
Content-Type can only come from HTTP. And, I repeat, if the UA is
configured to 'strictly obey' - as MIMESNIFF calls it - the HTTP
headers, then there will be no sniffing.

We agree that the algorithm is used twice in the encoding sniffing
algorithm. Can you then tell me when, according to you, the first of
those times is? And why would it read the http-equiv twice? According
to me, the first time happens *before* the encoding sniffing algorithm
starts running - the algorithm merely "listens" to what the result from
Content-Type was: [2]

]]
2. If the transport layer specifies an encoding, and it is supported,
return that encoding with the confidence certain, and abort these
steps.
[[

Or, as that section also states:

]]
This algorithm takes as input any out-of-band metadata available to the
user agent (e.g. the Content-Type metadata of the document) and all the
bytes available so far, and returns an encoding and a confidence.
[[

Note also that the http-equiv pragma, per HTML5, is not 'Content-Type
metadata' but an encoding declaration. [3] The encoding declaration
section states that: [4]

]]
If an HTML document does not start with a BOM, and if its encoding is
not explicitly given by Content-Type metadata, and the document is not
an iframe srcdoc document, then the character encoding used must be an
ASCII-compatible character encoding, and, in addition, if that encoding
isn't US-ASCII itself, then the encoding must be specified using a meta
element with a charset attribute or a meta element with an http-equiv
attribute in the Encoding declaration state.
[[

Thus, the encoding declaration - in the form of meta@charset or
meta@http-equiv=content-type - is only used when there isn't a BOM and
the Content-Type metadata (which, again, is described in [1]) does not
provide confident encoding information. Note that out-of-band info can
also come from the file system, says MIMESNIFF.

It is clear, to me, that HTML5's encoding sniffing algorithm overlaps
with things said in MIMESNIFF. Or would you say that those 512 bytes in
step 3 of HTML5's encoding sniffing algorithm refer to another stream
than the 512 bytes in MIMESNIFF? In that regard, MIMESNIFF states that

]]
For efficiency reasons, implementations might wish to implement this
algorithm and the algorithm for detecting the character encoding of
HTML documents in parallel.
[[

In summary: I can't see that you have proven that I have read the
spec(s) wrong.

[1] http://www.w3.org/TR/html5/fetching-resources.html#content-type
[2] http://www.w3.org/TR/html5/parsing#encoding-sniffing-algorithm
[3] http://www.w3.org/TR/html5/semantics#attr-meta-http-equiv-content-type
[4] http://www.w3.org/TR/html5/semantics#character-encoding-declaration
--
leif halvard silli
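
PS: To make the point about step 2 of [2] concrete, here is a rough,
non-normative sketch of how I read those first steps. It is my own
simplification, not the spec's algorithm: the names sniff_encoding and
crude_prescan are mine, and crude_prescan merely stands in for the
spec's real prescan of the first bytes.

import re

CERTAIN = "certain"
TENTATIVE = "tentative"

def crude_prescan(first_bytes):
    """Grossly simplified stand-in for the spec's prescan: look for a
    charset declaration within the first 512 bytes and return it."""
    m = re.search(rb'charset\s*=\s*["\']?([-\w]+)', first_bytes[:512])
    return m.group(1).decode("ascii") if m else None

def sniff_encoding(transport_encoding, first_bytes):
    """Return (encoding, confidence) for an HTML byte stream.

    transport_encoding: encoding taken from Content-Type metadata
                        (HTTP header or other out-of-band info), or
                        None if there is none.
    first_bytes:        the bytes of the resource available so far.
    """
    # Step 2 of [2]: a supported encoding from the transport layer
    # wins, with confidence 'certain', and no sniffing takes place.
    if transport_encoding is not None:
        return transport_encoding, CERTAIN
    # Only when Content-Type metadata gives nothing does the parser
    # look inside the stream (BOM, meta charset, meta http-equiv).
    declared = crude_prescan(first_bytes)
    if declared is not None:
        return declared, TENTATIVE
    # Otherwise: frame inheritance, autodetection or a locale default,
    # all merely 'tentative'. A fixed fallback stands in for those.
    return "windows-1252", TENTATIVE

# Content-Type wins even though the meta element disagrees:
print(sniff_encoding("utf-8", b'<meta charset="iso-8859-1">'))
# ('utf-8', 'certain')
print(sniff_encoding(None, b'<meta charset="iso-8859-1">'))
# ('iso-8859-1', 'tentative')

That is, when the Content-Type metadata supplies a supported encoding,
the algorithm returns it with confidence 'certain' before any
in-document declaration is even looked at; only without such metadata
does it fall through to the tentative, in-content steps.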
Received on Thursday, 27 January 2011 14:05:11 UTC