- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 27 Jan 2011 08:28:35 +0100
- To: Anne van Kesteren <annevk@opera.com>, Julian Reschke <julian.reschke@gmx.de>
- Cc: "public-html@w3.org" <public-html@w3.org>
Anne,

HTML5's 'encoding sniffing algorithm' [1] uses the 'algorithm for extracting an encoding from a Content-Type' [2] twice:

1) before parsing: on Content-Type metadata (HTTP). [1]
2) during parsing: on the meta element pragma in the encoding declaration state (http-equiv=content-type). [3]

Thus ISSUE-125 can't be isolated to HTTP-EQUIV unless the encoding sniffing algorithm [1] is changed. One could *make* what your CP builds on a reality by specifying 2 algorithms - one for HTTP-EQUIV and another one for HTTP. Until then, keeping a single algorithm without violating HTTP requires

EITHER a rewrite of the algorithm, as Julian suggested;
OR solving the issue on the authoring requirements level.

It seems to me that the OR option is what we currently have: Validator.nu screams if authors use HTTP-invalid syntax, despite what the algorithm accepts. Of course, the OR option is still a violation of HTTP ...

Who does it help to interpret invalid charset names as if they were valid? I fail to see how anyone who was aware of his/her own deeds would "willfully" put quotes around both sides of the charset name inside a HTTP-EQUIV="Content-TYPE" element.

In that regard, on Sun, 23 Jan 2011 15:36:35 +0100 you said:

> I did not say that. What I said is that it makes sense to change HTTP
> because double and single quotes can be used all over the Web
> Platform interchangeably. Often though more lenient syntax is more
> compatible and authors do not always test in IE. There are places

'Interchangeably' sounds nice. But is there any logic here? Where? With my limited knowledge of the HTTP spec and of the rules for which characters a charset encoding name may contain, I do of course agree that it seems strange that encoding names can contain the single quote character. But then, we need to fix _that_ problem. I don't see how we fix that problem by keeping this algorithm: even if we keep the algorithm you are fighting for, authors are still prohibited from using that syntax. So where is the interchangeability?

Btw, the 'encoding sniffing algorithm' [1] permits UAs to use 'information on the likely encoding for this page' etc., so such invalid encoding names could still be used at a later step in the encoding sniffing algorithm.

[1] http://www.w3.org/TR/html5/parsing#determining-the-character-encoding
[2] http://www.w3.org/TR/html5/fetching-resources#algorithm-for-extracting-an-encoding-from-a-content-type
[3] http://www.w3.org/TR/html5/tokenization#meta-charset-during-parse

Leif Halvard Silli

Anne van Kesteren, Mon, 24 Jan 2011 16:50:13 +0100:
> Summary: Change the note after "algorithm for extracting an encoding
> from a Content-Type" to not mention HTTP as HTTP is not affected by
> this algorithm.
>
> Rationale: "algorithm for extracting an encoding from a Content-Type"
> is only used to examine the contents of a document and therefore does
> not affect HTTP. Claiming it a willful violation of HTTP is
> misleading.
>
> Details: Instead of saying this is a willful violation of HTTP say
> this is a distinct algorithm from HTTP Content-Type processing for
> usage outside of HTTP.
>
> Impact: Hardly.
>
> Anne van Kesteren
> http://annevankesteren.nl/

--
leif halvard silli
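
For illustration only, here is a rough Python sketch of the kind of lenient extraction that the 'algorithm for extracting an encoding from a Content-Type' [2] performs, as I read it. The function name extract_charset is hypothetical, the step order is my paraphrase rather than the spec text, and the sketch returns the raw label instead of mapping it to a known encoding.

```python
import re
from typing import Optional

# Whitespace characters skipped around "=" (an assumption based on
# HTML's usual definition of space characters).
_WS = "\t\n\x0c\r "


def extract_charset(value: str) -> Optional[str]:
    """Return the raw charset label found in a Content-Type value, or None."""
    # Find the first case-insensitive occurrence of "charset".
    match = re.search(r"charset", value, re.IGNORECASE)
    if match is None:
        return None
    pos = match.end()

    # Skip whitespace, then require "=".
    while pos < len(value) and value[pos] in _WS:
        pos += 1
    if pos >= len(value) or value[pos] != "=":
        return None
    pos += 1

    # Skip whitespace after "=".
    while pos < len(value) and value[pos] in _WS:
        pos += 1
    if pos >= len(value):
        return None

    first = value[pos]
    if first in "\"'":
        # Quoted label: accept either quote style, but only if it is closed.
        end = value.find(first, pos + 1)
        if end == -1:
            return None
        return value[pos + 1:end]

    # Unquoted label: read up to whitespace, ";" or end of string.
    end = pos
    while end < len(value) and value[end] not in _WS and value[end] != ";":
        end += 1
    return value[pos:end]
```

In this sketch, charset='utf-8' and charset="utf-8" both yield the label utf-8, whereas HTTP's token grammar, as far as I understand it, would treat the single quotes as part of the name. That difference is the leniency under discussion.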
Received on Thursday, 27 January 2011 07:29:10 UTC