- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 27 Jan 2011 08:28:35 +0100
- To: Anne van Kesteren <annevk@opera.com>, Julian Reschke <julian.reschke@gmx.de>
- Cc: "public-html@w3.org" <public-html@w3.org>
Anne, HTML5's 'encoding sniffing algorithm' [1] uses the 'algorithm for
extracting an encoding from a Content-Type' [2] twice:
1) before parsing: on Content-Type meta data (HTTP). [1]
2) during parsing: on meta element pragma in encoding declaration
state (http-equiv=content-type). [3]
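For illustration, here is a minimal Python sketch of the 'algorithm for extracting an encoding from a Content-Type' [2], based on my reading of the draft text. The function names are hypothetical. The spec's final 'get an encoding' step, which maps the label to a supported encoding, is left out, so the sketch returns the raw label. Note that it accepts a matched pair of single quotes just as it accepts double quotes:

```python
from typing import Optional

# The characters the algorithm skips: TAB, LF, FF, CR and SPACE.
WS = "\t\n\x0c\r "


def ascii_lower(s: str) -> str:
    # Lowercase A-Z only, so string indices stay aligned with the input.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def extract_charset(s: str) -> Optional[str]:
    """Return the raw charset label found in a Content-Type string, or None.

    Sketch only: the 'get an encoding' lookup is not modelled here.
    """
    lower = ascii_lower(s)
    position = 0
    while True:
        # Loop: find "charset" (ASCII case-insensitive) at or after position.
        idx = lower.find("charset", position)
        if idx == -1:
            return None
        i = idx + len("charset")
        while i < len(s) and s[i] in WS:
            i += 1
        # No '=' after "charset": resume the search from here.
        if i >= len(s) or s[i] != "=":
            position = i
            continue
        i += 1
        while i < len(s) and s[i] in WS:
            i += 1
        if i >= len(s):
            return None  # nothing after '='
        c = s[i]
        if c in "\"'":
            # Double OR single quote: return the text up to the matching quote.
            end = s.find(c, i + 1)
            return None if end == -1 else s[i + 1:end]
        # Unquoted value: ends at whitespace, ';' or the end of the string.
        end = i
        while end < len(s) and s[end] not in WS + ";":
            end += 1
        return s[i:end]
```

With this sketch, both charset='utf-8' and charset="utf-8" yield utf-8, and an unmatched quote yields nothing.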
Thus ISSUE-125 can't be isolated to HTTP-EQUIV unless the encoding
sniffing algorithm [1] is changed. One could *make* the premise of your
CP a reality by specifying two algorithms: one for HTTP-EQUIV and
another one for HTTP. Until then, combining a single algorithm with
non-violation of HTTP requires
EITHER a rewrite of the algorithm, as Julian suggested;
OR solving the issue at the level of authoring requirements.
It seems to me that the OR option is what we currently have:
Validator.nu screams if authors use HTTP-invalid syntax, despite what
the algorithm accepts.
Of course, the OR option still violates HTTP ... Whom does it help
to interpret invalid charset names as if they were valid? I fail
to see how anyone who was aware of what they were doing would
"willfully" put quotes on both sides of the charset name inside an
HTTP-EQUIV="Content-TYPE" element.
In that regard, on Sun, 23 Jan 2011 15:36:35 +0100 you said:
> I did not say that. What I said is that it makes sense to change HTTP
> because double and single quotes can be used all over the Web
> Platform interchangeably. Often though more lenient syntax is more
> compatible and authors do not always test in IE. There are places
'Interchangeably' sounds nice. But is there any logic here? Where?
With my limited knowledge of the HTTP spec and of the rules for which
characters a charset encoding name may contain, I do of course agree
that it seems strange that encoding names can contain the single quote
character. But then we need to fix _that_ problem. I don't see how we
fix that problem by keeping this algorithm: even if we keep the
algorithm you are fighting for, authors are still prohibited from using
that syntax. So where is the interchangeability?
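By contrast, HTTP/1.1 (RFC 2616) defines a parameter value as either a token or a quoted-string delimited by double quotes, and the apostrophe is not among the separators excluded from tokens. A minimal Python sketch of that rule follows. It is simplified (quoted-pair escapes are not handled) and the function names are hypothetical. It shows that, under HTTP, charset='utf-8' is a token whose value keeps the single quotes, whereas the HTML5 algorithm [2] strips them:

```python
from typing import Optional

# RFC 2616, section 2.2: the separators, which may not appear in a token.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token_char(c: str) -> bool:
    # token = 1*<any CHAR except CTLs or separators>
    return 32 < ord(c) < 127 and c not in SEPARATORS


def http_param_value(v: str) -> Optional[str]:
    """Interpret a parameter value as token or quoted-string; None if neither.

    Simplified sketch: backslash escapes inside quoted-strings are ignored.
    """
    # quoted-string: only DOUBLE quotes delimit it.
    if len(v) >= 2 and v[0] == '"' and v[-1] == '"' and '"' not in v[1:-1]:
        return v[1:-1]
    # token: the apostrophe is not a separator, so it stays part of the value.
    if v and all(is_token_char(c) for c in v):
        return v
    return None
```

So http_param_value("'utf-8'") returns 'utf-8' with the apostrophes still attached, that is, an encoding name that itself contains single quotes.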
Btw, the 'encoding sniffing algorithm' [1] permits UAs to use
'information on the likely encoding for this page' etc., so such invalid
encoding names could still be used at a later step in the encoding
sniffing algorithm.
[1] http://www.w3.org/TR/html5/parsing#determining-the-character-encoding
[2] http://www.w3.org/TR/html5/fetching-resources#algorithm-for-extracting-an-encoding-from-a-content-type
[3] http://www.w3.org/TR/html5/tokenization#meta-charset-during-parse
Leif Halvard Silli
Anne van Kesteren, Mon, 24 Jan 2011 16:50:13 +0100:
> Summary: Change the note after "algorithm for extracting an encoding
> from a Content-Type" to not mention HTTP as HTTP is not affected by
> this algorithm.
>
> Rationale: "algorithm for extracting an encoding from a Content-Type"
> is only used to examine the contents of a document and therefore does
> not affect HTTP. Claiming it a willful violation of HTTP is
> misleading.
>
> Details: Instead of saying this is a willful violation of HTTP say
> this is a distinct algorithm from HTTP Content-Type processing for
> usage outside of HTTP.
>
> Impact: Hardly.
>
> Anne van Kesteren
> http://annevankesteren.nl/
--
leif halvard silli
Received on Thursday, 27 January 2011 07:29:10 UTC