Revised Change Proposal for ISSUE-125, was: ISSUE-125 charset-vs-quotes - Chairs Solicit Alternate Proposals or Counter-Proposals from Julian Reschke on 2011-01-23 (public-html@w3.org from January 2011)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sun, 23 Jan 2011 18:42:55 +0100
To: Sam Ruby <rubys@intertwingly.net>
CC: HTML WG <public-html@w3.org>
Message-ID: <4D3C689F.5000801@gmx.de>
On 01.12.2010 13:35, Sam Ruby wrote:
> The current status for this issue:
>
> http://www.w3.org/html/wg/tracker/issues/125
> http://dev.w3.org/html5/status/issue-status.html#ISSUE-125
>
> We have a single change proposal to parse quotes in Content-Type headers
> in <meta> elements in a HTTP compliant manner:
>
> http://lists.w3.org/Archives/Public/public-html/2010Nov/0233.html
>
> At this time the Chairs would also like to solicit alternate Change
> Proposals (possibly with "zero edits" as the Proposal Details), in case
> anyone would like to advocate the status quo or a different change than
> the specific one in the existing Change Proposals.
>
> If no counter-proposals or alternate proposals are received by January
> 12th, 2011, we will proceed to evaluate the change proposal that we have
> received to date.
> ...

Hi,

below is a revised CP for ISSUE-125; the changes are:

a) It explains that a single quote is indeed a valid character inside 
the HTTP parameter grammar, and thus

b) Rephrases the proposed text to treat the single quote like any other 
character in the unquoted form (thus removing the special case that was 
in there).

c) Mentions that this enables UAs to parse constructs that use the HTTP 
parameter ABNF more consistently.

Note that this slightly changes the handling of charset names containing 
a single quote: previously, the proposed change would drop them on the 
floor; now it will parse them successfully, but the resulting value will 
never be valid (as valid charset names never contain single quotes).

Best regards, Julian

-- snip --
SUMMARY

The specification requires recipients to parse Content-Type headers in 
<meta> elements in a way that breaks HTTP's parsing rules.

The justification given is:

   "Note: This requirement is a willful violation of the HTTP 
specification (for example, HTTP doesn't allow the use of single quotes 
and requires supporting a backslash-escape mechanism that is not 
supported by this algorithm), motivated by the need for backwards 
compatibility with legacy content."

...however, tests show that Internet Explorer [1] does obey the HTTP 
parsing rules, so it is highly doubtful that the violation is actually 
needed for "backwards compatibility".

RATIONALE

"Willful violations" should be restricted to cases where they are 
actually needed in practice. Evidence shows this is not the case here.

Further note that HTTP *does* allow single quotes; however, they are not 
treated as delimiters, but simply as allowed characters in the token 
(unquoted) form:

     token          = 1*<any CHAR except CTLs or separators>
     separators     = "(" | ")" | "<" | ">" | "@"
                    | "," | ";" | ":" | "\" | <">
                    | "/" | "[" | "]" | "?" | "="
                    | "{" | "}" | SP | HT

(see [2], or [3] for the ABNF in the current HTTPbis draft).
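To make the grammar concrete, here is a minimal Python sketch of the 
RFC 2616 token rule quoted above. The names SEPARATORS, is_token_char 
and is_token are hypothetical helpers, and the sketch covers only the 
token production, not quoted-string or the full parameter syntax. It 
shows that an apostrophe is accepted as an ordinary token character, 
whereas a double quote, being a separator, is not.

```python
# Sketch of the RFC 2616 "token" production only (hypothetical helpers).
# CHAR is any US-ASCII character 0-127; CTLs are 0-31 and 127.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token_char(c: str) -> bool:
    """True if c may appear in an RFC 2616 token."""
    code = ord(c)
    if code > 127:                   # not a CHAR
        return False
    if code < 32 or code == 127:     # CTL
        return False
    return c not in SEPARATORS


def is_token(s: str) -> bool:
    """True if s matches token = 1*<any CHAR except CTLs or separators>."""
    return len(s) > 0 and all(is_token_char(c) for c in s)


if __name__ == "__main__":
    print(is_token("utf-8"))         # True
    print(is_token("'utf-8'"))       # True: the apostrophe is a plain token char
    print(is_token('"utf-8"'))       # False: '"' is a separator
```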

DETAILS

Change Step 6 in the last part of 
<http://dev.w3.org/html5/spec/Overview.html#content-type-sniffing> from:

-- cut --
    6.
       Process the next character as follows:

       If it is a U+0022 QUOTATION MARK ('"') and there is a later U+0022 QUOTATION MARK ('"') in s
       If it is a U+0027 APOSTROPHE ("'") and there is a later U+0027 APOSTROPHE ("'") in s
           Return the encoding corresponding to the string between this character and the next earliest occurrence of this character.
       If it is an unmatched U+0022 QUOTATION MARK ('"')
       If it is an unmatched U+0027 APOSTROPHE ("'")
       If there is no next character
           Return nothing.
       Otherwise
           Return the encoding corresponding to the string from this character to the first U+0009, U+000A, U+000C, U+000D, U+0020, or U+003B character or the end of s, whichever comes first.
-- cut --
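For comparison, step 6 as currently specified can be sketched in Python 
roughly as follows. The sketch assumes that `rest` is the part of s 
beginning at the "next character" (that is, after "charset", any 
whitespace, "=" and any further whitespace, which the earlier steps of 
the spec algorithm handle), and it returns the raw label instead of 
looking up an encoding. `step6_current` is a hypothetical name, and 
None stands for "return nothing".

```python
import re
from typing import Optional

# An unquoted value ends at U+0009, U+000A, U+000C, U+000D, U+0020 or U+003B (';').
_UNQUOTED = re.compile(r"[^\t\n\f\r ;]*")


def step6_current(rest: str) -> Optional[str]:
    """Sketch of step 6 as currently specified (hypothetical helper).

    `rest` starts at the character being processed. Returns the raw
    charset label (no encoding lookup), or None for "return nothing".
    """
    if not rest:                      # there is no next character
        return None
    first = rest[0]
    if first in ('"', "'"):           # both quote characters act as delimiters
        end = rest.find(first, 1)     # next earliest occurrence of this character
        if end == -1:                 # unmatched quote
            return None
        return rest[1:end]
    # Otherwise: up to the first terminator or the end of s.
    return _UNQUOTED.match(rest).group(0)


if __name__ == "__main__":
    print(step6_current('"utf-8"'))   # utf-8
    print(step6_current("'utf-8'"))   # utf-8
    print(step6_current("'utf-8"))    # None
    print(step6_current("utf-8; x"))  # utf-8
```

Here both quote characters are treated as delimiters, which is the 
behavior this proposal argues is not needed.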

to

-- cut --
    6.
       Process the next character as follows:

       If it is a U+0022 QUOTATION MARK ('"') and there is a later U+0022 QUOTATION MARK ('"') in s
           Return the encoding corresponding to the string between this character and the next earliest occurrence of this character.
       If it is an unmatched U+0022 QUOTATION MARK ('"')
       If there is no next character
           Return nothing.
       Otherwise
           Return the encoding corresponding to the string from this character to the first U+0009, U+000A, U+000C, U+000D, U+0020, or U+003B character or the end of s, whichever comes first.
-- cut --
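A minimal Python sketch of the proposed step 6 could look like this. It 
assumes that `rest` is the part of s beginning at the "next character" 
(after "charset", any whitespace, "=" and any further whitespace, as 
handled by the earlier steps of the spec algorithm), and it returns the 
raw label instead of looking up an encoding. `step6_proposed` is a 
hypothetical name, and None stands for "return nothing". Following 
point (b) of the cover note, only U+0022 opens a quoted value, so only 
an unmatched double quote (or an empty remainder) returns nothing, and 
an apostrophe is handled like any other character of the unquoted form.

```python
import re
from typing import Optional

# An unquoted value ends at U+0009, U+000A, U+000C, U+000D, U+0020 or U+003B (';').
_UNQUOTED = re.compile(r"[^\t\n\f\r ;]*")


def step6_proposed(rest: str) -> Optional[str]:
    """Sketch of step 6 as proposed (hypothetical helper).

    `rest` starts at the character being processed. Returns the raw
    charset label (no encoding lookup), or None for "return nothing".
    """
    if not rest:                      # there is no next character
        return None
    if rest[0] == '"':                # only the double quote is a delimiter
        end = rest.find('"', 1)       # next earliest occurrence of '"'
        if end == -1:                 # unmatched U+0022
            return None
        return rest[1:end]
    # Any other character, including "'", starts an unquoted value.
    return _UNQUOTED.match(rest).group(0)


if __name__ == "__main__":
    print(step6_proposed('"utf-8"'))  # utf-8
    print(step6_proposed("'utf-8'"))  # 'utf-8'
    print(step6_proposed("'utf-8"))   # 'utf-8
    print(step6_proposed('"utf-8'))   # None
```

The second and third calls illustrate the behavior change described in 
the cover note: a value starting with an apostrophe is parsed as a 
token, rather than being unquoted or dropped as under the current text, 
and since no valid charset name contains an apostrophe, the resulting 
label will not match any encoding.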

...and change the note that follows accordingly (its exact text depends 
on the decision on ISSUE-126).

IMPACT

1. Positive Effects

Removal of a "willful violation" that is not required at all.

UAs can use consistent parsing rules for things that are specified to 
use the HTTP parameter ABNF.

There is no need to change IE's behavior; the notoriously 
hard-to-get-rid-of legacy IE versions remain compliant.

2. Negative Effects

Non-IE UAs may have to change if they want to be compliant in handling 
essentially invalid header field instances (a single quote is never part 
of a charset name).

3. Conformance Classes Changes

Certain instances of meta/@http-equiv (those whose charset value starts 
with a single quote) change their semantics.

4. Risks

The risk appears to be small, given the fact that IE already behaves the 
way this Change Proposal describes.


REFERENCES

[1] <http://www.w3.org/Bugs/Public/show_bug.cgi?id=10805#c0>
[2] <http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.2.2>
[3] <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-12.html#rfc.section.1.2.2>
Received on Sunday, 23 January 2011 17:43:40 UTC