Change Proposal for ISSUE-126 from Julian Reschke on 2010-11-13 (public-html@w3.org from November 2010)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sat, 13 Nov 2010 18:55:36 +0100
To: "public-html@w3.org" <public-html@w3.org>
Message-ID: <4CDED118.9000605@gmx.de>

SUMMARY

The specification requires recipients to parse Content-Type headers in 
<meta> elements in a way that breaks HTTP's parsing rules.

The justification given is:

   "Note: This requirement is a willful violation of the HTTP 
specification (for example, HTTP doesn't allow the use of single quotes 
and requires supporting a backslash-escape mechanism that is not 
supported by this algorithm), motivated by the need for backwards 
compatibility with legacy content."

...however tests show that Opera, Safari and Konqueror ([1]) do not 
implement the HTML5 parsing rule, so it's highly doubtful that it's 
actually needed for "backwards compatibility".

RATIONALE

"Willful violations" should be restricted to cases where they are 
actually needed in practice. Evidence shows this is not the case here.

DETAILS

Change Step 6 in the last part of 
<http://dev.w3.org/html5/spec/Overview.html#content-type-sniffing> from:

-- cut --
    6.
       Process the next character as follows:

       If it is a U+0022 QUOTATION MARK ('"') and there is a later 
U+0022 QUOTATION MARK ('"') in s
       If it is a U+0027 APOSTROPHE ("'") and there is a later U+0027 
APOSTROPHE ("'") in s
           Return the encoding corresponding to the string between this 
character and the next earliest occurrence of this character.
       If it is an unmatched U+0022 QUOTATION MARK ('"')
       If it is an unmatched U+0027 APOSTROPHE ("'")
       If there is no next character
           Return nothing.
       Otherwise
           Return the encoding corresponding to the string from this 
character to the first U+0009, U+000A, U+000C, U+000D, U+0020, or U+003B 
character or the end of s, whichever comes first.
-- cut --

to

-- cut --
    6.
       Process the next character as follows:

       If it is a U+0022 QUOTATION MARK ('"') and there is a later 
U+0022 QUOTATION MARK ('"') (NOT immediately following a U+005C REVERSE 
SOLIDUS ("\") character) in s
       If it is a U+0027 APOSTROPHE ("'") and there is a later U+0027 
APOSTROPHE ("'") in s
           Return the encoding corresponding to the backslash-unescaped 
string between this character and the next earliest occurrence of this 
character.
       If it is an unmatched U+0022 QUOTATION MARK ('"')
       If it is an unmatched U+0027 APOSTROPHE ("'")
       If there is no next character
           Return nothing.
       Otherwise
           Return the encoding corresponding to the string from this 
character to the first U+0009, U+000A, U+000C, U+000D, U+0020, or U+003B 
character or the end of s, whichever comes first.
-- cut --

...and define, somewhere nearby:

-- cut --
"backslash-unescaping" a string replaces each sequence of U+005C REVERSE 
SOLIDUS ("\") and the following character with just that character. If the 
last character of the string is a U+005C REVERSE SOLIDUS ("\"), the 
algorithm returns nothing.
-- cut --

...and change the following note accordingly (the exact text for the 
note depending on the decision for ISSUE-125).
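
For illustration only, here is a rough Python sketch of how the proposed 
step 6 and the "backslash-unescaping" definition might fit together. The 
function names (backslash_unescape, find_closing_quote, 
extract_charset_label) are made up for this example and do not appear in 
the specification. The sketch returns the raw label rather than mapping it 
to an encoding, and it reads the proposed text literally: only a closing 
U+0022 is required not to follow a backslash, while unescaping is applied 
to both quoting styles.

```python
# Sketch only: function names are hypothetical, not taken from the specification.
TERMINATORS = "\t\n\x0c\r ;"  # U+0009, U+000A, U+000C, U+000D, U+0020, U+003B


def backslash_unescape(text):
    """Replace each backslash-plus-character pair with just that character."""
    if text.endswith("\\"):
        return None  # "returns nothing" when the last character is a backslash
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 1  # drop the backslash; a following character always exists here
        out.append(text[i])
        i += 1
    return "".join(out)


def find_closing_quote(s, quote, start):
    """Index of the next `quote` at or after `start`, or -1 if there is none.

    For '"' only, occurrences immediately following a backslash are skipped.
    `start` is assumed to be the index just after the opening quote.
    """
    i = s.find(quote, start)
    while quote == '"' and i != -1 and s[i - 1] == "\\":
        i = s.find(quote, i + 1)
    return i


def extract_charset_label(s, position):
    """Proposed step 6: return the label starting at s[position], or None."""
    if position >= len(s):
        return None  # no next character
    ch = s[position]
    if ch in ('"', "'"):
        end = find_closing_quote(s, ch, position + 1)
        if end == -1:
            return None  # unmatched quotation mark or apostrophe
        return backslash_unescape(s[position + 1:end])
    # Unquoted value: read up to the first terminator or the end of s.
    end = position
    while end < len(s) and s[end] not in TERMINATORS:
        end += 1
    return s[position:end]
```

With this sketch, a value written as "utf\-8" (with the double quotes) 
yields the label utf-8, whereas the current HTML5 rule would keep the 
backslash in the label.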

IMPACT

1. Positive Effects

Removal of a "willful violation" that is not required at all.

2. Negative Effects

UAs may have to change; they will however likely benefit from being able 
to apply consistent parsing rules, reducing the number of special cases.

3. Conformance Classes Changes

Certain instances of meta/@http-equiv change their semantics.

4. Risks

The risk appears to be small, since there's no point in using escapes for 
character set names anyway.


REFERENCES

[1] <http://www.w3.org/Bugs/Public/show_bug.cgi?id=10806#c0>
Received on Saturday, 13 November 2010 17:56:22 UTC