Re: ISSUE-126: charset-vs-backslashes - Straw Poll for Objections from Julian Reschke on 2011-03-05 (public-html@w3.org from March 2011)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sat, 05 Mar 2011 21:27:15 +0100
To: Sam Ruby <rubys@intertwingly.net>
CC: HTML WG <public-html@w3.org>
Message-ID: <4D729CA3.5000009@gmx.de>

On 02.03.2011 18:31, Sam Ruby wrote:
> ISSUE-126: charset-vs-quotes - Straw Poll for Objections
>
> The poll is available here and it will run through Thursday Mar 10th:
>
> http://www.w3.org/2002/09/wbs/40318/issue-126-objection-poll/
>
> Please read the introductory text before entering your response.
>
> In particular, keep in mind that you don't *have* to reply. You only
> need to do so if you feel your objection to one of the options is truly
> strong, and has not been adequately addressed by a clearly marked
> objection contained within a Change Proposal or by someone else's
> objection. The Chairs will be looking at strength of objections, and
> will not be counting votes.

Here are a few comments on Philip's feedback in 
<http://www.w3.org/2002/09/wbs/40318/issue-126-objection-poll/results>:

> The proposal aims to align processing with the HTTP spec in order to remove a willful violation, but does not achieve that, even assuming that the sibling proposal for ISSUE-125 is adopted.
>
> The "algorithm for extracting an encoding from a Content-Type" should be applied to the value of the content="" attribute on <meta http-equiv="Content-Type">. In order to claim conformance with HTTP, that value should be processed like the media-type production in RFC 2616:
>
> media-type = type "/" subtype *( ";" parameter )
> type = token
> subtype = token
>
> parameter = attribute "=" value
> attribute = token
> value = token | quoted-string
>
> quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
> qdtext = <any TEXT except <">>
> quoted-pair = "\" CHAR
>
> The critical part of the suggested change is "Return the encoding corresponding to the backslash-unescaped string between this characters and the next earliest occurrence of this character." This is more liberal than the quoted-string production, allowing e.g. content='text/html;charset="UTF-8"garbage'.

Indeed; see my mail from 
<http://lists.w3.org/Archives/Public/public-html/2011Jan/0358.html>. It 
may have been a bad decision to put this into three ISSUEs; this is 
mainly a result of Ian refusing to look at Bugzilla entries that 
describe multiple, related problems.

Writing CPs for each of these while the others are in progress makes 
things hard.
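
For illustration, here is a minimal sketch of what strict processing along
the lines of the media-type production quoted above could look like. The
function name strict_charset is hypothetical, the regular expressions are
a simplified approximation of the RFC 2616 grammar (the qdtext/quoted-pair
overlap is resolved in favour of quoted-pair), and tolerating spaces and
tabs around ";" is an assumption about implied LWS. It is not text from
any Change Proposal.

```python
import re

# token: one or more CHARs except CTLs and separators (RFC 2616, section 2.2).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string, simplified: a backslash always starts a quoted-pair.
QUOTED = r'"(?:[^"\\]|\\.)*"'
# parameter = attribute "=" value, with optional spaces/tabs around ";".
PARAM = rf"[ \t]*;[ \t]*({TOKEN})=({TOKEN}|{QUOTED})"

MEDIA_TYPE_RE = re.compile(rf"({TOKEN})/({TOKEN})((?:{PARAM})*)[ \t]*")
PARAM_RE = re.compile(PARAM)


def strict_charset(content):
    """Return the charset parameter, or None if content is not a media-type."""
    match = MEDIA_TYPE_RE.fullmatch(content)
    if match is None:
        return None  # anything outside the grammar is rejected as a whole
    for name, value in PARAM_RE.findall(match.group(3)):
        if name.lower() == "charset":
            if value.startswith('"'):
                # Strip the quotes, then undo quoted-pair escaping.
                value = re.sub(r"\\(.)", r"\1", value[1:-1])
            return value
    return None


print(strict_charset('text/html;charset="UTF-8"'))         # UTF-8
print(strict_charset('text/html;charset="UTF-8"garbage'))  # None
```

Under these assumptions, the 'text/html;charset="UTF-8"garbage' value from
Philip's comment is rejected, because nothing may follow the closing quote
except another parameter.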

> Furthermore, earlier steps of the algorithm are nowhere near close to the HTTP spec, simply finding the first occurrence of "charset", allowing e.g. content='garbagecharset=UTF-8'.

I believe this is ISSUE-148.
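
To make the behaviour Philip describes concrete, here is a rough sketch of
a lenient extraction that searches for the first "charset" substring. It is
loosely modelled on the steps discussed in this thread, not copied from the
spec; the function name lenient_charset and the exact whitespace set are
assumptions, and backslash unescaping is left out.

```python
import re

WHITESPACE = " \t\r\n\f"


def lenient_charset(content):
    """Return whatever follows the first 'charset=' in content, or None."""
    # ASCII case-insensitive search for the first occurrence of "charset",
    # regardless of what precedes it.
    found = re.search("charset", content, re.IGNORECASE | re.ASCII)
    if found is None:
        return None
    rest = content[found.end():].lstrip(WHITESPACE)
    if not rest.startswith("="):
        return None
    rest = rest[1:].lstrip(WHITESPACE)
    if rest[:1] in ('"', "'"):
        # Quoted value: take everything up to the next matching quote and
        # ignore whatever comes after it.
        end = rest.find(rest[0], 1)
        return rest[1:end] if end > 0 else None
    # Unquoted value: stop at the first whitespace character or ";".
    value = re.split(r"[ \t\r\n\f;]", rest, maxsplit=1)[0]
    return value or None


print(lenient_charset('garbagecharset=UTF-8'))              # UTF-8
print(lenient_charset('text/html;charset="UTF-8"garbage'))  # UTF-8
```

Both inputs yield an encoding here, although neither matches the media-type
production from RFC 2616.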

> Only if the algorithm as a whole matches exactly the media-type production will the spec not require "recipients to parse Content-Type headers in <meta> elements in a way breaking HTTP's parsing rules." Since the change proposal does not achieve that, I object to its adoption.

Again, it's a process problem that we're looking at three issues at the 
same time.

The bug was originally raised because the spec claims that the described 
behavior is needed for compatibility with "existing content". This has 
been shown to be nonsense, or at minimum an exaggeration.

If we follow Anne's proposal for ISSUE-125 we'll at least have spec text 
that simply states that parsing of meta tag values is different from 
HTTP header field values, which is an improvement. We can then focus on 
deciding *which* of all of these differences make sense/are "required".

Best regards, Julian
Received on Saturday, 5 March 2011 20:28:15 UTC