Re: CfC: Close ISSUE-148 charset-detect by Amicable Resolution

On Fri, 11 Mar 2011 13:37:12 +0100, Sam Ruby <rubys@intertwingly.net>  
wrote:

> On 02/09/2011 12:04 PM, Sam Ruby wrote:
>> The current status for this issue:
>>
>> http://www.w3.org/html/wg/tracker/issues/148
>> http://dev.w3.org/html5/status/issue-status.html#ISSUE-148
>>
>> - We have a change proposal to change the algorithm for extracting an
>> encoding from a Content-Type.
>>
>> At this time the Chairs would also like to solicit any other alternate
>> Change Proposals (possibly with "zero edits" as the Proposal Details),
>> in case anyone would like to advocate the status quo or a different
>> change than the specific one in the existing Change Proposals.
>>
>> If no counter-proposals or alternate proposals are received by March
>> 10th, 2011, we will proceed to evaluate the change proposal that we have
>> received to date.
>
> As we have received no counter-proposals or alternate proposals, the  
> chairs are issuing a call for consensus on the proposal that we do have.  
>   If no objections are raised to this call by March 18th 2011, we will  
> direct the editor to make the proposed change.  If anybody would like to  
> raise an objection during this time, we strongly encourage them to  
> accompany their objection with a concrete and complete change proposal.

I object to  
<http://lists.w3.org/Archives/Public/public-html/2011Jan/0431.html>. It  
contains 2 alternative proposals:

1. requiring "charset" to be preceded by whitespace or a semicolon.

2. replacing the entire algorithm with something that matches exactly  
the media-type production.

I object to both alternatives:

Alternative 1 doesn't achieve much, as the resulting algorithm would still  
not be close to matching the media-type production. The given rationale is  
"Trying shortcuts is dangerous, you may treat edge cases incorrectly", but  
the proposal doesn't say what the practical danger actually is.

Alternative 2 is virtually guaranteed to break compat with real web pages,  
as the following typos that are now tolerated would stop working:

text/html charset=UTF-8 (missing semicolon)
text/html: charset=UTF-8 (colon instead of semicolon)
text/html; charset = UTF-8 (whitespace between attribute and value)
text/html; charset=UTF-8; (trailing semicolon)
text/html;; charset=UTF-8 (double semicolon)
text\html; charset=UTF-8 (backslash instead of forward slash)
text/htmlcharset=UTF-8 (missing both semicolon and whitespace)
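
To make the difference concrete, here is a rough Python sketch. It is not the
spec text: lenient_charset only loosely follows the tolerant "find charset,
then =, then a value" approach, strict_charset uses a simplified approximation
of the media-type production (token/token followed by ;name=value parameters),
and both function names are made up for this illustration.

```python
import re

WS = " \t\n\x0c\r"

# Simplified approximation of the HTTP "token" rule (an assumption of this sketch).
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
PARAM = "(" + TOKEN + ")=(" + TOKEN + '|"[^"]*")'
MEDIA_TYPE = re.compile(TOKEN + "/" + TOKEN + "(?:[ \\t]*;[ \\t]*" + PARAM + ")*")


def lenient_charset(value):
    """Tolerant extraction: look for 'charset', optional whitespace, '=', value."""
    lower = value.lower()
    pos = 0
    while True:
        idx = lower.find("charset", pos)
        if idx == -1:
            return None
        i = idx + len("charset")
        while i < len(value) and value[i] in WS:
            i += 1
        if i < len(value) and value[i] == "=":
            i += 1
            break
        pos = idx + len("charset")  # not followed by '=', keep searching
    while i < len(value) and value[i] in WS:
        i += 1
    if i >= len(value):
        return None
    if value[i] in "\"'":
        end = value.find(value[i], i + 1)
        return value[i + 1:end] if end != -1 else None
    m = re.match(r"[^ \t\n\x0c\r;]+", value[i:])
    return m.group(0) if m else None


def strict_charset(value):
    """Strict extraction: the whole string must match the simplified grammar."""
    if not MEDIA_TYPE.fullmatch(value):
        return None
    for m in re.finditer(";[ \\t]*" + PARAM, value):
        if m.group(1).lower() == "charset":
            return m.group(2).strip('"')
    return None


TYPOS = [
    "text/html charset=UTF-8",
    "text/html: charset=UTF-8",
    "text/html; charset = UTF-8",
    "text/html; charset=UTF-8;",
    "text/html;; charset=UTF-8",
    "text\\html; charset=UTF-8",
    "text/htmlcharset=UTF-8",
]

for s in TYPOS:
    print(repr(s), lenient_charset(s), strict_charset(s))
```

Under these assumptions, the tolerant function finds UTF-8 in all seven
strings, while the strict one rejects every one of them and only accepts the
well-formed "text/html; charset=UTF-8".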


Counter Change Proposal:

Summary: Don't change the "algorithm for extracting an encoding from a  
Content-Type".

Rationale: Aligning it with HTTP will undoubtedly break existing content,  
since many trivial typos are not productions of the media-type syntax. Any  
other change that doesn't achieve spec purity or align the spec with  
implementations has very little or no practical benefit, but comes with a  
high risk of breaking existing content.

Details: No change.

Impact: Nope.

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Friday, 11 March 2011 20:20:26 UTC