Re: ISSUE-126: charset-vs-backslashes - Straw Poll for Objections

On Sun, 06 Mar 2011 11:46:04 +0100, Julian Reschke <julian.reschke@gmx.de>  
wrote:

> On 05.03.2011 22:41, Philip Jägenstedt wrote:
>> ...
>>>> Furthermore, earlier steps of the algorithm are nowhere near close to
>>>> the HTTP spec, simply finding the first occurrence of "charset",
>>>> allowing e.g. content='garbagecharset=UTF-8'.
>>>
>>> I believe this is ISSUE-148.
>>>
>>>> Only if the algorithm as a whole matches exactly the media-type
>>>> production will the spec not require "recipients to parse
>>>> Content-Type headers in <meta> elements in a way breaking HTTP's
>>>> parsing rules." Since the change proposal does not achieve that, I
>>>> object to its adoption.
>>>
>>> Again, it's a process problem that we're looking at three issues at
>>> the same time.
>>
>> OK, I wasn't aware that there was a third issue as well. Would it be
>> fair to simply treat the sum of your proposals as a single proposal that
>> causes the content="" attribute value to be parsed as per the media-type
>> production?
>> ...
>
> My goals would be:
>
> - either align parsing with HTTP; *or* be clear that this is specific to  
> META, and consumers will need different parsing rules for the two  
> protocol elements.
>
> - in the latter case, rephrase and possibly move the text we're  
> discussing so it becomes crystal clear that this is error handling, and  
> *only* applies to <meta>.
>
> - make sure that field values that are syntactically valid in HTTP and  
> conforming in HTML have the same interpretation.
>
> - clarify how the two sets described above differ (for instance, if  
> backslash doesn't do the same thing as in quoted-string it should be  
> profiled out in HTML, this may already be the case).

All of this seems reasonable, if done with restraint. For example, I don't  
think there's any point in handling backslash escaping, as no encoding  
names include characters that need escaping, right?
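
As a rough sanity check of that assumption, here is a minimal Python
sketch. The label list is a small illustrative sample I picked, not the
full registry, and the helper names are hypothetical. It separates two
questions: whether a label would need backslash escaping inside an
RFC 2616 quoted-string, and whether it could be sent as a bare token.

```python
# Characters that must be backslash-escaped inside an RFC 2616 quoted-string.
NEEDS_ESCAPE = set('"\\')

# RFC 2616 separators: a value containing one of these cannot be a bare token.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')

# Illustrative sample only (an assumption), not the complete charset registry.
SAMPLE_LABELS = [
    "UTF-8", "ISO-8859-1", "windows-1252",
    "Shift_JIS", "EUC-JP", "ISO_8859-1:1987",
]


def needs_backslash_escaping(label):
    """True if the label contains a double quote or a backslash."""
    return any(ch in NEEDS_ESCAPE for ch in label)


def needs_quoting(label):
    """True if the label is not a valid RFC 2616 token."""
    return any(
        ch in SEPARATORS or ord(ch) < 33 or ord(ch) > 126 for ch in label
    )


for label in SAMPLE_LABELS:
    print(label, needs_quoting(label), needs_backslash_escaping(label))
```

In this sample a name such as ISO_8859-1:1987 would need quoting in
HTTP because of the colon, but none of the names would need a backslash.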

> - get rid of claims that things are done for backwards compatibility  
> when we have proof this is not the case.

Have you done testing of the sum of the changes necessary to make  
processing comply exactly with HTTP? It's plausible that the impact of  
backslash escaping and quote style is limited, but I find it very hard to  
believe that changing the way the charset parameter is located to follow  
HTTP would not have legacy compat issues.
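
To make the difference concrete, here is a simplified Python sketch of
the two ways of locating the parameter. Neither function is the HTML
spec's algorithm or a complete HTTP parser; loose_charset and
strict_charset are hypothetical names, and the strict version assumes
that no ';' appears inside a quoted-string.

```python
import re

# RFC 2616 token: printable ASCII except separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'

MEDIA_TYPE = re.compile(rf"\s*({TOKEN})/({TOKEN})\s*")
# No whitespace is accepted around "=", following the prose in section 3.7.
PARAM = re.compile(rf"\s*({TOKEN})=({TOKEN}|{QUOTED})\s*")


def loose_charset(content):
    """Find the first 'charset' anywhere in the string (legacy-style)."""
    m = re.search(r"charset\s*=\s*([^\s;]+)", content, re.IGNORECASE)
    return m.group(1).strip("\"'") if m else None


def strict_charset(content):
    """Require type/subtype followed by well-formed parameters."""
    parts = content.split(";")  # simplification: ignores ';' in quotes
    if not MEDIA_TYPE.fullmatch(parts[0]):
        return None
    for part in parts[1:]:
        m = PARAM.fullmatch(part)
        if m is None:
            return None
        if m.group(1).lower() == "charset":
            value = m.group(2)
            if value.startswith('"'):
                # Strip the quotes and undo backslash escapes.
                value = re.sub(r"\\(.)", r"\1", value[1:-1])
            return value
    return None


for content in ("text/html; charset=UTF-8", "garbagecharset=UTF-8"):
    print(repr(content), loose_charset(content), strict_charset(content))
```

Any page that today relies on the loose behaviour for a value like the
second one would lose its declared encoding under the strict approach.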

> BTW:
>
> content='text/html; charset = UTF-8' (whitespace between attribute and  
> value)
>
> is syntactically legal per RFC 2616 (although we may have broken it in  
> HTTPbis, just opened a ticket).

Perhaps I'm misreading <http://tools.ietf.org/html/rfc2616#section-3.7>?  
The ABNF does not allow for it and the prose says "Linear white space  
(LWS) MUST NOT be used between the type and subtype, nor between an  
attribute and its value."
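
Under that reading of the prose (which is my interpretation, and the
point in dispute), a parameter check could be sketched in Python as
follows. The pattern and the name is_strict_parameter are hypothetical.

```python
import re

# RFC 2616 token: printable ASCII except separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'

# parameter = attribute "=" value, with no LWS allowed around "=".
PARAMETER = re.compile(rf"({TOKEN})=({TOKEN}|{QUOTED_STRING})")


def is_strict_parameter(text):
    """True if text is attribute=value with no whitespace around '='."""
    return PARAMETER.fullmatch(text) is not None


for candidate in ("charset=UTF-8", 'charset="UTF-8"', "charset = UTF-8"):
    print(repr(candidate), is_strict_parameter(candidate))
```

If the implied-LWS rule of the ABNF were taken to apply here instead,
the pattern would need optional whitespace on both sides of the "=".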

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Sunday, 6 March 2011 16:20:02 UTC