i67: quoting charsets

<http://tools.ietf.org/wg/httpbis/trac/ticket/67>

Earlier related thread at:
http://lists.w3.org/Archives/Public/ietf-http-wg-old/2001JanApr/0020.html

Roy wrote in the issue:
> There is some confusion here. First, HTTP allows both quoted and  
> unquoted forms in Content-Type, and that certainly isn't going to  
> change. However, HTTP only uses the charset ABNF production in  
> Accept-Charset, and thus is currently defined to only allow tokens  
> in Accept-Charset.
>
> Should Accept-Charset allow charset quoted strings? I don't think  
> so. Should the charset production be removed to reduce the  
> confusion? Perhaps. This is really a design issue.
>
> This would be a lot easier if IANA kept a decent registry for  
> charset that only included the "MIME preferred names". We may need  
> to request that in the IANA considerations
>

p3 3.1 already says:
> HTTP uses charset in two contexts: within an Accept-Charset request  
> header (in which the charset value is an unquoted token) and as the  
> value of a parameter in a Content-Type header (within a request or  
> response), in which case the parameter value of the charset  
> parameter may be quoted.
I can't find this text in 2616, so I'm guessing that the editors took  
a stab at resolving this before flipping it to a design issue?

At any rate, the interesting thing to me here is that the argument for  
allowing quoted charset content-type parameters seems to be that the  
BNF for parameter values is
token | quoted-string
i.e., all parameters inherit the ability to be quoted from the generic  
BNF.

However, as part of the discussion of our favourite issue, #74, we've  
come to the point of saying that field-content is *not* generically  
subject to RFC2047 encoding, even though its BNF refers to TEXT  
(albeit in comments).

I think we need to be more explicit about when a higher-level BNF  
rule's attributes (such as encoding and quoting) are inherited. This  
will help avoid a fair amount of reader confusion.

In this case, I'm fine with the added text above, but I think we also  
need to explicitly state (probably in p2 section 3.3) that quoting in  
media-type parameters is syntactic, not semantic, and so both forms  
are equivalent for any given parameter.
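
To make "syntactic, not semantic" concrete, here is a rough sketch of how a
recipient might treat the two forms as equivalent. The names param_value and
same_charset are made up for illustration, the token pattern is my reading of
the RFC 2616 token rule, and the quoted-string handling is deliberately
lenient rather than a strict validator.

```python
import re

# RFC 2616 token: one or more CHARs, excluding CTLs and separators.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

def param_value(raw):
    """Return a media-type parameter value with any quoting removed."""
    raw = raw.strip()
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        # quoted-string: drop the surrounding quotes, undo quoted-pairs (\x -> x)
        return re.sub(r"\\(.)", r"\1", raw[1:-1])
    if TOKEN.fullmatch(raw):
        return raw
    raise ValueError(f"not a token or quoted-string: {raw!r}")

def same_charset(a, b):
    """Compare two charset parameter values, ignoring quoting and case."""
    # Assumption: charset names compare case-insensitively.
    return param_value(a).lower() == param_value(b).lower()
```

With this, charset=UTF-8 and charset="utf-8" compare equal, which is the
behaviour I'd like the spec to state outright.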

As far as accept-charset goes, I'm fine with leaving it just a token,  
and don't think we need any change there.
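
For contrast, a similarly rough sketch of the token-only reading of
Accept-Charset. The function name accept_charsets is hypothetical, q-values
are simply discarded, and the token pattern is again my reading of RFC 2616;
the only point is that a quoted charset is rejected here rather than
unquoted.

```python
import re

# RFC 2616 token: one or more CHARs, excluding CTLs and separators.
TOKEN = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

def accept_charsets(header_value):
    """Split an Accept-Charset value into charset names, rejecting non-tokens."""
    names = []
    for element in header_value.split(","):
        name = element.split(";", 1)[0].strip()  # drop any ;q=... part
        if not name:
            continue  # tolerate empty list elements
        if not TOKEN.fullmatch(name):
            raise ValueError(f"charset must be a token: {name!r}")
        names.append(name.lower())
    return names
```

So "iso-8859-5, unicode-1-1;q=0.8" yields two names, while a value such as
"utf-8" written with the quotes raises an error.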


--
Mark Nottingham     http://www.mnot.net/

Received on Friday, 4 April 2008 05:55:59 UTC