- From: Melman, Howard <Howard@silverstream.com>
- Date: Wed, 7 Feb 2001 11:46:05 -0500
- To: HTTP Working Group <http-wg@cuckoo.hpl.hp.com>
Is this legal:

    Content-Type: text/html; charset="iso-8859-1"

Specifically, are the double quotes around the charset value legal? I assume the intent is that they are, but I believe the spec as written doesn't allow for them. I know others (like WebDAV) assume you can use double quotes, and I know it's legal in MIME (see below).

In RFC 2616, section 14.17 (Content-Type) refers you to 3.7 on media types. 3.7 defines media-type as:

    media-type = type "/" subtype *( ";" parameter )

and refers you to 3.6 to define parameter. 3.6 says:

    Parameters are in the form of attribute/value pairs.

        parameter = attribute "=" value
        attribute = token
        value     = token | quoted-string

So the values can be a token or a quoted-string. Great, it seems that charset values can be quoted. BUT the last paragraph of 3.7.1 says:

    The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. See section 3.4.1 for compatibility problems.

specifically referring us to section 3.4 for the definition of the charset parameter. 3.4 defines charset as:

    HTTP character sets are identified by case-insensitive tokens. The complete set of tokens is defined by the IANA Character Set registry [19].

        charset = token

And "token" doesn't allow quotes. Shouldn't this be:

    charset = token | quoted-string

or else, doesn't the spec disallow quotes around charset values? Or should section 3.4 not offer a BNF for charset at all, in which case it would be clear that it's just another parameter and therefore the value is token or quoted-string? Or, at least, section 3.4 should say that this BNF is semantic and that quotes around token are used to delimit the parameter (see below).
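To make the conflict concrete, here is a rough Python sketch of the two productions quoted above. The helper names (valid_under_3_4, valid_under_3_6) are made up for illustration, and the quoted-string pattern is a simplified reading of RFC 2616 section 2.2, not a complete implementation of it.

```python
import re

# token (RFC 2616 section 2.2): one or more US-ASCII characters,
# excluding control characters and separators. The class below lists
# the characters that remain once those are removed.
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

# quoted-string, simplified: a double-quoted run of qdtext or
# backslash-escaped ASCII characters (quoted-pair).
QUOTED_RE = re.compile(r'"(?:[^"\\\x00-\x08\x0a-\x1f\x7f]|\\[\x00-\x7f])*"')


def is_token(s):
    return TOKEN_RE.fullmatch(s) is not None


def is_quoted_string(s):
    return QUOTED_RE.fullmatch(s) is not None


def valid_under_3_4(value):
    # Section 3.4 read literally: charset = token
    return is_token(value)


def valid_under_3_6(value):
    # Section 3.6: value = token | quoted-string
    return is_token(value) or is_quoted_string(value)
```

Under this reading, "iso-8859-1" passes both checks, while the quoted form passes only the section 3.6 check, which is exactly the ambiguity in question.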
If you're trying to figure out what the spec says for charset values, and you turn to section 3.4 since it defines charsets, in its current form you get a very different notion of what's allowed than I think is intended.

Howard

MIME's view of things, as best as I can find, is RFC 2045 section 5.1:

    Note that the value of a quoted string parameter does not include the quotes. That is, the quotation marks in a quoted-string are not a part of the value of the parameter, but are merely used to delimit that parameter value. In addition, comments are allowed in accordance with RFC 822 rules for structured header fields. Thus the following two forms

        Content-type: text/plain; charset=us-ascii (Plain text)

        Content-type: text/plain; charset="us-ascii"

    are completely equivalent.
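If HTTP is meant to follow the MIME reading, a recipient would strip the delimiting quotes before comparing charset values. A minimal Python sketch of that interpretation follows; unquote and same_charset are hypothetical helper names, comment handling such as "(Plain text)" is left out, and the case-insensitive comparison follows the wording of RFC 2616 section 3.4.

```python
import re


def unquote(value):
    """Return the parameter value without its delimiting quotes.

    Backslash escapes inside a quoted-string are resolved; a bare
    token is returned unchanged.
    """
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return re.sub(r"\\(.)", r"\1", value[1:-1])
    return value


def same_charset(a, b):
    # Charset names are case-insensitive, so compare the unquoted
    # values after lowercasing them.
    return unquote(a).lower() == unquote(b).lower()
```

With these helpers, same_charset('us-ascii', '"US-ASCII"') is true, which matches the equivalence RFC 2045 describes for the two Content-type forms.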
Received on Wednesday, 7 February 2001 08:52:12 UTC