- From: Joris Dobbelsteen <joris.dobbelsteen@mail.com>
- Date: Wed, 7 Feb 2001 21:59:01 +0100
- To: "'Melman, Howard'" <Howard@silverstream.com>
- Cc: "WWW WG (E-mail)" <http-wg@cuckoo.hpl.hp.com>
Interesting problem

>-----Original Message-----
>From: Melman, Howard [mailto:Howard@silverstream.com]
>Sent: Wednesday, 07 February 2001 17:46
>To: HTTP Working Group
>Subject: can charsets be quoted.
>
>
>
>Is this legal:
>
>   Content-Type: text/html; charset="iso-8859-1"
>
>Specifically are the double quotes around the charset value
>legal? I assume the intent is that they are, but I believe
>the spec as written doesn't allow for them. I know others
>(like WebDAV) assume you can use double quotes, and I know
>it's legal in MIME (see below)
>
>In RFC 2616 14.17 Content-Type refers you to 3.7 on media
>types. 3.7 defines media-type as:
>
>   media-type = type "/" subtype *( ";" parameter )
>
>and refers you to 3.6 to define parameter. 3.6 says:
>
>   Parameters are in the form of attribute/value pairs.
>
>   parameter = attribute "=" value
>   attribute = token
>   value     = token | quoted-string
>

Till here it seems to be all right....

>so the values can be a token or a quoted-string, great, it
>seems that charset values can be quoted. BUT the last
>paragraph of 3.7.1 says:
>
>   The "charset" parameter is used with some media types to define the
>   character set (section 3.4) of the data. When no explicit charset
>   parameter is provided by the sender, media subtypes of the "text"
>   type are defined to have a default charset value of "ISO-8859-1" when
>   received via HTTP. Data in character sets other than "ISO-8859-1" or
>   its subsets MUST be labeled with an appropriate charset value. See
>   section 3.4.1 for compatibility problems.
>
>Specifically referring us to section 3.4 for the definition
>of the charset parameter. 3.4 defines charset as:
>
>   HTTP character sets are identified by case-insensitive tokens. The
>   complete set of tokens is defined by the IANA Character Set registry
>   [19].
>
>   charset = token
>
>And "token" doesn't allow quotes.
>Shouldn't this be:
>
>   charset = token | quoted-string

Well, it doesn't point explicitly to the value. If it did, the spec
would contain something like:

   value = charset | token | quoted-string

So I expect that what you are doing is all right.

>
>or else, doesn't the spec disallow quotes around charset
>values? Or should section 3.4 not offer a BNF for charset
>at all in which case it would be clear that it's just
>another parameter and therefore the value is token or
>quoted-string? Or, at least, section 3.4 should say that
>this BNF is semantic and that quotes around token are used
>to delimit the parameter (see below).
>
>If you're trying to figure out what the spec says for
>charset values, and you turn to section 3.4 since it defines
>charsets, in it's current form, you get a very different
>notion of what's allowed then I think is intended.
>
>Howard
>
>
>MIME's view of things, as best as I can find, is RFC 2045 section 5.1:
>
>   Note that the value of a quoted string parameter does not include the
>   quotes. That is, the quotation marks in a quoted-string are not a
>   part of the value of the parameter, but are merely used to delimit
>   that parameter value. In addition, comments are allowed in
>   accordance with RFC 822 rules for structured header fields.
>   Thus the following two forms
>
>     Content-type: text/plain; charset=us-ascii (Plain text)
>
>     Content-type: text/plain; charset="us-ascii"
>
>   are completely equivalent.
>

HTTP takes much of its design from MIME, so you can probably use the
quoted-string and still be compliant with the spec. However, I don't
know whether client implementations support it. I expect they will,
though I'm not sure, and I have no way to test this. That is really
the issue with things like this.

The only server I found:

   HEAD http://www.freebsd.com/ HTTP/1.1

returned the value of the "charset" parameter without quotes.

I would recommend simply not using them, just in case...

- Joris
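
For a receiver that wants to be lenient here, a minimal sketch of a parser that follows the section 3.6 grammar (value = token | quoted-string) and so accepts the charset with or without quotes might look like the following. The names `parse_media_type`, `TOKEN`, `QUOTED` and `PARAM` are made up for illustration, and the regular expressions are one reading of the RFC 2616 token and quoted-string rules, not a complete header parser.

```python
import re

# token (RFC 2616 section 2.2): one or more CHARs, excluding CTLs and separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string: double-quoted text that may contain backslash quoted-pairs.
QUOTED = r'"(?:[^"\\]|\\.)*"'
# One ";" attribute "=" value pair, where value = token | quoted-string.
PARAM = re.compile(r"\s*;\s*(" + TOKEN + r")=(" + TOKEN + "|" + QUOTED + ")")


def parse_media_type(header_value):
    """Split a Content-Type value into (type/subtype, {attribute: value})."""
    m = re.match(r"\s*(" + TOKEN + "/" + TOKEN + ")", header_value)
    if m is None:
        raise ValueError("not a media type: %r" % header_value)
    media_type = m.group(1).lower()
    params = {}
    pos = m.end()
    while pos < len(header_value):
        pm = PARAM.match(header_value, pos)
        if pm is None:
            if header_value[pos:].strip() == "":
                break  # only trailing whitespace is left
            raise ValueError("bad parameter in: %r" % header_value)
        name, value = pm.group(1).lower(), pm.group(2)
        if value.startswith('"'):
            # The quotes only delimit the value; drop them and undo quoted-pairs.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        params[name] = value
        pos = pm.end()
    return media_type, params


quoted = parse_media_type('text/html; charset="iso-8859-1"')
unquoted = parse_media_type("text/html; charset=iso-8859-1")
print(quoted == unquoted)
```

With this reading both forms yield the same charset value, which matches the RFC 2045 view that the quotes are not part of the parameter value. A sender can still stay on the safe side by emitting the unquoted form.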
Received on Wednesday, 7 February 2001 13:10:11 UTC