RE: can charsets be quoted.

Interesting problem.

>-----Original Message-----
>From: Melman, Howard [mailto:Howard@silverstream.com]
>Sent: Wednesday, 07 February 2001 17:46
>To: HTTP Working Group
>Subject: can charsets be quoted.
>
>
>
>Is this legal:
>
>    Content-Type: text/html; charset="iso-8859-1"
>
>Specifically are the double quotes around the charset value
>legal?  I assume the intent is that they are, but I believe
>the spec as written doesn't allow for them.  I know others
>(like WebDAV) assume you can use double quotes, and I know
>it's legal in MIME (see below)
>
>In RFC 2616 14.17 Content-Type refers you to 3.7 on media
>types.  3.7 defines media-type as:
>
>       media-type     = type "/" subtype *( ";" parameter )
>
>and refers you to 3.6 to define parameter.  3.6 says:
>
>   Parameters are in  the form of attribute/value pairs.
>
>       parameter               = attribute "=" value
>       attribute               = token
>       value                   = token | quoted-string
>

Up to here it all seems fine.


>so the values can be a token or a quoted-string, great, it
>seems that charset values can be quoted.  BUT the last
>paragraph of 3.7.1 says:
>
>   The "charset" parameter is used with some media types to define the
>   character set (section 3.4) of the data. When no explicit charset
>   parameter is provided by the sender, media subtypes of the "text"
>   type are defined to have a default charset value of "ISO-8859-1" when
>   received via HTTP. Data in character sets other than "ISO-8859-1" or
>   its subsets MUST be labeled with an appropriate charset value. See
>   section 3.4.1 for compatibility problems.
>
>Specifically referring us to section 3.4 for the definition
>of the charset parameter.  3.4 defines charset as:
>
>   HTTP character sets are identified by case-insensitive tokens. The
>   complete set of tokens is defined by the IANA Character Set registry
>   [19].
>
>       charset = token
>
>And "token" doesn't allow quotes.  Shouldn't this be:
>
>       charset = token | quoted-string

Well, the parameter grammar never refers explicitly to the charset rule.
If it were meant to, I would expect the spec to contain something like:
  value = charset | token | quoted-string

It does not, so I expect that what you are doing is all right.

>
>or else, doesn't the spec disallow quotes around charset
>values?  Or should section 3.4 not offer a BNF for charset
>at all in which case it would be clear that it's just
>another parameter and therefore the value is token or
>quoted-string?  Or, at least, section 3.4 should say that
>this BNF is semantic and that quotes around token are used
>to delimit the parameter (see below).
>
>If you're trying to figure out what the spec says for
>charset values, and you turn to section 3.4 since it defines
>charsets, in its current form, you get a very different
>notion of what's allowed than I think is intended.
>
>Howard
>
>
>MIME's view of things, as best as I can find, is RFC 2045 section 5.1:
>
>   Note that the value of a quoted string parameter does not include the
>   quotes.  That is, the quotation marks in a quoted-string are not a
>   part of the value of the parameter, but are merely used to delimit
>   that parameter value.  In addition, comments are allowed in
>   accordance with RFC 822 rules for structured header fields.  Thus the
>   following two forms
>
>     Content-type: text/plain; charset=us-ascii (Plain text)
>
>     Content-type: text/plain; charset="us-ascii"
>
>   are completely equivalent.
>

HTTP takes much of its design from MIME, so you can probably use the
quoted-string form and still be compliant with the spec.
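
To illustrate that reading, here is a minimal Python sketch of a receiver
that accepts both forms and treats them as equivalent. The name
parse_params is made up for this example, the token character set is my
transcription of RFC 2616 section 2.2, and it is a sketch under those
assumptions, not a complete header parser.

```python
import re

# token per RFC 2616 section 2.2: any CHAR except CTLs and separators.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string: double quotes around plain text or backslash-escaped chars.
QUOTED = r'"(?:[^"\\]|\\.)*"'
# One ";"-introduced parameter; whitespace after ";" is tolerated.
PARAM = re.compile(r";\s*(%s)=(%s|%s)" % (TOKEN, TOKEN, QUOTED))


def parse_params(header_value):
    """Return {attribute: value} with quoted-string delimiters removed."""
    params = {}
    for name, value in PARAM.findall(header_value):
        if value.startswith('"'):
            # The quotes only delimit the value; drop them and undo escapes.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        # Attribute names are case-insensitive, so normalise them.
        params[name.lower()] = value
    return params


quoted = parse_params('text/html; charset="iso-8859-1"')
bare = parse_params('text/html; charset=iso-8859-1')
print(quoted == bare)
```

With this approach the receiver never needs to know which form the sender
chose.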

However, I don't know whether client implementations support it. I expect
they do, though I'm not sure and have no way to test it. That is the real
issue with things like this.

The only server I checked (HEAD http://www.freebsd.com/ HTTP/1.1)
returned the value of the "charset" parameter without quotes.
I would recommend simply not using them, just in case...
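
On the sending side, that cautious approach could look roughly like the
Python sketch below: emit the bare token whenever the value is a valid
token, and fall back to a quoted-string only when it has to be one. The
helpers format_param and content_type are hypothetical names for this
example, not part of any library.

```python
import re

# token per RFC 2616 section 2.2 (my transcription of the allowed characters).
TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def format_param(name, value):
    """Serialise one parameter, quoting only when the value is not a token."""
    if TOKEN_RE.fullmatch(value):
        return "%s=%s" % (name, value)
    # Escape backslashes and double quotes inside the quoted-string.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '%s="%s"' % (name, escaped)


def content_type(media_type, charset):
    """Build a Content-Type header line with an unquoted charset if possible."""
    return "Content-Type: %s; %s" % (media_type, format_param("charset", charset))


print(content_type("text/html", "iso-8859-1"))
```

Since every registered charset name I am aware of is a plain token, the
quoted branch should never be needed for charset in practice.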



- Joris

Received on Wednesday, 7 February 2001 13:10:11 UTC