
can charsets be quoted.

From: Melman, Howard <Howard@silverstream.com>
Date: Wed, 7 Feb 2001 11:46:05 -0500
To: HTTP Working Group <http-wg@cuckoo.hpl.hp.com>
Message-ID: <14977.31693.34000.813690@gargle.gargle.HOWL>

Is this legal:

    Content-Type: text/html; charset="iso-8859-1"

Specifically, are the double quotes around the charset value
legal?  I assume the intent is that they are, but I believe
the spec as written doesn't allow for them.  I know others
(like WebDAV) assume you can use double quotes, and I know
it's legal in MIME (see below).

In RFC 2616, section 14.17 (Content-Type) refers you to
section 3.7 on media types.  3.7 defines media-type as:

       media-type     = type "/" subtype *( ";" parameter )

and refers you to 3.6 to define parameter.  3.6 says:

   Parameters are in  the form of attribute/value pairs.

       parameter               = attribute "=" value
       attribute               = token
       value                   = token | quoted-string

So the value can be a token or a quoted-string.  Great, it
seems that charset values can be quoted.  BUT the last
paragraph of 3.7.1 says:

   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value. See
   section 3.4.1 for compatibility problems.

That paragraph specifically refers us to section 3.4 for the
definition of the charset parameter.  3.4 defines charset as:

   HTTP character sets are identified by case-insensitive tokens. The
   complete set of tokens is defined by the IANA Character Set registry
   [19].

       charset = token

And "token" doesn't allow quotes.  Shouldn't this be:

       charset = token | quoted-string

or else, doesn't the spec disallow quotes around charset
values?  Or should section 3.4 not offer a BNF for charset
at all, in which case it would be clear that it's just
another parameter and therefore the value is token or
quoted-string?  Or, at least, section 3.4 should say that
this BNF is semantic and that quotes around the token are
used only to delimit the parameter value (see below).
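
To make the mismatch concrete, here is a rough Python sketch of
the two rules.  The regular expressions are my own transcription
of "token" and "quoted-string" from RFC 2616 section 2.2
(quoted-string is simplified to ASCII), and the function names
are made up for illustration; none of this comes from the spec
text itself.

```python
import re

# token (RFC 2616 section 2.2): one or more CHARs, excluding CTLs
# and separators.  Transcribed by hand, so treat it as an assumption.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"

# quoted-string, simplified: DQUOTE, then any run of characters that
# are neither DQUOTE nor backslash, or backslash-escaped pairs, then
# DQUOTE.
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'


def matches_charset_rule(value):
    """Section 3.4 as written: charset = token."""
    return re.fullmatch(TOKEN, value) is not None


def matches_parameter_value_rule(value):
    """Section 3.6: value = token | quoted-string."""
    return re.fullmatch(f"(?:{TOKEN}|{QUOTED_STRING})", value) is not None


def strip_delimiters(value):
    """Treat surrounding quotes as delimiters, as MIME does."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        # Drop the quotes and undo any backslash escapes.
        return re.sub(r"\\(.)", r"\1", value[1:-1])
    return value


raw = '"iso-8859-1"'
print(matches_parameter_value_rule(raw))             # section 3.6 view
print(matches_charset_rule(raw))                     # section 3.4 view
print(matches_charset_rule(strip_delimiters(raw)))   # "semantic" reading
```

Under this reading, the quoted form passes the 3.6 rule, fails
the 3.4 rule taken literally, and passes 3.4 again once the
quotes are treated as delimiters rather than as part of the
value.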

If you're trying to figure out what the spec says for
charset values, and you turn to section 3.4 since it defines
charsets, in its current form, you get a very different
notion of what's allowed than I think is intended.

Howard


MIME's view of things, as best as I can find, is RFC 2045 section 5.1:

   Note that the value of a quoted string parameter does not include the
   quotes.  That is, the quotation marks in a quoted-string are not a
   part of the value of the parameter, but are merely used to delimit
   that parameter value.  In addition, comments are allowed in
   accordance with RFC 822 rules for structured header fields.  Thus the
   following two forms

     Content-type: text/plain; charset=us-ascii (Plain text)

     Content-type: text/plain; charset="us-ascii"

   are completely equivalent.
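
As an illustration of the MIME reading (not of any HTTP
implementation), Python's standard "email" package treats the
quotes as delimiters when it parses a Content-Type header.  The
helper name charset_of is made up for this sketch, and the
comment form "(Plain text)" is left out because I have not
checked how this parser handles comments.

```python
from email.message import Message


def charset_of(content_type, unquote=True):
    """Return the charset parameter of a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_param() strips the surrounding quotes unless unquote=False.
    return msg.get_param("charset", unquote=unquote)


bare = charset_of("text/plain; charset=us-ascii")
quoted = charset_of('text/plain; charset="us-ascii"')
print(bare == quoted)

# With unquote=False the delimiters are kept, which is how the raw
# header text looks before the quotes are removed.
print(charset_of('text/plain; charset="us-ascii"', unquote=False))
```

With the default unquote=True, both header forms yield the same
charset value, which matches the equivalence RFC 2045 describes.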
Received on Wednesday, 7 February 2001 16:48:39 EST
