- From: Melman, Howard <Howard@silverstream.com>
- Date: Wed, 7 Feb 2001 11:46:05 -0500
- To: HTTP Working Group <http-wg@cuckoo.hpl.hp.com>
Is this legal:
Content-Type: text/html; charset="iso-8859-1"
Specifically, are the double quotes around the charset value
legal? I assume the intent is that they are, but I believe
the spec as written doesn't allow for them. I know others
(like WebDAV) assume you can use double quotes, and I know
it's legal in MIME (see below).
In RFC 2616, section 14.17 (Content-Type) refers you to
section 3.7 on media types. Section 3.7 defines media-type as:
media-type = type "/" subtype *( ";" parameter )
and refers you to 3.6 to define parameter. 3.6 says:
Parameters are in the form of attribute/value pairs.
parameter = attribute "=" value
attribute = token
value = token | quoted-string
So a parameter value can be a token or a quoted-string. Great,
it seems that charset values can be quoted. BUT the last
paragraph of 3.7.1 says:
The "charset" parameter is used with some media types to define the
character set (section 3.4) of the data. When no explicit charset
parameter is provided by the sender, media subtypes of the "text"
type are defined to have a default charset value of "ISO-8859-1" when
received via HTTP. Data in character sets other than "ISO-8859-1" or
its subsets MUST be labeled with an appropriate charset value. See
section 3.4.1 for compatibility problems.
This specifically refers us to section 3.4 for the definition
of the charset parameter. Section 3.4 defines charset as:
HTTP character sets are identified by case-insensitive tokens. The
complete set of tokens is defined by the IANA Character Set registry
[19].
charset = token
And "token" doesn't allow quotes. Shouldn't this be:
charset = token | quoted-string
or else, doesn't the spec disallow quotes around charset
values? Or should section 3.4 not offer a BNF for charset
at all in which case it would be clear that it's just
another parameter and therefore the value is token or
quoted-string? Or, at least, section 3.4 should say that
this BNF is semantic and that quotes around token are used
to delimit the parameter (see below).
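
To make the two readings concrete, here is a minimal Python sketch. It assumes the token and quoted-string definitions from RFC 2616 section 2.2; the function names is_token and param_value are hypothetical and come from no spec or library. The strict reading of section 3.4 corresponds to is_token alone, while the section 3.6 reading accepts either form and strips the delimiting quotes.

```python
# Separators per RFC 2616 section 2.2; a token may not contain any of them.
SEPARATORS = set('()<>@,;:\\"/[]?={} \t')


def is_token(s: str) -> bool:
    """True if s is a token: one or more visible ASCII chars, no separators."""
    return bool(s) and all(32 < ord(c) < 127 and c not in SEPARATORS for c in s)


def param_value(raw: str) -> str:
    """Return the semantic value of a parameter given as token | quoted-string."""
    if is_token(raw):
        return raw
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        body = raw[1:-1]
        out = []
        i = 0
        while i < len(body):
            c = body[i]
            if c == '\\':
                # quoted-pair: a backslash escapes the next character
                if i + 1 >= len(body):
                    raise ValueError("dangling backslash in quoted-string")
                out.append(body[i + 1])
                i += 2
            elif c == '"':
                raise ValueError("unescaped quote inside quoted-string")
            else:
                out.append(c)
                i += 1
        return ''.join(out)
    raise ValueError("value is neither a token nor a quoted-string")


# Strict reading of 3.4 (charset = token): the quoted form is rejected.
print(is_token('"iso-8859-1"'))       # False
# Reading of 3.6 (value = token | quoted-string): both forms give one value.
print(param_value('iso-8859-1'))      # iso-8859-1
print(param_value('"iso-8859-1"'))    # iso-8859-1
```

Under this sketch the quotes are pure delimiters, which is the interpretation the last suggestion above would spell out in section 3.4.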
If you're trying to figure out what the spec says for
charset values, and you turn to section 3.4 since it defines
charsets, in its current form, you get a very different
notion of what's allowed than I think is intended.
Howard
MIME's view of things, as best I can find, is in RFC 2045 section 5.1:
Note that the value of a quoted string parameter does not include the
quotes. That is, the quotation marks in a quoted-string are not a
part of the value of the parameter, but are merely used to delimit
that parameter value. In addition, comments are allowed in
accordance with RFC 822 rules for structured header fields. Thus the
following two forms
Content-type: text/plain; charset=us-ascii (Plain text)
Content-type: text/plain; charset="us-ascii"
are completely equivalent.
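
The equivalence claimed in that passage can be checked with a small Python sketch. This is an illustration under stated assumptions, not a full RFC 822 or RFC 2045 parser: strip_comments and charset_of are hypothetical helper names, the input is the field value without the "Content-type:" prefix, and splitting on ";" is a simplification that would misread a quoted value containing a semicolon.

```python
def strip_comments(field: str) -> str:
    """Drop RFC 822 style (possibly nested) comments outside quoted strings."""
    out = []
    depth = 0          # current comment nesting depth
    in_quotes = False  # inside a quoted-string?
    i = 0
    while i < len(field):
        c = field[i]
        if c == '\\' and i + 1 < len(field) and (in_quotes or depth):
            # quoted-pair: keep it inside quoted strings, drop it in comments
            if in_quotes:
                out.append(field[i:i + 2])
            i += 2
            continue
        if in_quotes:
            out.append(c)
            if c == '"':
                in_quotes = False
        elif c == '(':
            depth += 1
        elif c == ')' and depth:
            depth -= 1
        elif depth == 0:
            out.append(c)
            if c == '"':
                in_quotes = True
        i += 1
    return ''.join(out)


def charset_of(content_type: str):
    """Return the lowercased charset parameter value, or None if absent."""
    cleaned = strip_comments(content_type)
    for part in cleaned.split(';')[1:]:
        name, sep, value = part.partition('=')
        if sep and name.strip().lower() == 'charset':
            value = value.strip()
            # The quotes delimit the value; they are not part of it.
            if len(value) >= 2 and value[0] == value[-1] == '"':
                value = value[1:-1]
            return value.lower()
    return None


a = charset_of('text/plain; charset=us-ascii (Plain text)')
b = charset_of('text/plain; charset="us-ascii"')
print(a == b)  # True
```

Both forms reduce to the same value once the comment is removed and the quotes are treated as delimiters.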
Received on Wednesday, 7 February 2001 08:52:12 UTC