Definition of 'charset' in RFC2616 from Martin Duerst on 2005-02-16 (ietf-http-wg@w3.org from January to March 2005)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 16 Feb 2005 14:46:00 +0900
To: ietf-http-wg@w3.org
Message-Id: <6.0.0.20.2.20050216143619.08c060c0@localhost>

Dear HTTP experts,

RFC 2616 currently says, in 3.4, Character Sets:

    HTTP character sets are identified by case-insensitive tokens. The
    complete set of tokens is defined by the IANA Character Set registry
    [19].

        charset = token

    Although HTTP allows an arbitrary token to be used as a charset
    value, any token that has a predefined value within the IANA
    Character Set registry [19] MUST represent the character set defined
    by that registry. Applications SHOULD limit their use of character
    sets to those defined by the IANA registry.

The references then give

   [19] Reynolds, J. and J. Postel, "Assigned Numbers", STD 2, RFC 1700,
         October 1994.

This is a very old snapshot of the IANA charset registry, missing
a few important entries (such as UTF-8).

Based on this, we have seen claims that utf-8 cannot be used in
HTTP. While I would personally consider such claims somewhere
between 'bogus' and 'doubtful', it would be great if the HTTP spec,
if and when it is updated in the future, were changed to point
directly to the IANA registry.
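
As a rough illustration of why such claims do not follow from the
grammar: the "charset = token" rule quoted above is purely syntactic,
so utf-8 is a well-formed charset value whether or not the cited
registry snapshot lists it. The Python sketch below checks a value
against the RFC 2616 token production and compares charset names
case-insensitively. The function names (is_token, same_charset,
charset_of) are hypothetical, and the parameter parsing is deliberately
naive (it does not handle quoted strings that contain semicolons).

```python
import re

# RFC 2616, section 2.2: token = 1*<any CHAR except CTLs or separators>.
# The character class below is US-ASCII 33..126 minus the separators
# ( ) < > @ , ; : \ " / [ ] ? = { } (SP and HT are excluded as well).
_TOKEN_RE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")


def is_token(value: str) -> bool:
    """Return True if value is a syntactically valid RFC 2616 token."""
    return _TOKEN_RE.fullmatch(value) is not None


def same_charset(a: str, b: str) -> bool:
    """Compare two charset names case-insensitively, as section 3.4 requires."""
    return is_token(a) and is_token(b) and a.lower() == b.lower()


def charset_of(content_type: str, default=None):
    """Extract the charset parameter from a Content-Type value.

    Hypothetical helper: splits naively on ';' and returns the value
    lowercased, or `default` when no charset parameter is present.
    """
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower()
    return default
```

For example, charset_of("text/html; charset=UTF-8") yields "utf-8",
and is_token("utf-8") holds, so the value is acceptable as far as the
grammar is concerned; what it means is a matter for the registry.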

Regards,     Martin.

P.S.: As a separate, but related issue, it might also be a good
       idea to remove the default of iso-8859-1, which has never
       actually been effective in practice.

Received on Wednesday, 16 February 2005 05:59:05 UTC