
Re: can charsets be quoted.

From: Melman, Howard <Howard@silverstream.com>
Date: Fri, 9 Feb 2001 15:41:51 -0500
To: "Roy T. Fielding" <fielding@ebuilt.com>
Cc: Larry Masinter <LMM@acm.org>, "Melman, Howard" <Howard@silverstream.com>, HTTP WG <http-wg@cuckoo.hpl.hp.com>, Joris Dobbelsteen <joris.dobbelsteen@mail.com>, Paul Leach <paulle@Exchange.Microsoft.com>
Message-ID: <14980.22031.111000.310785@gargle.gargle.HOWL>

On Friday Feb 9, 2001, Roy T. Fielding wrote:

> > Hopefully this will get on the "errata" page...
> Why?  The spec is correct.  It takes a great deal of imagination
> to believe that the use of the word token in the text should somehow imply
> that the HTTP syntax excludes a quoted-string.  Any token can appear inside
> a quoted string.

Perhaps, but it's not just the word "token" in the text.  There
is an ABNF rule in the section which, as near as I can tell,
adds no value to the description and does add confusion.
Below is the text of section 3.4:


3.4 Character Sets

   HTTP uses the same definition of the term "character set" as that
   described for MIME:

   The term "character set" is used in this document to refer to a
   method used with one or more tables to convert a sequence of octets
   into a sequence of characters. Note that unconditional conversion in
   the other direction is not required, in that not all characters may
   be available in a given character set and a character set may provide
   more than one sequence of octets to represent a particular character.
   This definition is intended to allow various kinds of character
   encoding, from simple single-table mappings such as US-ASCII to
   complex table switching methods such as those that use ISO-2022's
   techniques. However, the definition associated with a MIME character
   set name MUST fully specify the mapping to be performed from octets
   to characters. In particular, use of external profiling information
   to determine the exact mapping is not permitted.

      Note: This use of the term "character set" is more commonly
      referred to as a "character encoding." However, since HTTP and
      MIME share the same registry, it is important that the terminology
      also be shared.

   HTTP character sets are identified by case-insensitive tokens. The
   complete set of tokens is defined by the IANA Character Set registry [19].

       charset = token

   Although HTTP allows an arbitrary token to be used as a charset
   value, any token that has a predefined value within the IANA
   Character Set registry [19] MUST represent the character set defined
   by that registry. Applications SHOULD limit their use of character
   sets to those defined by the IANA registry.

   Implementors should be aware of IETF character set requirements [38]
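
To make the disagreement concrete, here is a minimal sketch of a
recipient that takes the lenient reading, i.e. that the generic media
type parameter grammar (value = token | quoted-string) governs the
charset parameter, and that the value is compared case-insensitively.
The function name charset_from_content_type and the regular
expressions are illustrative assumptions, not anything taken from the
spec or from an existing implementation.

```python
import re

# token: one or more CHARs excluding CTLs and separators (RFC 2616, 2.2).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string: double-quoted text in which "\" escapes the next character.
QUOTED = r'"(?:[^"\\]|\\.)*"'

# Attribute names are case-insensitive; no whitespace is allowed
# between the attribute and its value.
CHARSET_PARAM = re.compile(
    r";\s*charset=(" + TOKEN + "|" + QUOTED + ")",
    re.IGNORECASE,
)


def charset_from_content_type(header_value):
    """Return the charset name in lower case, or None if absent.

    Hypothetical helper: accepts both charset=ISO-8859-1 and
    charset="ISO-8859-1". It does not guard against the text
    ';charset=' appearing inside another parameter's quoted value.
    """
    match = CHARSET_PARAM.search(header_value)
    if match is None:
        return None
    value = match.group(1)
    if value.startswith('"'):
        # Drop the surrounding quotes and undo any quoted-pair escapes.
        value = re.sub(r"\\(.)", r"\1", value[1:-1])
    # Charset names are case-insensitive, so normalise before comparing.
    return value.lower()
```

Under this reading both spellings name the same character set. A
strict reading of "charset = token" would instead reject the quoted
form, which is the interoperability risk at issue in this thread.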
Received on Friday, 9 February 2001 20:43:13 UTC
