ID: wildcards in Accept-Charset

I believe I once promised to write this draft: it is about one of the issues
on the HTTP/1.1 issues list.  

I just submitted it to the ID editor, but it is short enough to post here
too.

Koen.

---snip---
HTTP Working Group                                     Koen Holtman, TUE
Internet-Draft
Expires: September 18, 1997                               March 18, 1997


                   Wildcards in the Accept-Charset Header

                    draft-holtman-http-wildcards-00.txt


STATUS OF THIS MEMO

        This document is an Internet-Draft. Internet-Drafts are
        working documents of the Internet Engineering Task Force
        (IETF), its areas, and its working groups. Note that other
        groups may also distribute working documents as
        Internet-Drafts.

        Internet-Drafts are draft documents valid for a maximum of
        six months and may be updated, replaced, or obsoleted by
        other documents at any time. It is inappropriate to use
        Internet-Drafts as reference material or to cite them other
        than as "work in progress".

        To learn the current status of any Internet-Draft, please
        check the "1id-abstracts.txt" listing contained in the
        Internet-Drafts Shadow Directories on ftp.is.co.za
        (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific
        Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US
        West Coast).

        Distribution of this document is unlimited.  Please send
        comments to the HTTP working group at
        <http-wg@cuckoo.hpl.hp.com>.  Discussions of the working
        group are archived at
        <URL:http://www.ics.uci.edu/pub/ietf/http/>.  General
        discussions about HTTP and the applications which use HTTP
        should take place on the <www-talk@w3.org> mailing list.

ABSTRACT

   The HTTP/1.1 specification (RFC 2068) defines an Accept-Charset
   header, but fails to define a wildcard "*" which could be used in
   this header to match all character sets.  This proposal corrects
   this omission.


1 Introduction

 The HTTP/1.1 specification (RFC 2068) defines an Accept-Charset
 header, but fails to define a wildcard "*" which could be used in
 this header to match all character sets.  This proposal corrects this
 omission.

 A wildcard in the Accept-Charset header is considered important
 because, when used in combination with q values, it allows a better
 specification of the acceptance of many character sets.  Support for
 many different character sets is one possible route (or transition
 path) for web internationalization.  The existence of this path, and
 the desirability of enabling it, were not properly recognized when
 the HTTP/1.1 specification [1] was written.

 Under HTTP/1.x-based server-driven negotiation [1], a wildcard can
 give only an inaccurate specification of the support levels for many
 character sets, and this inaccuracy may lead to problems.  When used
 in HTTP transparent content negotiation [2], however, the wildcard
 does not cause inaccurate end results, and can in fact be used as a
 bandwidth-saving device (see section 4.2.1 of [3]).


2 Proposed edits

 It is proposed to change the following text in section 14.2 of [1]:

   The ISO-
   8859-1 character set can be assumed to be acceptable to all user
   agents.

          Accept-Charset = "Accept-Charset" ":"
                    1#( charset [ ";" "q" "=" qvalue ] )

   Character set values are described in section 3.4. Each charset may
   be given an associated quality value which represents the user's
   preference for that charset. The default value is q=1. An example is

          Accept-Charset: iso-8859-5, unicode-1-1;q=0.8

   If no Accept-Charset header is present, the default is that any
   character set is acceptable.

 to the text below:

   The ISO-
   8859-1 character set can be assumed to be acceptable to all user
   agents.

          Accept-Charset = "Accept-Charset" ":"
 |                  1#( ( charset | "*" ) [ ";" "q" "=" qvalue ] )

   Character set values are described in section 3.4. Each charset may
   be given an associated quality value which represents the user's
   preference for that charset. The default value is q=1. An example is

          Accept-Charset: iso-8859-5, unicode-1-1;q=0.8

 | The special value "*", if present in the Accept-Charset field,
 | matches every character set (including ISO-8859-1) which is not
 | mentioned elsewhere in the Accept-Charset field.  If no "*" is
 | present in an Accept-Charset field, then all character sets not
 | explicitly mentioned get a quality value of 0, except for
 | ISO-8859-1, which gets a quality value of 1 if not explicitly
 | mentioned. 
   If no Accept-Charset header is present, the default is that any character
   set is acceptable.
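
 The matching rules in the proposed text can be sketched in Python as
 follows.  This is a minimal illustration and not part of the proposed
 specification text: the function names parse_accept_charset and
 charset_quality are hypothetical, the parser handles only the simple
 "charset;q=value" form, and treating an absent header as a quality
 value of 1 for every character set is an assumption (the text only
 says that any character set is acceptable).

```python
def parse_accept_charset(value):
    """Parse an Accept-Charset field value into {charset-or-'*': q}."""
    prefs = {}
    for element in value.split(","):
        element = element.strip()
        if not element:
            continue
        name, _, params = element.partition(";")
        q = 1.0  # the default quality value is q=1
        for param in params.split(";"):
            key, _, val = param.partition("=")
            if key.strip().lower() == "q" and val.strip():
                q = float(val)
        prefs[name.strip().lower()] = q
    return prefs


def charset_quality(charset, header=None):
    """Quality value of `charset` under the proposed wildcard rules."""
    if header is None:
        # Assumption: "any character set is acceptable" is modeled as q=1.
        return 1.0
    prefs = parse_accept_charset(header)
    charset = charset.lower()
    if charset in prefs:
        return prefs[charset]   # explicitly mentioned
    if "*" in prefs:
        return prefs["*"]       # wildcard matches everything else
    # No wildcard: unmentioned charsets get 0, except ISO-8859-1.
    return 1.0 if charset == "iso-8859-1" else 0.0
```

 With the header "iso-8859-5;q=0.8, *;q=0.2", this sketch yields 0.8
 for iso-8859-5 and 0.2 for every other character set, including
 iso-8859-1.  With "iso-8859-5, unicode-1-1;q=0.8" (no wildcard), it
 yields 1 for iso-8859-1 and 0 for any other unmentioned character set.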


3 Compatibility considerations

 The syntax rules in the current version of the HTTP/1.1 specification
 [1] allow a charset value of "*" to be present in the Accept-Charset
 header.  Thus, servers which implement [1] will have no trouble
 parsing a header like

      Accept-Charset: iso-8859-5;q=0.8, *;q=0.2

 According to [1], the "*" value should be interpreted as an unknown
 (unregistered) character set designator.  Thus, servers which
 implement [1] will simply ignore the wildcard if present.
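
 As an illustration of this compatibility argument, the sketch below
 models a server written against [1] that keeps only the character
 sets it knows about.  The function name legacy_preferences and the
 known_charsets parameter are hypothetical; actual servers may handle
 unknown designators differently.

```python
def legacy_preferences(header, known_charsets):
    """Return {charset: q} for the elements a pre-wildcard server knows."""
    prefs = {}
    for element in header.split(","):
        name, _, params = element.strip().partition(";")
        name = name.strip().lower()
        if name not in known_charsets:
            continue  # "*" parses as a token but names no known charset
        q = 1.0  # the default quality value is q=1
        key, _, val = params.partition("=")
        if key.strip().lower() == "q" and val.strip():
            q = float(val)
        prefs[name] = q
    return prefs
```

 For the header shown above and a server that knows iso-8859-1 and
 iso-8859-5, the sketch returns only the iso-8859-5 entry with q=0.8;
 the "*" element is parsed without error and then dropped.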


4 Security considerations

 This proposal adds no new HTTP security considerations.


5 References

   [1] R. Fielding, J. Gettys, J. C. Mogul, H. Frystyk, and
       T. Berners-Lee.  Hypertext Transfer Protocol -- HTTP/1.1.  RFC
       2068, HTTP Working Group, January 1997.

   [2] K. Holtman, A. Mutz.  Transparent Content Negotiation in HTTP.
       Internet-Draft draft-ietf-http-negotiation-01.txt, HTTP Working
       Group.

   [3] K. Holtman, A. Mutz.  HTTP Remote Variant Selection Algorithm
       -- RVSA/1.0.  Internet-Draft draft-ietf-http-rvsa-v10-00.txt,
       HTTP Working Group.


6 Author's address

   Koen Holtman
   Technische Universiteit Eindhoven
   Postbus 513
   Kamer HG 6.57
   5600 MB Eindhoven (The Netherlands)
   Email: koen@win.tue.nl


Received on Wednesday, 19 March 1997 11:32:10 UTC