New content negotiation sections, v4 (long) from Koen Holtman on 1996-02-25 (ietf-http-wg@w3.org from January to March 1996)

From: Koen Holtman <koen@win.tue.nl>
Date: Sun, 25 Feb 1996 23:41:30 +0100 (MET)
To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Cc: Koen Holtman <koen@win.tue.nl>
Message-Id: <199602252241.XAA17702@wsooti04.win.tue.nl>
     New content negotiation sections
     ================================

                                         Koen Holtman, koen@win.tue.nl
                                         version 2, 31 Jan 1996
                                         version 3, 22 Feb 1996
                                         version 4, 25 Feb 1996


0  Purpose of this document

   This document proposes a content negotiation mechanism for HTTP/1.1.
   It contains a number of sections that should be read as definitions
   in the context of the current draft HTTP/1.1 specification [1].  It
   is intended that these sections are merged into a future version of
   the draft HTTP/1.1 specification.

   This document reflects the consensus of the content negotiation
   subgroup, as I perceive it now. (But note that the content
   negotiation subgroup also has consensus on some things not covered in
   this document.)  It also contains some elements the content
   negotiation subgroup has not discussed, or for which only `consensus
   by the absence of replies' was reached.  Issues that still need to be
   resolved are marked as such.

   I am posting this document to the entire workgroup so that we can
   start converging on a version that reflects the consensus of the
   entire workgroup.  Please send comments to the http-wg mailing list.

   Version 3 of this document was submitted as an internet draft with
   the name draft-holtman-http-content-negotiation-00.txt.  Changes with
   respect to version 3 are listed below.  Earlier versions of this
   document can be found in the content negotiation subgroup mail
   archives: <URL:http://www.organic.com/public/conneg/mail/>.


1  Introduction

   Content negotiation, as proposed in this document, is an optional
   feature for the HTTP/1.1 protocol: resources may be negotiable, but
   they need not be.  If a resource is negotiable, this changes the
   semantics of GET and HEAD transactions on the resource.  Other
   transactions are not affected.

   A negotiable resource has a number of alternates bound to it.  The
   proposed content negotiation mechanism allows for automatic selection
   of the preferred alternate bound to a negotiable resource based on
   the properties of the alternates and on the user agent preferences
   for the retrieval action.

   This document builds on the content negotiation descriptions in [1],
   and directly incorporates text from [1] in some places.  A new
   directive, reactive-on-wildcard, is introduced to allow user agents
   to signal the capability of doing content negotiation.  If this
   directive is absent, the proposed definitions produce server
   behavior that yields adequate results for (HTTP/1.0) user agents
   that do not support content negotiation.


2  Terminology and notation

   This document uses the terminology and notational conventions defined
   in [1].  It sometimes refers directly to sections in [1], using the
   notation `Section (1.2[1])'.  If a (sub)section title below is marked
   with (*), it is intended as a replacement for the (sub)section with the
   same title in [1].  All other (sub)sections below, up to Section 7,
   contain new material intended as an addition to [1].

   The text blocks marked with ## signs are comments.  Some of them will
   be removed in later versions of this document, others may be kept
   until the last version, but should be removed when text is taken from
   this document and put into a HTTP 1.1 draft.

   Some of the new response header and field names defined here are very
   long.  I would be happy with alternative names that are shorter, and
   expect that the eventual 1.1 draft will indeed shorten some names
   defined here.


[##
Changes from version 2 to version 3 (internet draft)
- Took out the [## ... ##] comments
- Added internet draft headers
- Re-numbered sections, added introduction
- Strengthened the restrictions that partially prevent spoofing using
  Location headers.
- Added rewrites of the Accept-* header sections based on consensus in
  the content negotiation sub-wg.
- Added q, ql, .. factor computation rules in the preemptive
  negotiation section.
- Changed the word `variant' to `alternate' to eliminate a terminology
  clash with the Vary header.
- Changed the word `virtual' to `derived'. 
- Split the 1.1-00 URI header into a smaller URI header with only
  "mirror" and "name", and a new Alternates header.
- Changed Variant-If-Modified-Since mechanism into the more general
  Rep-Header mechanism, to account for If-validator-valid,
  Variant-Set, Cache-Control, and whatever we come up with next.
- Changed the rules on when to include an entity in a 300 or 406
  response, simplified the rules on when such responses may be
  generated.
- Rewrote the caching section
- Rewrote the security section and added text about privacy issues
- Split the 1.1-00 406 (None acceptable) response code into a new 406
  (None acceptable) response code and a new 408 (Not acceptable)
  response code.
- Several minor edits
##]

[##
Changes from version 3 (internet draft) to version 4:
- Put back most of the comments from version 2
- Took out internet draft headers
- Added more comments
- Added rule that proxies may not negotiate based on Alternates
  headers with attributes they do not understand.
- Several minor edits
##]

[## Question to be resolved: Should rudimentary feature negotiation
facilities that work for 90% of the cases be added as a stopgap?  I
wonder if we won't be doing the web community a disservice if we delay
a 90% solution in order to construct a 99% solution for HTTP 1.2.
After all, most negotiation that happens now is on tables vs. no
tables, not on language or MIME type.
##]

3  Status code definitions

3.1  Redirection 3xx

300 Multiple Choices (*)

   The requested resource is a negotiable resource and the server is
   engaging in reactive content negotiation (Section 5).  The server has
   determined that multiple alternates are acceptable, but is not able
   to determine which alternate is the best alternate.  This response
   may only be generated if specific conditions given in Section 5.2 are
   met.  The response must include an Alternates header describing the
   alternates bound to the resource, allowing a user agent to
   automatically select and retrieve an alternate if appropriate.

   This response is cachable, subject to the restrictions specified in
   the cache-control directive, if present, of the included Alternates
   header.

   If no Accept header in the request contains a reactive-on-wildcard
   directive, and it was not a HEAD request, the response must include
   an entity that gives the user the option to select the most
   appropriate alternate manually.  The suggested entity media type as
   given in the Content-Type response header is "text/html".  If there
   is a reactive-on-wildcard directive, no entity should be included.

[## Note: This `no entity should be included' rule is for saving
bandwidth.  It is expected that clients that add reactive-on-wildcard
directives are always able to give the user the option to select the
most appropriate alternate manually, using only the Alternates
header.##]

   If the service author finds it appropriate for any user agent that
   does not implement an alternate selection algorithm to automatically
   retrieve a certain alternate, then a Location response header giving
   the URI of that alternate may be included in the response.


3.2  Client Error 4xx

406 None Acceptable (*)

   The requested resource is a negotiable resource and the server is
   engaging in reactive content negotiation (Section 5).  Usually, this
   response indicates that the server was not able to positively
   determine that at least one of the available alternates would be
   acceptable.  The response must include an Alternates header
   describing the alternates bound to the resource, allowing a user
   agent to automatically select and retrieve an alternate if
   appropriate.

   This response is cachable, subject to the restrictions specified in
   the cache-control directive, if present, of the included Alternates
   header.

   If no Accept header in the request contains a reactive-on-wildcard
   directive, and it was not a HEAD request, the response must include
   an entity that gives the user the option to select the most
   appropriate alternate manually.  The suggested entity media type as
   given in the Content-Type response header is "text/html".  If there
   is a reactive-on-wildcard directive, no entity should be included.

   If the service author finds it appropriate for any user agent that
   does not implement an alternate selection algorithm to automatically
   retrieve a certain alternate, then a Location response header giving
   the URI of that alternate may be included in the response.
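
   The entity rule shared by the 300 and 406 responses above can be
   illustrated with a minimal Python sketch.  The function names are
   hypothetical, and the sketch assumes the server has the request
   method and the raw Accept field values at hand; it is not part of
   the proposed specification text.

```python
def has_reactive_on_wildcard(accept_values):
    """True if any Accept field value carries the reactive-on-wildcard directive."""
    for value in accept_values:
        for item in value.split(","):
            if item.strip().lower() in ("reactive-on-wildcard", "r-o-w"):
                return True
    return False


def must_include_entity(method, accept_values):
    """Apply the 300/406 rule: an entity is required unless the request
    was a HEAD request or an Accept header contained reactive-on-wildcard."""
    return method != "HEAD" and not has_reactive_on_wildcard(accept_values)
```

   With these definitions, a GET request carrying only
   "Accept: text/html" requires an entity in the 300 or 406 response,
   while the same request with "r-o-w" added does not.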


408 Not Acceptable (*)

   The resource identified by the Request-URI has content
   characteristics that are not acceptable according to the accept
   headers sent in the request.  This response code must only be
   generated by un-negotiable resources.


3  Protocol parameter descriptions

3.1  Language Tags (*)

[##Note: I moved the language tag matching discussion that used to be
in this Section to the Accept-Language section (Section 4.4).  No other
edits were made.##]

   A language tag identifies a natural language spoken, written, or
   otherwise conveyed by human beings for communication of information
   to other human beings. Computer languages are explicitly excluded.
   HTTP uses language tags within the Accept-Language, Content-Language,
   and Alternates fields.

   The syntax and registry of HTTP language tags is the same as that
   defined by RFC 1766 [2]. In summary, a language tag is composed of 1
   or more parts: A primary language tag and a possibly empty series of
   subtags:

        language-tag  = primary-tag *( "-" subtag )

        primary-tag   = 1*8ALPHA
        subtag        = 1*8ALPHA

   Whitespace is not allowed within the tag and all tags are
   case-insensitive. The namespace of language tags is administered by
   the IANA. Example tags include:

       en, en-US, en-cockney, i-cherokee, x-pig-latin

   where any two-letter primary-tag is an ISO 639 language abbreviation
   and any two-letter initial subtag is an ISO 3166 country code.
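
   As an illustration, the language-tag grammar above can be checked
   with a regular expression.  This is a minimal Python sketch; the
   helper names are hypothetical, and it checks syntax only, not
   whether a tag is actually registered.

```python
import re

# language-tag = primary-tag *( "-" subtag ), where both primary-tag and
# subtag are 1*8ALPHA.  No whitespace is allowed inside the tag.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(text):
    """Return True if text matches the language-tag grammar."""
    return LANGUAGE_TAG.fullmatch(text) is not None


def same_language_tag(a, b):
    """Tags are case-insensitive, so compare them in lower case."""
    return a.lower() == b.lower()
```

   All of the example tags listed above pass this check, and "en-US"
   and "EN-us" compare as equal.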


4  Header field definitions

4.1  Accept (*)

[## Note: I did a rewrite of this section, which also involved
deleting some remarks about things that are better said in Section
5.##]

   The Accept request-header field can be used to specify certain media
   types which are acceptable for the response.  Accept headers can be
   used to guide content negotiation (Section 5), and can also be used
   to indicate that the request is specifically limited to a small set
   of desired types, as in the case of a request for an in-line image.
   In general, it is not efficient to send long Accept headers in every
   request.  See Section 5.2 for a discussion of Accept header
   efficiency considerations.

   The field may be folded onto several lines and more than one
   occurrence of the field is allowed, with the semantics being the same
   as if all the entries had been in one field value.

       Accept         = "Accept" ":" #(
                        ( media-range
                          [ ";" "q" "=" qvalue ]
                          [ ";" "mxb" "=" 1*DIGIT ] )
                        | reactive-on-wildcard )

       media-range    = ( "*/*"
                      |   ( type "/" "*" )
                      |   ( type "/" subtype )
                        ) *( ";" parameter )

       reactive-on-wildcard = "reactive-on-wildcard" | "r-o-w"

   The asterisk "*" character is used to group media types into ranges,
   with "*/*" indicating all media types and "type/*" indicating all
   subtypes of that type.

   The parameter q is used to indicate the media type quality factor,
   which represents the user's preference for that range of media
   types. The parameter mxb gives the maximum acceptable size of the
   Entity-Body, in decimal number of octets, for that range of media
   types.  The default values are: q=1 and mxb=undefined (i.e.,
   infinity).  Section 5 describes the content negotiation algorithm
   which makes use of these values.

   The example

       Accept: audio/*; q=0.2, audio/basic

   should be interpreted as "I prefer audio/basic, but send me any audio
   type if it is the best available after an 80% mark-down in quality."

   If no Accept header is present, then it is assumed that the client
   accepts all media types.  If Accept headers are present, and if the
   resource is an un-negotiable resource which cannot generate a
   response which is acceptable according to the Accept headers, then
   the server should generate an error response with the 408 (not
   acceptable) status code.

   A more elaborate example is

       Accept: text/plain; q=0.5, text/html,
               text/x-dvi; q=0.8; mxb=100000, text/x-c

   Verbally, this would be interpreted as "text/html and text/x-c are
   the preferred media types, but if they do not exist, then send the
   text/x-dvi entity if it is less than 100000 bytes, otherwise send the
   text/plain entity."

   Media ranges can be overridden by more specific media ranges or
   specific media types. If more than one media range applies to a given
   type, the most specific reference has precedence. For example,

       Accept: text/*, text/html, text/html;version=2.0, */*

   lists media ranges that have the following precedence:

       1) text/html;version=2.0
       2) text/html
       3) text/*
       4) */*

   The media type quality factor and maximum acceptable size associated
   with a given type are determined by finding the media range with the
   highest precedence which matches that type.

   For example,

       Accept: text/*;q=0.3, text/html;q=0.7, text/html;version=2.0,
               */*;q=0.5

   would cause the following type quality factors to be associated:

       text/html;version=2.0                      = 1
       text/html                                  = 0.7
       text/plain                                 = 0.3
       image/jpeg                                 = 0.5
       text/html;level=3                          = 0.7
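
   The precedence rule can be sketched in Python as follows.  This is
   an illustration under simplifying assumptions, not a definitive
   implementation: the function names are hypothetical, quoted
   parameter values are not handled, and a media type matched by no
   range is given a quality factor of 0, which is a choice made for
   this sketch only.

```python
def parse_accept(value):
    """Split an Accept field value into media ranges.

    Returns (ranges, reactive) where each range is a tuple
    (type, subtype, params, q, mxb) and reactive tells whether a
    reactive-on-wildcard directive was present.
    """
    ranges, reactive = [], False
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        if item.lower() in ("reactive-on-wildcard", "r-o-w"):
            reactive = True
            continue
        parts = [p.strip() for p in item.split(";")]
        mtype, _, subtype = parts[0].lower().partition("/")
        params, q, mxb = {}, 1.0, None      # defaults: q=1, mxb undefined
        for part in parts[1:]:
            name, _, val = part.partition("=")
            name, val = name.strip().lower(), val.strip()
            if name == "q":
                q = float(val)
            elif name == "mxb":
                mxb = int(val)
            else:
                params[name] = val          # media type parameter
        ranges.append((mtype, subtype, params, q, mxb))
    return ranges, reactive


def quality_for(media_type, type_params, ranges):
    """Return (q, mxb) of the most specific media range matching the type."""
    mtype, _, subtype = media_type.lower().partition("/")
    best, best_rank = (0.0, None), -1
    for rtype, rsub, rparams, q, mxb in ranges:
        if rtype == "*" and rsub == "*":
            rank = 0                        # */*
        elif rtype == mtype and rsub == "*":
            rank = 1                        # type/*
        elif rtype == mtype and rsub == subtype:
            if any(type_params.get(k) != v for k, v in rparams.items()):
                continue                    # a range parameter does not match
            rank = 2 + len(rparams)         # more parameters: more specific
        else:
            continue
        if rank > best_rank:
            best, best_rank = (q, mxb), rank
    return best
```

   Applied to the Accept header of the example above, quality_for
   yields the five type quality factors listed there.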

   The inclusion of a reactive-on-wildcard directive in an Accept header
   will change the rules for the sending of reactive negotiation
   responses (Section 5). The example

       Accept: text/html, */*;q=0.95, r-o-w

   should be interpreted as "text/html is my preferred media type, and I
   assign media type quality factors in the range 0 - 0.95 to all other
   media types.  Send me a reactive negotiation response, so that I can
   pick the best alternate myself, if you have any non-text/html
   alternate which might give me a higher overall quality than any
   text/html alternate."

       Note: A user agent may be provided with a default set of 
       quality values for certain media ranges. However, unless the 
       user agent is a closed system which cannot interact with 
       other rendering agents, this default set should be 
       configurable by the user.


4.2  Accept-Charset (*)

   The Accept-Charset request-header field can be used to indicate what
   character sets are acceptable for the response. This field allows
   clients capable of understanding more comprehensive or
   special-purpose character sets to signal that capability to a server
   which is capable of representing documents in those character
   sets. The US-ASCII character set can be assumed to be acceptable to
   all user agents.

       Accept-Charset = "Accept-Charset" ":" 1#charset

   Character set values are described in Section (3.4[1]). An example is

       Accept-Charset: iso-8859-1, unicode-1-1

   If no Accept-Charset header is present, the default is that any
   character set is acceptable.  If an Accept-Charset header is present,
   and if the resource is an un-negotiable resource which cannot
   generate a response which is acceptable according to the
   Accept-Charset header, then the server should generate an error
   response with the 408 (not acceptable) status code.


4.3  Accept-Encoding (*)

   The Accept-Encoding request-header field is similar to Accept, but
   restricts the content-coding values (Section (3.5[1])) which are
   acceptable in the response.

       Accept-Encoding         = "Accept-Encoding" ":" 
                                 #( content-coding )

   An example of its use is

       Accept-Encoding: compress, gzip

   If no Accept-Encoding header is present in a request, the server may
   assume that the client will accept any content coding.  If an
   Accept-Encoding header is present, and if the resource is an
   un-negotiable resource which cannot generate a response which is
   acceptable according to the Accept-Encoding header, then the server
   should generate an error response with the 408 (not acceptable)
   status code.


4.4  Accept-Language (*)

   The Accept-Language request-header field is similar to Accept, but
   restricts the set of natural languages that are preferred as a
   response to the request.

       Accept-Language = "Accept-Language" ":"
                         1#( language-range [ ";" "q" "=" qvalue ] )

       language-range = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) )
                        | "*" )

   Each language-range may be given an associated quality value which
   represents an estimate of the user's comprehension of the languages
   specified by that range.  The quality value defaults to "q=1" (100%
   comprehension).  This value may be used in the server's content
   negotiation algorithm (Section 5). For example,

       Accept-Language: da, en-gb;q=0.8, en;q=0.7

   would mean: "I prefer Danish, but will accept British English (with
   80% comprehension) and other types of English (with 70%
   comprehension)."

   A language-range matches a language-tag if it exactly equals the tag,
   or if it is a prefix of the tag such that the first tag character
   following the prefix is "-".  The special range "*", if present in
   the Accept-Language field, matches every tag not matched by any other
   ranges present in the Accept-Language field.

       Note: This use of a prefix matching rule does not imply that
       language tags are assigned to languages in such a way that it is
       always true that if a user understands a language with a certain
       tag, then this user will also understand all languages with tags
       for which this tag is a prefix.  The prefix rule simply allows
       the use of prefix tags if this is the case.

   The language quality factor assigned to a language-tag by the
   Accept-Language field is the quality value of the longest
   language-range in the field that matches the language-tag.  If no
   language-range in the field matches the tag, the language quality
   factor assigned is 0.
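
   The matching and longest-range rules above can be sketched in
   Python as follows.  The function names are hypothetical, and the
   sketch assumes a syntactically valid Accept-Language field value.

```python
def parse_accept_language(value):
    """Return a list of (language-range, q) pairs; q defaults to 1.0."""
    ranges = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for part in parts[1:]:
            name, _, val = part.partition("=")
            if name.strip().lower() == "q":
                q = float(val)
        ranges.append((parts[0].lower(), q))
    return ranges


def range_matches(language_range, tag):
    """Exact match, or a prefix match where the next tag character is "-"."""
    return tag == language_range or tag.startswith(language_range + "-")


def language_quality(tag, ranges):
    """Quality factor assigned to a language-tag by the parsed field."""
    tag = tag.lower()
    matches = [(len(r), q) for r, q in ranges
               if r != "*" and range_matches(r, tag)]
    if matches:
        return max(matches)[1]      # the longest matching range wins
    for r, q in ranges:
        if r == "*":                # "*" covers tags no other range matched
            return q
    return 0.0                      # no range matches the tag
```

   For the field value "da, en-gb;q=0.8, en;q=0.7" this assigns 0.8
   to the tag en-gb, 0.7 to en-US, and 0 to fr.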

   If no Accept-Language header is present in a request, the server
   should assume that all languages are equally acceptable.  If an
   Accept-Language header is present, then all languages which are
   assigned a quality factor greater than 0 are acceptable.  If the
   resource is an un-negotiable resource which cannot generate a
   response for an audience capable of understanding at least one
   acceptable language, it is acceptable to serve a response that uses
   other languages.

   It may be contrary to the privacy expectations of the user to send an
   Accept-Language header with the complete linguistic preferences of
   the user in every request.  For a complete discussion of this issue,
   see Section 6.3.  If a reactive-on-wildcard directive is present in
   an Accept header, the user agent can safely omit certain languages
   intelligible to the user from the Accept-Language header, without
   affecting the quality of the negotiation process in requests on
   negotiated resources, if the language-range "*" is included with an
   appropriate language quality factor.

       Note: As intelligibility is highly dependent on the 
       individual user, it is recommended that client applications 
       make the choice of linguistic preference available to the 
       user. If the choice is not made available, then the 
       Accept-Language header field must not be given in the 
       request.  

[#### Issue to be resolved: the 1.1-00 spec has a sentence in this
section that says:

   "If the server cannot fulfill the request with one or more of the 
   languages given, or if the languages only represent a subset of a 
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   multi-linguistic Entity-Body, [....]"
   ^^^^^^^^^^^^^^^^

According to this sentence, an entity body can use multiple languages,
all of which need to be understood by the sender of the
Accept-Language header, so the document would in fact be for a
multi-linguistic audience.  But in Section 10.11 (Content-Language)
the 1.1-00 spec states:

   Multiple languages may be listed for content that is intended for 
   multiple audiences. For example, a rendition of the "Treaty of
   ^^^^^^^^^^^^^^^^^^ 
   Waitangi," presented simultaneously in the original Maori and 
   English versions, would call for

       Content-Language: mi, en

   However, just because multiple languages are present within an 
   entity does not mean that it is intended for multiple linguistic 
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   audiences. An example would be a beginner's language primer, such 
   ^^^^^^^^^
   as "A First Lesson in Latin," which is clearly intended to be used 
   by an English-literate audience. In this case, the Content-Language 
   should only include "en".

There seems to be an internal contradiction here: the text above
states that content can never be designated as being for a
multi-linguistic audience, it can only be designated as being for
multiple linguistic-audiences.  

So should HTTP use "multi-linguistic audiences" or "multiple
linguistic-audiences"?  In this Accept-Language section, I use
"multiple linguistic-audiences".
####]


4.5  URI (*)

   The URI entity-header field is used to inform the recipient of other
   Uniform Resource Identifiers (Section (3.2[1])) by which the resource
   can be identified.

       URI-header  = "URI" ":" 1#( uri-mirror | uri-name )

       uri-mirror  = "{" "mirror" <"> URI <"> "}"
       uri-name    = "{" "name" <"> URI <"> "}"

   Any URI specified in this field can be absolute or relative to the
   Request-URI. The "mirror" form of URI refers to a location which is a
   mirror copy of the Request-URI. The "name" form refers to a
   location-independent name corresponding to the Request-URI.


[## Side issue: I find that the "mirror" and "name" descriptions above
do not give enough information to let me know what they are supposed
to mean.  I understand that the semantics come from current practice
in the CERN server.  Anyone care to expand these descriptions?##]


4.6  Alternates

   The Alternates entity-header field is used to describe the alternate
   resources bound to a negotiable resource.

       Alternates = "Alternates" ":" 1#( alternate-descr  
                                       | caching-directive )

       alternate-descr = 
               "{" <"> URI <"> source-quality
                   [ "{" "type" <"> media-type <"> "}" ]
                   [ "{" "language" <"> 1#language-tag <"> "}" ]
                   [ "{" "encoding" <"> 1#content-coding <"> "}" ]
                   [ "{" "length" 1*DIGIT "}" ]
                   [ "{" "description" quoted-string "}" ]
                   [ extension-attribute ]
               "}"

       source-quality = qvalue

       extension-attribute = "{" extension-name extension-value "}"
       extension-name      = token
       extension-value     = #( token | quoted-string 
                              | <any element of tspecials except "}"> )

          Note: the extension-attribute is included because it is
          expected that HTTP/1.2 will define new attributes for use in
          the Alternates header.  Also, this attribute eases content
          negotiation experiments under HTTP/1.1.

       caching-directive = "{" "cache-control" 1#cache-directive "}"

   Cache-directives are defined in Section (10.8[1]).

[##Issue to be resolved: Would just having the max-age cache-directive
here be sufficient?##] 

[##Note: If Age: goes into HTTP/1.1 for caching of normal responses,
we need to add an optional age field to the URI header##]

   Any URI specified in this field can be absolute or relative to the
   Request-URI.  For each of the alternates bound to the negotiable
   resource, the Alternates header must include an alternate-descr
   form describing that alternate.

[##Note: If the resource author cannot or does not want to list all the
alternates, Vary header based negotiation can be used##]

[## Question to be resolved: should text below up to the example be
moved to Section (3.9[1]) (Quality Values)?##]

   The source-quality attribute given in an alternate description is
   measured by the content provider as representing the amount of
   degradation from the original source.  For example, a picture
   originally in JPEG form would have a lower source quality when
   translated to the XBM format, and much lower source quality when
   translated to an ASCII-art alternate.  Note, however, that this is a
   function of the source -- an original piece of ASCII-art may degrade
   in quality if it is captured in JPEG form.

   Content providers should use the following table as a guide when
   assigning source quality values:

       1.000       no degradation 
       0.999-0.900 no noticeable degradation
       0.899-0.700 noticeable, but acceptable degradation
       0.699-0.500 barely acceptable degradation
       0.499-0.000 unacceptable degradation

[##Question to be resolved: can we come up with a word other than
`degradation' that also covers the case of alternates not converted
from one source?##]

   It is important that content providers do not assign very low source
   quality values without good reason, as this will limit the ability of
   users to influence the negotiation process with their own preference
   settings.

   If alternates are not converted from one source, but constructed
   separately to represent the same abstract information in different
   ways, then the source quality attributes can be used to express
   differences in quality between the alternates.

   An example Alternates header for a negotiable resource with the URI
   http://www.w3.org/pub/WWW/TheProject is:

       Alternates: {"TheProject.fr.html" 1.0
                          {type "text/html"} {language "fr"}},
                   {"TheProject.en.html" 1.0
                          {type "text/html"} {language "en"}},
                   {"TheProject.fr.txt" 0.7
                          {type "text/plain"} {language "fr"}},
                   {"TheProject.en.txt" 0.8
                          {type "text/plain"} {language "en"}}

   which indicates that the negotiable resource binds to four alternate
   resources that differ in media type and natural language.
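
   To illustrate the syntax, an Alternates field value such as the
   one above can be parsed with a small tokenizer.  This is a minimal
   Python sketch with hypothetical function names.  It assumes that
   quoted strings contain no escaped quotes and that attribute values
   are flat, and it does not validate attribute names.

```python
import re

# One token: a brace or comma, a quoted string, or a bare token.
TOKEN = re.compile(r'\s*(?:([{},])|"([^"]*)"|([^\s{},"]+))')


def tokenize(value):
    tokens, pos = [], 0
    value = value.rstrip()
    while pos < len(value):
        m = TOKEN.match(value, pos)
        if m is None:
            raise ValueError("cannot tokenize at offset %d" % pos)
        if m.group(1) is not None:
            tokens.append(("punct", m.group(1)))
        elif m.group(2) is not None:
            tokens.append(("string", m.group(2)))
        else:
            tokens.append(("token", m.group(3)))
        pos = m.end()
    return tokens


def parse_group(tokens, i):
    """Parse one "{ ... }" group at tokens[i]; nested groups become lists."""
    if i >= len(tokens) or tokens[i] != ("punct", "{"):
        raise ValueError("expected '{'")
    i += 1
    items = []
    while i < len(tokens) and tokens[i] != ("punct", "}"):
        if tokens[i] == ("punct", "{"):
            sub, i = parse_group(tokens, i)
            items.append(sub)
        else:
            items.append(tokens[i])
            i += 1
    if i >= len(tokens):
        raise ValueError("missing '}'")
    return items, i + 1


def parse_alternates(value):
    """Return (alternates, cache_directives) for an Alternates field value."""
    tokens = tokenize(value)
    alternates, cache_directives = [], []
    i = 0
    while i < len(tokens):
        if tokens[i] == ("punct", ","):
            i += 1
            continue
        items, i = parse_group(tokens, i)
        if items and items[0] == ("token", "cache-control"):
            cache_directives += [t[1] for t in items[1:]
                                 if isinstance(t, tuple) and t[0] == "token"]
            continue
        if (len(items) < 2 or not isinstance(items[0], tuple)
                or items[0][0] != "string" or not isinstance(items[1], tuple)):
            raise ValueError("expected a quoted URI and a source-quality")
        alternate = {"uri": items[0][1], "quality": float(items[1][1]),
                     "attributes": {}}
        for attr in items[2:]:
            if isinstance(attr, list) and attr and isinstance(attr[0], tuple):
                values = [t[1] for t in attr[1:]
                          if isinstance(t, tuple) and t[0] != "punct"]
                alternate["attributes"][attr[0][1]] = " ".join(values)
        alternates.append(alternate)
    return alternates, cache_directives
```

   For the example header above, parse_alternates returns four
   alternate descriptions and an empty list of cache directives.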

   The type, language, encoding, and length attributes of an alternate
   description refer to their Content-* header counterparts.  Though all
   attributes are optional, it is often desirable to include as many
   attributes as possible as this will increase the quality of the
   negotiation process.  Servers must only generate extension-attributes
   whose names start with "x-".  Clients should ignore all extension
   attributes they do not recognize.  Proxies should not engage in
   alternate selection calculations on behalf of the origin server if an
   unrecognized attribute is present in the Alternates header.

   The description attribute is meant to provide a textual description
   of some properties of the alternate, to be displayed by a user agent
   when showing the list of all alternates bound to a negotiable
   resource (see Section 5).  This attribute can be included if the URI
   and normal attributes of an alternate are considered too opaque to
   allow interpretation by the user.

   The cache-control directive of the Alternates header field can be
   used to restrict the cachability of the Alternates header, and, for
   300 (multiple choices) and 406 (none acceptable) responses, the other
   parts of the response.  This directive duplicates the control
   functionality offered for un-negotiated resources by the
   Cache-Control header.

[## Issue to be resolved: Should there be a

    {"user-agent-prefix" quoted-string}

attribute which could be used for user agent negotiation?  The
matching rule could amount to: if you match a user-agent-prefix in an
alternate, exclude all other alternates with user-agent prefix
attributes that provide no, or shorter, matches from consideration.
Example:

    Alternates: {"plan.html" 0.9
                   {type "text/html"} {user-agent-prefix ""}},
                {"plan.wuxta.html" 0.6 {type "text/html"}
                   {user-agent-prefix "WuxtaWeb1."}
                   {description "Does not trigger bug in WuxtaWeb 1.x"}},
                {"plan.dvi" 1.0 {type "text/x-dvi"}},
                {cache-control max-age=1209600}
##]

[## Note: adding feature negotiation would add a "feature" attribute
in the alternate-descr syntax, and a corresponding Accept-feature 
request header.  The attribute would contain feature identifiers, which 
are short codes for things like `user agent supports HTML 3.0 tables', 
`user agent supports java', and maybe the negations of feature 
identifiers. ##]


4.7  Alt-Header

   The Alt-Header request-header can be used in requests to negotiable
   resources to introduce new request headers in any derived requests on
   alternate resources (see section 5.2).

     Alt-Header =  "Alt-Header" ":" <"> URI <"> Request-Header

   The URI specified in this field can be absolute or relative to the
   Request-URI.  A typical example is

     Alt-Header: "TheProject.en.html" If-Validator-Valid: 6a7bf

   If it already has a copy of the "TheProject.en.html" alternate in
   cache, a caching client can include this header in requests to allow
   the server to shorten a 200 (OK) preemptive negotiation response to a
   304 (not modified) response in case that preemptive negotiation
   yields "TheProject.en.html" as the best alternate.

   Servers are always allowed to ignore Alt-Header request headers.
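
   As an illustration of the Alt-Header syntax, a server that chooses
   to honor these headers could extract the request headers that apply
   to the selected alternate as sketched below.  The function names
   are hypothetical, and comparing URIs as plain strings is an
   assumption of this sketch; a real server would first resolve
   relative URIs against the Request-URI.

```python
import re

# Alt-Header = "Alt-Header" ":" <"> URI <"> Request-Header
ALT_HEADER_VALUE = re.compile(r'\s*"([^"]*)"\s*([^\s:]+)\s*:\s*(.*)$')


def parse_alt_header(value):
    """Split an Alt-Header field value into (uri, header-name, header-value)."""
    m = ALT_HEADER_VALUE.match(value)
    if m is None:
        raise ValueError("not a valid Alt-Header value: %r" % value)
    uri, name, field_value = m.groups()
    return uri, name, field_value.strip()


def derived_request_headers(selected_uri, alt_header_values):
    """Collect the request headers introduced for the selected alternate."""
    headers = []
    for value in alt_header_values:
        uri, name, field_value = parse_alt_header(value)
        if uri == selected_uri:     # assumption: plain string comparison
            headers.append((name, field_value))
    return headers
```

   For the example above, selecting "TheProject.en.html" yields the
   single derived header If-Validator-Valid with the value 6a7bf.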

[##Note: Roy Fielding has proposed a Content-ID response header which
would carry validators guaranteed to be 
  1) different for different resources and 
  2) different for different resource versions.  
If we have such a header, and it is generally used, then we can simplify
Rep-Header to

 Unless-ID = "Unless-ID" ":" 1#cid   

with the meaning: send me a normal response unless the Content-ID would
be one of the listed Content-IDs.  If the Content-ID is one of the
listed ones, return a 4xx (Unless true) response instead.  The same
Unless-ID would also serve as a simplification of the
"If-Validator-Valid" and "Variant-Set" proposed in the caching subgroup.
##]


5  Content negotiation (*)

   Content negotiation is an optional feature of the HTTP/1.1 protocol:
   resources may be negotiable, but they need not be.  If a resource is
   negotiable, this changes the semantics of GET and HEAD transactions
   on the resource.  Other transactions are not affected.

   A negotiable resource has a number of alternates bound to it.  The
   HTTP content negotiation mechanism allows for automatic selection of
   the preferred alternate bound to a negotiable resource based on the
   properties of the alternates and on the user agent preferences for
   the retrieval action on the negotiated resource.

[## Note: `retrieval action' is a new term I had to introduce because
`request' is not entirely accurate here: with reactive negotiation,
one retrieval action causes two requests. ##]

   An alternate is a resource, identified by an alternate URI, that
   provides one possible representation of the `contents' of the
   negotiable resource.  An alternate resource must never be a
   negotiable resource itself.  It is the responsibility of the author
   of the negotiable resource, not the author of the alternate, to
   ensure that this restriction is not violated.

   The negotiability of a resource is expressed by the Alternates
   response header.  If a 2xx or 3xx class response does not include an
   Alternates response header, then the resource is un-negotiable.  If
   any response does include an Alternates response header, then the
   resource is negotiable.

   When displaying an alternate as the end result of a retrieval action
   on a negotiable resource, a user agent should allow the user to
   review a list of all alternates bound to the negotiable resource, and
   to initiate retrieval of another alternate if desired.  The list can
   be annotated with some or all of the properties of the alternates, as
   given by the Alternates header in the negotiable resource response.

   When displaying an alternate as the end result of a retrieval action
   on a negotiable resource, a user agent should show the negotiable
   resource URI, not the alternate resource URI, as being the URI the
   contents of which were retrieved.  If the user agent stores a
   reference to the content displayed for future use, it is the
   negotiable resource URI, not the alternate resource URI, which should
   be stored.

   HTTP/1.1 provides for two types of content negotiation: preemptive
   and reactive.  Preemptive negotiation is generally faster than
   reactive negotiation, but it can only be used if sufficient
   information about user agent capabilities and user preferences is
   present in the request on the negotiable resource.  Reactive
   negotiation can always be used.  Therefore, preemptive negotiation is
   best seen as a mechanism that can sometimes optimize reactive
   negotiation transactions.


5.1  Reactive negotiation

   In reactive negotiation, the selection and retrieval of an alternate
   bound to the negotiable resource spans two transactions.  In the
   first transaction, the client transmits a request on the negotiable
   resource URI, and the server responds with a 300 (Multiple Choices)
   or 406 (None Acceptable) response, which includes an Alternates
   header describing the alternates bound to the negotiable resource.  A
   406 response may always be generated; a 300 response may only be
   generated if specific conditions given in Section 5.2 are met.  The
   client can use the Alternates header in the 300 or 406 response to
   select the alternate that matches best to the preferences for the
   retrieval action.

   In the second transaction, the user agent transmits a request on the
   URI of the selected alternate resource, and the server will typically
   respond with a 200 (OK) response, though other response codes like
   302 (Moved Temporarily) are also possible.  Only the user agent needs
   to know that the second request is part of a reactive negotiation
   process; all other parties can treat it as a normal request on an
   un-negotiated resource.

   User agents should use the reactive alternate selection algorithm
   below when automatically selecting the best alternate listed in an
   Alternates header.  User agents are allowed to use other selection
   algorithms, but this is not recommended, as preemptive negotiation is
   defined to optimize the case in which the reactive alternate
   selection algorithm below is used.

   User agents that do not wish to implement an alternate selection
   algorithm can, by only using Accept request headers of a certain
   form, force servers to always include an entity when a reactive
   negotiation response is sent.  They can then use this entity to allow
   the user to select an alternate manually, or use the reactive
   response Location header, if present, to automatically fetch the
   alternate recommended by the server.

[##Note: the possibility of doing the above is also important for
proxies that want to mediate between a 1.0 client and a 1.1 server.
1.0 clients will always use Accept headers of the certain form that
triggers a response suitable for a client which does not implement
negotiation.##]

   In the first step of the reactive alternate selection algorithm, the
   overall quality for every alternate listed in the Alternates header
   of the negotiable resource is computed.  The overall quality of an
   alternate is a real number Q in the range 0 through 1, where 0 is the
   minimum and 1 the maximum value, defined as

      Q = qs * qe * qc * ql * q * qml

   The values qs,qe,qc,ql,q,qml for a particular alternate are all
   determined using the part of the received Alternates header
   describing that alternate, called the alternate description below.

      qs   The source quality factor for the alternate is given by the
           source-quality attribute in the alternate description.

      qe   The encoding quality factor is 1 if there is no encoding
           attribute in the alternate description.  If there is an
           encoding attribute in the alternate description, the encoding
           quality factor is 1 if the user agent can decode the given
           content encoding, 0 otherwise.

[##Question to be resolved: do we really want to distinguish between
alternates that have an encoding and alternates that do not?  This could
block a smooth transition to a scheme in which servers apply compression
on the fly if the client indicates it can handle decompression.  Maybe
negotiation about en/decoding capabilities should be kept separate from
the main content negotiation mechanism.  On the other hand, the
Transfer-Encoding header already seems to allow for a future
introduction of on the fly compression##]

      qc   The charset quality factor is 1 if there is no type attribute
           in the alternate description, or if the media type given in
           the type attribute of the alternate description does not have
           a charset parameter.  If there is a charset parameter, then
           the charset quality factor is 1 if the user agent can process
           a message with the given character set, 0 otherwise.  User
           agents must always be able to process a message with the
           US-ASCII charset.

[## Question to be resolved: do recent discussions on the http-wg list
indicate that the US-ASCII above should be changed into ISO-8859-1?  Or
should the text above be changed to say `US-ASCII or ISO-8859-1'?  I
believe the consensus was 'no'.##]

      ql   The language quality factor is 1 if there is no language
           attribute in the alternate description.  If there is a
           language attribute, then the language quality factor is the
           highest quality factor assigned to any one of the listed
           languages according to the user agent language preferences
           for the retrieval action.

[## Note: the 1.1-01 draft says: `If at least one alternate has an
assigned content language, but the one currently under consideration
does not, then it should be assigned the value "ql=0.5".'  I deleted
this requirement; service authors can more accurately use the qs
attribute to adjust things in situations where only some of the
alternates have languages.##]

      q    The media type quality factor is 1 if there is no type          
           attribute in the alternate description.  If there is a type
           attribute, then the media type quality factor is the quality
           factor assigned to the given media type in the user agent
           media type preferences for the retrieval action.

      qml  The maximum length quality factor is 1 if there is no length
           attribute in the alternate description.  If there is a length
           attribute in the alternate description, then the maximum
           length quality factor is 1 if the length given is less than
           or equal to the maximum acceptable length according to the
           user agent maximum length preferences for the retrieval
           action, 0 otherwise.  Preferred maximum lengths are often
           equal to `infinity'.

   In the second step of the reactive alternate selection algorithm,
   the overall qualities of all alternates are compared to select the
   best alternate.  If there is one alternate with the highest overall
   quality value, then that alternate is the best alternate.  If there
   are multiple alternates that share the highest overall quality value,
   then the alternate that is listed first in the received Alternates
   header is the best alternate.

   If all alternates have an overall quality value of zero, a user agent
   should not automatically retrieve the first alternate, but stop the
   reactive negotiation process, allowing the user to decide on the next
   action.
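
   The two steps of the reactive algorithm can be sketched in Python
   as follows.  This is a minimal illustration, not a reference
   implementation: the names Factors and select_best are made up for
   the sketch, and the six factors are assumed to have been derived
   already from the alternate descriptions and the user agent
   preferences as described above.

```python
from dataclasses import dataclass


@dataclass
class Factors:
    """The six quality factors of one alternate, each in the range 0..1."""
    qs: float          # source quality, from the alternate description
    qe: float = 1.0    # encoding
    qc: float = 1.0    # charset
    ql: float = 1.0    # language
    q: float = 1.0     # media type
    qml: float = 1.0   # maximum length

    def overall(self):
        # Step 1: Q = qs * qe * qc * ql * q * qml
        return self.qs * self.qe * self.qc * self.ql * self.q * self.qml


def select_best(alternates):
    """Step 2: pick the best alternate.

    alternates is a list of (uri, Factors) pairs in the order in which
    they appear in the Alternates header.  Returns the URI of the best
    alternate, or None when every overall quality is zero, in which
    case the user should decide on the next action.
    """
    best_uri, best_quality = None, 0.0
    for uri, factors in alternates:
        quality = factors.overall()
        # Strict comparison: on a tie the alternate listed first wins.
        if quality > best_quality:
            best_uri, best_quality = uri, quality
    return best_uri
```

   Returning None rather than the first alternate mirrors the rule
   that a user agent should not automatically retrieve an alternate
   when all overall quality values are zero.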


5.2  Preemptive negotiation (*)

   In preemptive negotiation, the selection and retrieval of an
   alternate bound to the negotiable resource is done in a single
   transaction, saving one round trip time over reactive negotiation.  A
   preemptive negotiation response must only be generated by a server if
   the request on the negotiable resource contains enough information
   about user agent capabilities and user preferences to allow the
   server to determine which alternate would be chosen if the reactive
   alternate selection algorithm outlined above were used by the user
   agent in reactive negotiation.

   When engaging in preemptive negotiation, the server must use the
   following algorithm, or any other algorithm that produces the same
   result, to construct the preemptive response message.

     1. Construct a request message on the best alternate resource by
        modifying the received request message on the negotiable
        resource in the following way.  First, the Request-URI and the
        Host request header must be rewritten to point to the best
        alternate resource.  Then, if there are any Alt-Header request
        headers that match the best alternate resource URI, the headers
        given in these matching Alt-Header request headers may be added
        to the headers in the request message.  Finally, the Alt-Header
        request headers in the request message may be removed.

     2. Generate a valid HTTP response message for the request message
        constructed in step 1.  If the server is a proxy, this may
        involve sending the constructed request to the origin server.

     3. Add two headers to the HTTP response message generated in step
        2.  These are an Alternates header describing the alternates
        bound to the negotiable resource, and a Location header that
        gives the URI of the best alternate resource.
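
   Steps 1 and 3 above can be sketched in Python as follows.  The
   sketch rests on several assumptions that are not part of the
   proposal: headers are held as a list of (name, value) pairs, both
   URIs are absolute, and the function names derive_request and
   finish_response are invented for the example.  Step 2, generating
   the response for the derived request, is left to the server.

```python
from urllib.parse import urljoin, urlsplit


def derive_request(request_uri, headers, best_uri):
    """Step 1: build the request on the best alternate resource.

    Returns the new Request-URI and the rewritten header list.
    """
    derived = []
    for name, value in headers:
        lowered = name.lower()
        if lowered == "host":
            continue  # replaced below by the host of the best alternate
        if lowered == "alt-header":
            # Value form: "<uri>" Header-Name: header-value
            quoted, _, rest = value.strip().partition('" ')
            alt_uri = urljoin(request_uri, quoted.lstrip('"'))
            if alt_uri == best_uri and ":" in rest:
                h_name, _, h_value = rest.partition(":")
                derived.append((h_name.strip(), h_value.strip()))
            continue  # Alt-Header itself is removed from the request
        derived.append((name, value))
    derived.insert(0, ("Host", urlsplit(best_uri).netloc))
    return best_uri, derived


def finish_response(response_headers, alternates_value, best_uri):
    """Step 3: add the Alternates and Location headers."""
    return response_headers + [
        ("Alternates", alternates_value),
        ("Location", best_uri),
    ]
```

   The sketch always adds matching headers and always removes
   Alt-Header, although the text only says that a server may do so.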

   A preemptive response message satisfies the origin server restriction
   if and only if the full URI of the best alternate resource can be
   obtained by adding a sequence of characters excluding "/" to the end
   of the full URI of the negotiable resource, where the first character
   added may not be a US-ASCII uppercase or lowercase letter.
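
   A check for the restriction as worded above could look like the
   following Python sketch.  The function name is hypothetical, and
   the sketch assumes that an empty suffix (identical URIs) does not
   satisfy the restriction, since an alternate must never be the
   negotiable resource itself; the text does not settle that case
   explicitly.

```python
import string


def satisfies_origin_server_restriction(negotiable_uri, alternate_uri):
    """Return True if alternate_uri extends negotiable_uri as required."""
    if not alternate_uri.startswith(negotiable_uri):
        return False
    suffix = alternate_uri[len(negotiable_uri):]
    if not suffix:
        return False  # assumption: identical URIs are rejected
    if "/" in suffix:
        return False  # the added characters must not include "/"
    # The first added character must not be a US-ASCII letter.
    return suffix[0] not in string.ascii_letters
```

   For example, "paper.en.html" is an acceptable alternate for
   "paper", while "papers.html" and "paper.d/en.html" are not.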

[##Note: In version 2 of this text, the origin server restriction was
much weaker: it only said that the two URIs must be located on the same
server.  I have changed this because a stronger restriction will make
the implementation and maintenance of origin servers simpler, while not
making life much more difficult for the authors of negotiable
resources.##]

[##Question to be resolved: should the origin server restriction be
weakened? Daniel DuBois proposes "The URLs must match up to the last
slash in the negotiable resource".##]

   Origin servers should not generate a preemptive response message that
   violates the origin server restriction.  If a client receives a
   preemptive response message that violates the origin server
   restriction directly from an origin server, then that client must
   reject the message as a probable spoofing attempt.  If the client is
   a proxy, it must not pass on the response; it can pass on a 502 (Bad
   Gateway) response instead.  Servers acting as proxies may generate
   preemptive responses that do violate the origin server restriction,
   and clients should not reject these responses.

[##Note: the origin server restriction does not imply that you can't
have alternates on other servers.  You can: you just have to generate
reactive negotiation responses for those variants.##]

   Clients, including caching proxies, may treat the HTTP response that
   can be derived from a reactive negotiation response by deleting the
   Alternates and Location headers as being controlled by the author of
   the best alternate resource, not the author of the negotiable
   resource on which the actual request was made.  It is the
   responsibility of the server to ensure that the best alternate
   resource author indeed has this control.  Section 6.1 discusses
   the implications of this rule on server design and administration.

   User agents can transmit information about their capabilities and
   preferences for a retrieval action using the various accept request
   headers.  If the accept headers present in a request on a negotiable
   resource contain enough information, a server may be able to generate
   a preemptive negotiation response.  As most resources will be
   un-negotiable, user agents are encouraged to send empty or small
   accept headers, or even omit some accept headers entirely, by
   default.  If a user agent knows or discovers that an origin server
   provides negotiated resources, it is encouraged to use data from the
   negotiated responses received so far to dynamically add or extend
   accept headers sent in future requests on resources provided by that
   origin server, in order to increase the probability that preemptive
   negotiation can be used instead of the slower reactive negotiation.

   Servers that want to support preemptive negotiation must use the
   preemptive alternate selection algorithm below.  This algorithm can
   be applied to determine

    o whether a preemptive negotiation response may be sent, and if so,
      which alternate is the best alternate

    o the appropriate response code, either 300 (Multiple Choices) or
      406 (None Acceptable), when a reactive response is sent.

   The algorithm uses the alternate descriptions for each of the
   available alternates, as will be included in the Alternates header of
   the response, and the Accept headers of the request on the negotiable
   resource as input.

   In the first step of the preemptive alternate selection algorithm,
   the overall quality for every alternate bound to the negotiable
   resource is computed.  The overall quality is a real number Q in the
   range 0 through 1, where 0 is the minimum and 1 the maximum value,
   defined as

      Q = qs * qe * qc * ql * q * qml

   The overall quality values computed in the preemptive algorithm are
   not necessarily equal to the overall quality values computed
   in the reactive algorithm of Section 5.1.

   The values qs,qe,qc,ql,q,qml for a particular alternate are all
   determined using the alternate description of the particular
   alternate and the Accept headers of the request.

      qs   The source quality factor for the alternate is given by the
           source-quality attribute in the alternate description.

      qe   The encoding quality factor is 1 if there is no encoding
           attribute in the alternate description.  If there is an
           encoding attribute in the alternate description, the encoding
           quality factor is 1 if no Accept-Encoding header is present
           in the request, 1 if an Accept-Encoding header present
           indicates the ability to decode the given content encoding,
           and 0 otherwise.

      qc   The charset quality factor is 1 if there is no type attribute
           in the alternate description, or if the media type given in
           the type attribute of the alternate description does not have
           a charset parameter.  If there is a charset parameter, then
           the charset quality factor is 1 if the charset is US-ASCII, 1
           if no Accept-Charset header is present in the request, 1 if
           an Accept-Charset header present indicates the ability to
           handle the given character set, and 0 otherwise.

      ql   The language quality factor is 1 if there is no language
           attribute in the alternate description.  If there is a
           language attribute, then the language quality factor is the
           highest quality factor assigned by the Accept-Language header
           in the request to any one of the languages listed in the
           attribute, 0 if none of the listed languages are assigned a
           quality factor by the Accept-Language header in the request,
           and 1 if there is no Accept-Language header in the request.

      q    The media type quality factor is 1 if there is no type           
           attribute in the alternate description.  If there is a type
           attribute, then the media type quality factor is the quality
           factor assigned to the given media type by the Accept headers
           in the request, 0 if the Accept headers do not assign a
           quality factor to the media type, and 1 if there are no
           Accept headers in the request.

      qml  The maximum length quality factor is 1 if there is no length
           attribute or no type attribute in the alternate description.
           If there is a length and a type attribute in the alternate
           description, then the maximum length quality factor is 0 if
           the "mxb" value assigned to the given media type by the
           Accept headers in the request is less than the value given in
           the length attribute, 1 if the "mxb" value is greater or
           equal, 1 if the Accept headers do not assign an "mxb" value
           to the media type, and 1 if there are no Accept headers in
           the request.

   In the second step of the algorithm, the overall qualities of all
   alternates are compared to select the best one.  If there is one
   alternate with the highest overall quality value, then this is the
   best alternate.  If there are multiple alternates that share the
   highest overall quality value, then the alternate that is listed
   first in the Alternates header is the best alternate.

   If all alternates have an overall quality value of zero, then any
   reactive negotiation response sent must use the 406 (None Acceptable)
   response code.  Else, any reactive negotiation response sent should
   use the 300 (Multiple Choices) response code.

   In the third step of the preemptive negotiation alternate selection
   algorithm, it is determined whether a preemptive negotiation response
   may be sent to return the best alternate found.

   If the best alternate has an overall quality value of zero, then the
   server must not generate a preemptive response; it should generate a
   reactive response with the 406 (None Acceptable) response code.

   If the best alternate has an overall quality factor greater than
   zero, and no Accept header in the request contains a
   reactive-on-wildcard directive, then the server may generate a
   preemptive response, provided that the origin server restriction, if
   applicable, is met.

   If the best alternate has an overall quality factor greater than
   zero, and an Accept header in the request contains a
   reactive-on-wildcard directive, then the server may generate a
   preemptive response, provided that the origin server restriction, if
   applicable, is met, if

     o the type quality factor (q) of the best alternate was not derived
       from a match to a media range containing an asterisk "*" wildcard
       character in an Accept header, and

     o the language quality factor (ql) of the best alternate was not
       derived from a match to a "*" language-range in the
       Accept-Language header.

   In all other cases, the server must generate a reactive response.
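
   The three steps of the preemptive algorithm can be sketched in
   Python as follows.  The sketch is deliberately simplified and rests
   on assumptions that are not part of the proposal: the Accept and
   Accept-Language headers are assumed to be parsed already into
   dictionaries (None meaning the header is absent), only the qs, q
   and ql factors are modelled (qe, qc and qml are treated as 1), an
   exact media type is assumed to take precedence over a wildcard
   range, language tags are matched exactly or by "*" only, and the
   origin server restriction is not checked.  All function names are
   invented for the example.

```python
def media_q(media_type, accept):
    """Return (q, matched_by_wildcard) for one media type."""
    if accept is None:
        return 1.0, False  # no Accept headers in the request
    if media_type in accept:
        return accept[media_type], False
    major = media_type.split("/")[0]
    for media_range in (major + "/*", "*/*"):
        if media_range in accept:
            return accept[media_range], True
    return 0.0, False  # no quality factor assigned


def language_q(languages, accept_language):
    """Return (ql, matched_by_wildcard) for the listed languages."""
    if not languages or accept_language is None:
        return 1.0, False
    best, via_wildcard = 0.0, False
    for tag in languages:
        if tag in accept_language:
            candidate, wild = accept_language[tag], False
        elif "*" in accept_language:
            candidate, wild = accept_language["*"], True
        else:
            continue
        if candidate > best:
            best, via_wildcard = candidate, wild
    return best, via_wildcard


def decide(alternates, accept, accept_language, reactive_on_wildcard=False):
    """Return ("preemptive", uri), ("reactive", 300) or ("reactive", 406).

    alternates is a list of dicts in Alternates header order, each with
    the keys "uri" and "qs" and optionally "type" and "languages".
    """
    best = None  # (overall quality, uri, wildcard used)
    for alt in alternates:
        q, q_wild = media_q(alt["type"], accept) if "type" in alt else (1.0, False)
        ql, ql_wild = language_q(alt.get("languages", []), accept_language)
        overall = alt["qs"] * q * ql
        # Strict comparison: on a tie the alternate listed first wins.
        if best is None or overall > best[0]:
            best = (overall, alt["uri"], q_wild or ql_wild)
    if best is None or best[0] == 0:
        return ("reactive", 406)
    if reactive_on_wildcard and best[2]:
        return ("reactive", 300)
    return ("preemptive", best[1])
```

   A result of ("preemptive", uri) only means that a preemptive
   response is permitted; a server remains free to answer reactively
   with a 300 response instead.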

5.3  Caching issues

   HTTP/1.1 does not provide a mechanism for conditional GET requests on
   negotiable resources, but does provide a mechanism, the Alt-Header
   request header, for conditional GET requests on alternate resources.

[## Question to be resolved: _should_ there be a special rule for
conditional GETS on negotiable resources?  Some people have said that
they worry about superfluous transmission of long Alternates headers. A
conditional GET could presumably save retransmission of a large
Alternates header.  We could define that preemptive and reactive
negotiation responses may omit the Alternates response header if it was
`not modified since'.##]

   When generating a 300 (Multiple Choices) response, a 406 (None
   Acceptable) response, or the Alternates headers for a preemptive
   response, a cache may re-use an Alternates header received earlier
   from the negotiable resource, as long as the restrictions expressed
   by any cache-control directive in the Alternates header are met.  If
   the presence of an entity is required in a 300 or 406 response,
   caches may generate that entity on behalf of the origin server.

   When relaying a preemptive response, a cache may infer the request
   and response messages of the HTTP transaction on the best alternate
   resource performed by the server that generated the preemptive
   response, and may update its internal data structures to reflect the
   occurrence of this HTTP transaction.

   Caches are encouraged to perform such updates because they increase
   efficiency and prevent strange (but otherwise allowed) effects if the
   contents of an alternate resource are changed at the origin server
   while there is still a non-expired version of these contents in
   cache.


[##Note: earlier versions of the Alternates header had, besides the
{cache-control ...} directive, a {vary ...} directive.  My idea was that
{vary user-agent} in the Alternates header would indicate that the
source quality values in the Alternates header would vary on the
User-Agent field, thus allowing service authors to mix content
negotiation with user agent negotiation.  Varying the Alternates header
proved too controversial, so I threw the {vary ...} directive out.  This
means (as far as I can see) that _efficient_ negotiation on tables
vs. no tables, which also gives the user the option to select another
alternate as in normal content negotiation, will only be possible after
we introduce feature negotiation.

The most efficient thing that works in one round trip for the normal
case and that still gives the user the option to select another
alternate is using

    Alternates: {"plan.auto.html" 0.9 {type "text/html"}
                   {description "Automatic tables/no tables selection"}},
                {"plan.tables.html" 0.8 {type "text/html"}},
                {"plan.notables.html" 0.7 {type "text/html"}},
                {"plan.dvi" 1.0 {type "text/x-dvi"}}

and making "plan.auto.html" an alternate resource that varies on user
agent.  A typical preemptive response would look like

  HTTP/1.1 200 OK
  Alternates: {"plan.auto.html" 0.9 {type "text/html"}
                  {description "Automatic tables/no tables selection"}},
              {"plan.tables.html" 0.8 {type "text/html"}},
              {"plan.notables.html" 0.7 {type "text/html"}},
  Location: plan.auto.html
  Vary: user-agent
  Content-length: ....
  ....

  [contents of the plan.tables.html file on the server as the entity body]

The problem with this is that it leads to the storage of _four_ entity
bodies (instead of two) in a (full) cache:
 1) the variant entity with the tables produced by plan.auto.html,
 2) the variant entity without the tables produced by plan.auto.html,
 3) the one entity bound to plan.tables.html,
 4) the one entity bound to plan.notables.html.
So this doubles the traffic between the proxy and the origin server.

Note that this solution presupposes that the proxy cache can cache
varying resources efficiently, i.e. that we have a Variant-Set like
mechanism for preventing the unnecessary sending of variants already in
cache if a request from a previously unknown user agent is relayed.
Without that, even more traffic between the proxy and the origin server
is needed.
##]


6  Security and Privacy considerations

[##Note: This section could use some editing when it goes into the 1.1
draft.  To provide some motivation of changes to the current 1.1 draft,
I am including more text than would be required in an RFC.  Also, I have
not had the time to optimize readability of this section.##]

6.1  Spoofing using Location headers

   Clients, including caching proxies, may treat the HTTP response that
   can be derived from a reactive negotiation response by deleting the
   Alternates and Location headers as being controlled by the author of
   the best alternate resource, not the author of the negotiable
   resource on which the actual request was made.  It is the
   responsibility of the server to ensure that the best alternate
   resource author indeed has this control, because if this control is
   lost, control over the responses generated by direct requests on the
   best alternate resource is also lost.  Origin servers are helped
   in carrying this responsibility by the rule that clients must reject
   preemptive responses that do not satisfy the origin server
   restrictions.

   This paragraph discusses the implications of the above on server
   design and administration.  First, it is intended that any negotiable
   resource authoring mechanism built into the server, and accessible to
   authors of static content and CGI scripts, generates preemptive
   responses by internally doing a request on the best variant resource,
   and adding the required Alternates and Location headers to the
   generated response.  Second, it is intended that, if the CGI
   interface has a feature that allows script authors to generate a
   preemptive response directly, then a) two distrusting parties will
   never be able to author CGI scripts in a shared directory, or b) use
   of this feature is only enabled for a CGI script if the script author
   is trusted by all other authors that use the same directory, or c)
   the server filters the Location headers generated by the CGI script
   to prevent spoofing that is not prevented by clients applying the
   origin server restriction.


6.2  User tracking based on accept headers

   If users fine-tune quality factors put into the default user agent
   accept headers to the third decimal, these accept headers can be used
   as relatively long-lived user identifiers, enabling content providers
   (even if they do not provide negotiable resources) to tell apart
   different users behind a proxy.  This identification allows content
   providers to do click-trail tracking, and allows collaborating
   content providers to match cross-server click-trails or form
   submissions of individual users.  Thus, privacy reasons demand that
   user agents be conservative both in the amount of quality factor
   fine-tuning they allow users without giving a privacy warning, and
   in sending long accept headers by default in a request.  (See also
   the remarks on sending short accept headers for performance reasons
   in Section 5.2.)


6.3  Accept headers revealing information of private nature
     without real need.

[##Note: Brian Behlendorf has commented that the discussion in the two
paragraphs below is way too long for the draft 1.1 standard.  I agree; I
made it this long to justify my new Accept-Language: "*" feature.##]

   Preferences sent in accept headers, in particular language quality
   factors sent in Accept-Language headers, may reveal information that
   the user would rather keep private unless it will directly improve the
   quality of the service.  The content negotiation mechanism allows
   users to leave some languages (e.g. languages the knowledge of which
   strongly correlates with membership of a particular ethnic group) out
   of the Accept-Language header without decreasing the quality of the
   negotiation process if the request happens to be on a negotiable
   resource.  Note however that the speed of the negotiation process may
   be affected.

   No matter how much information is left out of the Accept headers,
   automatic reactive negotiation by a user agent on a negotiable
   resource will inevitably reveal some of the user preferences by the
   generation of a request on the best alternate resource as partly
   determined by the user preferences. Malicious service authors could
   provide `fake' negotiable resources, which do not even bind to alternate
   resources that are in fact different, whose only purpose is to get
   information about (ethnicity correlated) languages understood by the
   visiting users.  Such plots would however be visible to alert
   victims, as user agents will allow the user to review a list of all
   alternates bound to the negotiable resource.

   Maintainers of firewall proxies may want to process outgoing accept
   headers to enhance privacy beyond the level provided by the user
   agents behind the firewall.


7  Acknowledgments

   This document builds on the content negotiation descriptions in [1],
   and directly incorporates text from [1] in some places.  Many members
   of the HTTP working group have contributed to discussions that are
   reflected in this document.


8  References

   [1]  Roy T. Fielding, Henrik Frystyk Nielsen, and Tim Berners-Lee.
        Hypertext Transfer Protocol -- HTTP/1.1.  Internet-Draft
        draft-ietf-http-v11-spec-01.txt, HTTP Working Group, January,
        1996.

   [2]  H. Alvestrand. "Tags for the identification of languages." RFC
        1766, UNINETT, March 1995.
Received on Sunday, 25 February 1996 14:46:39 UTC