- From: Koen Holtman <koen@win.tue.nl>
- Date: Sun, 25 Feb 1996 23:41:30 +0100 (MET)
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
- Cc: Koen Holtman <koen@win.tue.nl>
New content negotiation sections
================================
Koen Holtman, koen@win.tue.nl
version 2, 31 Jan 1996
version 3, 22 Feb 1996
version 4, 25 Feb 1996
0 Purpose of this document
This document proposes a content negotiation mechanism for HTTP/1.1.
It contains a number of sections that should be read as definitions
in the context of the current draft HTTP/1.1 specification [1]. It
is intended that these sections are merged into a future version of
the draft HTTP/1.1 specification.
This document reflects the consensus of the content negotiation
subgroup, as I perceive it now. (But note that the content
negotiation subgroup also has consensus on some things not covered in
this document.) It also contains some elements the content
negotiation subgroup has not discussed, or for which only `consensus
by the absence of replies' was reached. Issues that still need to be
resolved are marked as such.
I am posting this document to the entire workgroup so that we can
start converging on a version that reflects the consensus of the
entire workgroup. Please send comments to the http-wg mailing list.
Version 3 of this document was submitted as an internet draft with
the name draft-holtman-http-content-negotiation-00.txt. Changes with
respect to version 3 are listed below. Earlier versions of this
document can be found in the content negotiation subgroup mail
archives: <URL:http://www.organic.com/public/conneg/mail/>.
1 Introduction
Content negotiation, as proposed in this document, is an optional
feature for the HTTP/1.1 protocol: resources may be negotiable, but
they need not be. If a resource is negotiable, this changes the
semantics of GET and HEAD transactions on the resource. Other
transactions are not affected.
A negotiable resource has a number of alternates bound to it. The
proposed content negotiation mechanism allows for automatic selection
of the preferred alternate bound to a negotiable resource based on
the properties of the alternates and on the user agent preferences
for the retrieval action.
This document builds on the content negotiation descriptions in [1],
and directly incorporates text from [1] in some places. A new
directive, reactive-on-wildcard, is introduced to allow user agents
to signal the capability of doing content negotiation. If this
directive is absent, the proposed definitions produce server
behavior that yields adequate results for (HTTP/1.0) user agents
that do not support content negotiation.
2 Terminology and notation
This document uses the terminology and notational conventions defined
in [1]. It sometimes refers directly to sections in [1], using the
notation `Section (1.2[1])'. If a (sub)section title below is marked
with (*), it is intended as a replacement for the (sub)section with the
same title in [1]. All other (sub)sections below, up to Section 7,
contain new material intended as an addition to [1].
The text blocks marked with ## signs are comments. Some of them will
be removed in later versions of this document, others may be kept
until the last version, but should be removed when text is taken from
this document and put into an HTTP/1.1 draft.
Some of the new response header and field names defined here are very
long. I would be happy with shorter alternative names, and expect
that the eventual 1.1 draft will indeed shorten some of the names
defined here.
[##
Changes from version 2 to version 3 (internet draft)
- Took out the [## ... ##] comments
- Added internet draft headers
- Re-numbered sections, added introduction
- Strengthened the restrictions that partially prevent spoofing using
Location headers.
- Added rewrites of the Accept-* header sections based on consensus in
the content negotiation sub-wg.
- Added q, ql, .. factor computations rules in the preemptive
negotiation section.
- Changed the word `variant' to `alternate' to eliminate a terminology
clash with the Vary header.
- Changed the word `virtual' to `derived'.
- Split the 1.1-00 URI header into a smaller URI header with only
"mirror" and "name", and a new Alternates header.
- Changed Variant-If-Modified-Since mechanism into the more general
Rep-Header mechanism, to account for If-validator-valid,
Variant-Set, Cache-Control, and whatever we come up with next.
- Changed the rules on when to include an entity in a 300 or 406
response, simplified the rules on when such responses may be
generated.
- Rewrote the caching section
- Rewrote the security section and added text about privacy issues
- Split the 1.1-00 406 (None acceptable) response code into a new 406
(None acceptable) response code and a new 408 (Not acceptable)
response code.
- Several minor edits
##]
[##
Changes from version 3 (internet draft) to version 4:
- Put back most of the comments from version 2
- Took out internet draft headers
- Added more comments
- Added rule that proxies may not negotiate based on Alternates
headers with attributes they do not understand.
- Several minor edits
##]
[## Question to be resolved: Should rudimentary feature negotiation
facilities that work for 90% of the cases be added as a stopgap? I
wonder if we won't be doing the web community a disservice if we delay
a 90% solution in order to construct a 99% solution for HTTP 1.2.
After all, most negotiation that happens now is on tables vs. no
tables, not on language or MIME type.
##]
3 Status code definitions
3.1 Redirection 3xx
300 Multiple Choices (*)
The requested resource is a negotiable resource and the server is
engaging in reactive content negotiation (Section 5). The server has
determined that multiple alternates are acceptable, but is not able
to determine which alternate is the best alternate. This response
may only be generated if specific conditions given in Section 5.2 are
met. The response must include an Alternates header describing the
alternates bound to the resource, allowing a user agent to
automatically select and retrieve an alternate if appropriate.
This response is cachable, subject to the restrictions specified in
the cache-control directive, if present, of the included Alternates
header.
If no Accept header in the request contains a reactive-on-wildcard
directive, and it was not a HEAD request, the response must include
an entity that gives the user the option to select the most
appropriate alternate manually. The suggested entity media type as
given in the Content-Type response header is "text/html". If there
is a reactive-on-wildcard directive, no entity should be included.
[## Note: This `no entity should be included' rule is for saving
bandwidth. It is expected that clients that add reactive-on-wildcard
directives are always able to give the user the option to select the
most appropriate alternate manually, using only the Alternates
header.##]
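The entity rule above can be sketched in Python as follows. This is
only an illustration, not part of the proposal: the function names are
hypothetical, and the Accept field values are assumed to be passed in
as a list of strings, one per Accept header in the request.

```python
def has_reactive_on_wildcard(accept_values):
    """Return True if any Accept field value carries the directive."""
    for value in accept_values:
        for item in value.split(","):
            if item.strip().lower() in ("reactive-on-wildcard", "r-o-w"):
                return True
    return False


def must_include_entity(method, accept_values):
    """Decide whether a 300 response has to carry a selection entity.

    An entity is required unless the request was a HEAD request or
    some Accept header contained a reactive-on-wildcard directive.
    """
    return method != "HEAD" and not has_reactive_on_wildcard(accept_values)
```

For instance, a GET request with "Accept: text/html, */*;q=0.95, r-o-w"
would get a response without an entity, while the same request without
the r-o-w directive would get one.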
If the service author finds it appropriate for any user agent that
does not implement an alternate selection algorithm to automatically
retrieve a certain alternate, then a Location response header giving
the URI of that alternate may be included in the response.
3.2 Client Error 4xx
406 None Acceptable (*)
The requested resource is a negotiable resource and the server is
engaging in reactive content negotiation (Section 5). Usually, this
response indicates that the server was not able to positively
determine that at least one of the available alternates would be
acceptable. The response must include an Alternates header
describing the alternates bound to the resource, allowing a user
agent to automatically select and retrieve an alternate if
appropriate.
This response is cachable, subject to the restrictions specified in
the cache-control directive, if present, of the included Alternates
header.
If no Accept header in the request contains a reactive-on-wildcard
directive, and it was not a HEAD request, the response must include
an entity that gives the user the option to select the most
appropriate alternate manually. The suggested entity media type as
given in the Content-Type response header is "text/html". If there
is a reactive-on-wildcard directive, no entity should be included.
If the service author finds it appropriate for any user agent that
does not implement an alternate selection algorithm to automatically
retrieve a certain alternate, then a Location response header giving
the URI of that alternate may be included in the response.
408 Not Acceptable (*)
The resource identified by the Request-URI has content
characteristics that are not acceptable according to the accept
headers sent in the request. This response code must only be
generated by un-negotiable resources.
3 Protocol parameter descriptions
3.1 Language Tags (*)
[##Note: I moved the language tag matching discussion that used to be
in this Section to Section 10.4 (Accept-Language). No other edits were
made.##]
A language tag identifies a natural language spoken, written, or
otherwise conveyed by human beings for communication of information
to other human beings. Computer languages are explicitly excluded.
HTTP uses language tags within the Accept-Language, Content-Language,
and Alternates fields.
The syntax and registry of HTTP language tags is the same as that
defined by RFC 1766 [2]. In summary, a language tag is composed of one
or more parts: a primary language tag and a possibly empty series of
subtags:
language-tag = primary-tag *( "-" subtag )
primary-tag = 1*8ALPHA
subtag = 1*8ALPHA
Whitespace is not allowed within the tag and all tags are
case-insensitive. The namespace of language tags is administered by
the IANA. Example tags include:
en, en-US, en-cockney, i-cherokee, x-pig-latin
where any two-letter primary-tag is an ISO 639 language abbreviation
and any two-letter initial subtag is an ISO 3166 country code.
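As an illustration of the grammar above, here is a minimal Python
sketch that checks a string against the language-tag syntax. The
function names are hypothetical, and the check covers the syntax only;
it does not consult the IANA registry or the ISO code lists.

```python
import re

# language-tag = primary-tag *( "-" subtag ); every part is 1*8ALPHA.
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z]{1,8})*")


def is_language_tag(tag):
    """Return True if tag matches the language-tag grammar."""
    return _LANGUAGE_TAG.fullmatch(tag) is not None


def same_language_tag(a, b):
    """Tags are case-insensitive, so compare them in lower case."""
    return a.lower() == b.lower()
```

All five example tags listed above pass this check, while a tag with
embedded whitespace or a part longer than eight letters does not.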
4 Header field definitions
4.1 Accept (*)
[## Note: I did a rewrite of this section, which also involved
deleting some remarks about things that are better said in Section
5.##]
The Accept request-header field can be used to specify certain media
types which are acceptable for the response. Accept headers can be
used to guide content negotiation (Section 5), and can also be used
to indicate that the request is specifically limited to a small set
of desired types, as in the case of a request for an in-line image.
In general, it is not efficient to send long Accept headers in every
request. See Section 5.2 for a discussion of Accept header
efficiency considerations.
The field may be folded onto several lines and more than one
occurrence of the field is allowed, with the semantics being the same
as if all the entries had been in one field value.
Accept = "Accept" ":" #(
( media-range
[ ";" "q" "=" qvalue ]
[ ";" "mxb" "=" 1*DIGIT ] )
| reactive-on-wildcard )
media-range = ( "*/*"
| ( type "/" "*" )
| ( type "/" subtype )
) *( ";" parameter )
reactive-on-wildcard = "reactive-on-wildcard" | "r-o-w"
The asterisk "*" character is used to group media types into ranges,
with "*/*" indicating all media types and "type/*" indicating all
subtypes of that type.
The parameter q is used to indicate the media type quality factor,
which represents the user's preference for that range of media
types. The parameter mxb gives the maximum acceptable size of the
Entity-Body, in decimal number of octets, for that range of media
types. The default values are: q=1 and mxb=undefined (i.e.,
infinity). Section 5 describes the content negotiation algorithm
which makes use of these values.
The example
Accept: audio/*; q=0.2, audio/basic
should be interpreted as "I prefer audio/basic, but send me any audio
type if it is the best available after an 80% mark-down in quality."
If no Accept header is present, then it is assumed that the client
accepts all media types. If Accept headers are present, and if the
resource is an un-negotiable resource which cannot generate a
response which is acceptable according to the Accept headers, then
the server should generate an error response with the 408 (not
acceptable) status code.
A more elaborate example is
Accept: text/plain; q=0.5, text/html,
text/x-dvi; q=0.8; mxb=100000, text/x-c
Verbally, this would be interpreted as "text/html and text/x-c are
the preferred media types, but if they do not exist, then send the
text/x-dvi entity if it is less than 100000 bytes, otherwise send the
text/plain entity."
Media ranges can be overridden by more specific media ranges or
specific media types. If more than one media range applies to a given
type, the most specific reference has precedence. For example,
Accept: text/*, text/html, text/html;version=2.0, */*
have the following precedence:
1) text/html;version=2.0
2) text/html
3) text/*
4) */*
The media type quality factor and maximum acceptable size associated
with a given type are determined by finding the media range with the
highest precedence which matches that type.
For example,
Accept: text/*;q=0.3, text/html;q=0.7, text/html;version=2.0,
*/*;q=0.5
would cause the following type quality factors to be associated:
text/html;version=2.0 = 1
text/html = 0.7
text/plain = 0.3
image/jpeg = 0.5
text/html;level=3 = 0.7
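The precedence rule can be sketched in Python as follows. This is an
illustration under stated assumptions, not a normative algorithm: the
function names are hypothetical, the mxb parameter is parsed but
ignored, and a type matched by no media range is given the factor 0,
which is an assumption of this sketch.

```python
def _split_params(text):
    """Split 'a/b;k=v;...' into ('a', 'b', {k: v}), lower-casing names."""
    parts = [p.strip() for p in text.split(";")]
    mtype, _, subtype = parts[0].lower().partition("/")
    params = {}
    for part in parts[1:]:
        name, _, value = part.partition("=")
        params[name.strip().lower()] = value.strip()
    return mtype, subtype, params


def parse_accept(value):
    """Return a list of (type, subtype, params, q), one per media range."""
    ranges = []
    for item in value.split(","):
        if "/" not in item:
            continue  # skip directives such as reactive-on-wildcard
        mtype, subtype, params = _split_params(item)
        q = float(params.pop("q", "1"))
        params.pop("mxb", None)  # the size limit is ignored in this sketch
        ranges.append((mtype, subtype, params, q))
    return ranges


def media_type_quality(media_type, accept_value):
    """Return the q value of the most specific matching media range."""
    mtype, subtype, params = _split_params(media_type)
    best_rank, best_q = -1, 0.0
    for rtype, rsub, rparams, q in parse_accept(accept_value):
        if (rtype, rsub) == ("*", "*"):
            rank = 0
        elif rtype == mtype and rsub == "*":
            rank = 1
        elif (rtype, rsub) == (mtype, subtype) and all(
            params.get(k) == v for k, v in rparams.items()
        ):
            rank = 3 if rparams else 2  # ranges with parameters win
        else:
            continue
        if rank > best_rank:
            best_rank, best_q = rank, q
    return best_q
```

Applied to the Accept header of the last example, this sketch yields
the five type quality factors listed above.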
The inclusion of a reactive-on-wildcard directive in an Accept header
will change the rules for the sending of reactive negotiation
responses (Section 5). The example
Accept: text/html, */*;q=0.95, r-o-w
should be interpreted as "text/html is my preferred media type, and I
assign media type quality factors in the range 0 - 0.95 to all other
media types. Send me a reactive negotiation response, so that I can
pick the best alternate myself, if you have any non-text/html
alternate which might give me a higher overall quality than any
text/html alternate."
Note: A user agent may be provided with a default set of
quality values for certain media ranges. However, unless the
user agent is a closed system which cannot interact with
other rendering agents, this default set should be
configurable by the user.
4.2 Accept-Charset (*)
The Accept-Charset request-header field can be used to indicate what
character sets are acceptable for the response. This field allows
clients capable of understanding more comprehensive or
special-purpose character sets to signal that capability to a server
which is capable of representing documents in those character
sets. The US-ASCII character set can be assumed to be acceptable to
all user agents.
Accept-Charset = "Accept-Charset" ":" 1#charset
Character set values are described in Section (3.4[1]). An example is
Accept-Charset: iso-8859-1, unicode-1-1
If no Accept-Charset header is present, the default is that any
character set is acceptable. If an Accept-Charset header is present,
and if the resource is an un-negotiable resource which cannot
generate a response which is acceptable according to the
Accept-Charset header, then the server should generate an error
response with the 408 (not acceptable) status code.
4.3 Accept-Encoding (*)
The Accept-Encoding request-header field is similar to Accept, but
restricts the content-coding values (Section (3.5[1])) which are
acceptable in the response.
Accept-Encoding = "Accept-Encoding" ":"
#( content-coding )
An example of its use is
Accept-Encoding: compress, gzip
If no Accept-Encoding header is present in a request, the server may
assume that the client will accept any content coding. If an
Accept-Encoding header is present, and if the resource is an
un-negotiable resource which cannot generate a response which is
acceptable according to the Accept-Encoding header, then the server
should generate an error response with the 408 (not acceptable)
status code.
4.4 Accept-Language (*)
The Accept-Language request-header field is similar to Accept, but
restricts the set of natural languages that are preferred as a
response to the request.
Accept-Language = "Accept-Language" ":"
1#( language-range [ ";" "q" "=" qvalue ] )
language-range = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) )
| "*" )
Each language-range may be given an associated quality value which
represents an estimate of the user's comprehension of the languages
specified by that range. The quality value defaults to "q=1" (100%
comprehension). This value may be used in the server's content
negotiation algorithm (Section 5). For example,
Accept-Language: da, en-gb;q=0.8, en;q=0.7
would mean: "I prefer Danish, but will accept British English (with
80% comprehension) and other types of English (with 70%
comprehension)."
A language-range matches a language-tag if it exactly equals the tag,
or if it is a prefix of the tag such that the first tag character
following the prefix is "-". The special range "*", if present in
the Accept-Language field, matches every tag not matched by any other
ranges present in the Accept-Language field.
Note: This use of a prefix matching rule does not imply that
language tags are assigned to languages in such a way that it is
always true that if a user understands a language with a certain
tag, then this user will also understand all languages with tags
for which this tag is a prefix. The prefix rule simply allows
the use of prefix tags if this is the case.
The language quality factor assigned to a language-tag by the
Accept-Language field is the quality value of the longest
language-range in the field that matches the language-tag. If no
language-range in the field matches the tag, the language quality
factor assigned is 0.
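The matching rules above can be sketched in Python as follows. The
function names are hypothetical, and the parsing is deliberately
simple (no handling of folded lines or malformed values).

```python
def parse_accept_language(value):
    """Return a list of (language-range, q) pairs in lower case."""
    ranges = []
    for item in value.split(","):
        language_range, _, param = item.partition(";")
        name, _, val = param.partition("=")
        q = float(val) if name.strip().lower() == "q" else 1.0
        ranges.append((language_range.strip().lower(), q))
    return ranges


def language_quality(tag, accept_language):
    """Return the language quality factor assigned to a language tag."""
    tag = tag.lower()
    best_len, best_q, star_q = -1, 0.0, None
    for language_range, q in parse_accept_language(accept_language):
        if language_range == "*":
            star_q = q
        elif tag == language_range or tag.startswith(language_range + "-"):
            # The longest matching language-range wins.
            if len(language_range) > best_len:
                best_len, best_q = len(language_range), q
    if best_len >= 0:
        return best_q
    # "*" only matches tags that no other range matched.
    return star_q if star_q is not None else 0.0
```

With "da, en-gb;q=0.8, en;q=0.7" the tag en-gb gets 0.8, en-us gets
0.7 through the prefix rule, and fr gets 0 because nothing matches it.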
If no Accept-Language header is present in a request, the server
should assume that all languages are equally acceptable. If an
Accept-Language header is present, then all languages which are
assigned a quality factor greater than 0 are acceptable. If the
resource is an un-negotiable resource which cannot generate a
response for an audience capable of understanding at least one
acceptable language, it is acceptable to serve a response that uses
other languages.
It may be contrary to the privacy expectations of the user to send an
Accept-Language header with the complete linguistic preferences of
the user in every request. For a complete discussion of this issue,
see Section 6.3. If a reactive-on-wildcard directive is present in
an Accept header, the user agent can safely omit certain languages
intelligible to the user from the Accept-Language header, without
affecting the quality of the negotiation process in requests on
negotiated resources, if the language-range "*" is included with an
appropriate language quality factor.
Note: As intelligibility is highly dependent on the
individual user, it is recommended that client applications
make the choice of linguistic preference available to the
user. If the choice is not made available, then the
Accept-Language header field must not be given in the
request.
[#### Issue to be resolved: the 1.1-00 spec has a sentence in this
section that says:
"If the server cannot fulfill the request with one or more of the
languages given, or if the languages only represent a subset of a
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
multi-linguistic Entity-Body, [....]"
^^^^^^^^^^^^^^^^
According to this sentence, an entity body can use multiple languages,
all of which need to be understood by the sender of the
Accept-Language header, so the document would in fact be for a
multi-linguistic audience. But in Section 10.11 (Content-Language)
the 1.1-00 spec states:
Multiple languages may be listed for content that is intended for
multiple audiences. For example, a rendition of the "Treaty of
^^^^^^^^^^^^^^^^^^
Waitangi," presented simultaneously in the original Maori and
English versions, would call for
Content-Language: mi, en
However, just because multiple languages are present within an
entity does not mean that it is intended for multiple linguistic
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
audiences. An example would be a beginner's language primer, such
^^^^^^^^^
as "A First Lesson in Latin," which is clearly intended to be used
by an English-literate audience. In this case, the Content-Language
should only include "en".
There seems to be an internal contradiction here: the text above
states that content can never be designated as being for a
multi-linguistic audience, it can only be designated as being for
multiple linguistic-audiences.
So should HTTP use "multi-linguistic audiences" or "multiple
linguistic-audiences"? In this Accept-Language section, I use
"multiple linguistic-audiences".
####]
4.5 URI (*)
The URI entity-header field is used to inform the recipient of other
Uniform Resource Identifiers (Section (3.2[1])) by which the resource
can be identified.
URI-header = "URI" ":" 1#( uri-mirror | uri-name )
uri-mirror = "{" "mirror" <"> URI <"> "}"
uri-name = "{" "name" <"> URI <"> "}"
Any URI specified in this field can be absolute or relative to the
Request-URI. The "mirror" form of URI refers to a location which is a
mirror copy of the Request-URI. The "name" form refers to a
location-independent name corresponding to the Request-URI.
[## Side issue: I find that the "mirror" and "name" descriptions above
do not give enough information to let me know what they are supposed
to mean. I understand that the semantics come from current practice
in the CERN server. Anyone care to expand these descriptions?##]
4.6 Alternates
The Alternates entity-header field is used to describe the alternate
resources bound to a negotiable resource.
Alternates = "Alternates" ":" 1#( alternate-descr
| caching-directive )
alternate-descr =
"{" <"> URI <"> source-quality
[ "{" "type" <"> media-type <"> "}" ]
[ "{" "language" <"> 1#language-tag <"> "}" ]
[ "{" "encoding" <"> 1#content-coding <"> "}" ]
[ "{" "length" 1*DIGIT "}" ]
[ "{" "description" quoted-string "}" ]
[ extension-attribute ]
"}"
source-quality = qvalue
extension-attribute = "{" extension-name extension-value "}"
extension-name = token
extension-value = #( token | quoted-string
| <any element of tspecials except "}"> )
Note: the extension-attribute is included because it is
expected that HTTP/1.2 will define new attributes for use in
the Alternates header. Also, this attribute eases content
negotiation experiments under HTTP/1.1.
caching-directive = "{" "cache-control" 1#cache-directive "}"
Cache-directives are defined in Section (10.8[1]).
[##Issue to be resolved: Would just having the max-age cache-directive
here be sufficient?##]
[##Note: If Age: goes into HTTP/1.1 for caching of normal responses,
we need to add optional age field to the URI header##]
Any URI specified in this field can be absolute or relative to the
Request-URI. For each of the alternates bound to the negotiable
resource, the Alternates header must include an alternate-descr
form describing that alternate.
[##Note: If the resource author cannot or does not want to list all the
alternates, Vary header based negotiation can be used##]
[## Question to be resolved: should text below up to the example be
moved to Section (3.9[1]) (Quality Values)?##]
The source-quality attribute given in an alternate description is
measured by the content provider as representing the amount of
degradation from the original source. For example, a picture
originally in JPEG form would have a lower source quality when
translated to the XBM format, and much lower source quality when
translated to an ASCII-art alternate. Note, however, that this is a
function of the source -- an original piece of ASCII-art may degrade
in quality if it is captured in JPEG form.
Content providers should use the following table as a guide when
assigning source quality values:
1.000 no degradation
0.999-0.900 no noticeable degradation
0.899-0.700 noticeable, but acceptable degradation
0.699-0.500 barely acceptable degradation
0.499-0.000 unacceptable degradation
[##Question to be resolved: can we come up with a word other than
`degradation' that also covers the case of alternates not converted
from one source?##]
It is important that content providers do not assign very low source
quality values without good reason, as this will limit the ability of
users to influence the negotiation process with their own preference
settings.
If alternates are not converted from one source, but constructed
separately to represent the same abstract information in different
ways, then the source quality attributes can be used to express
differences in quality between the alternates.
An example Alternates header for a negotiable resource with the URI
http://www.w3.org/pub/WWW/TheProject is:
Alternates: {"TheProject.fr.html" 1.0
{type "text/html"} {language "fr"}},
{"TheProject.en.html" 1.0
{type "text/html"} {language "en"}},
{"TheProject.fr.txt" 0.7
{type "text/plain"} {language "fr"}},
{"TheProject.en.txt" 0.8
{type "text/plain"} {language "en"}}
which indicates that the negotiable resource binds to four alternate
resources that differ in media type and natural language.
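A user agent has to take such a header apart before it can select an
alternate. The following Python sketch does this for the simple form
used in the example. It is an illustration only: the function name and
the dictionary keys "uri" and "qs" are hypothetical, caching directives
are skipped, and quoted strings containing braces are not handled.

```python
import re

# One alternate-descr: {"URI" source-quality {attr value} ... }
_ALTERNATE = re.compile(
    r'\{\s*"([^"]*)"\s+([0-9.]+)((?:\s*\{[^{}]*\})*)\s*\}'
)
# One attribute inside an alternate-descr: {name value}
_ATTRIBUTE = re.compile(r'\{\s*([A-Za-z][\w-]*)\s+([^{}]*?)\s*\}')


def parse_alternates(value):
    """Return one dict per alternate description in the field value."""
    alternates = []
    for uri, quality, attributes in _ALTERNATE.findall(value):
        descr = {"uri": uri, "qs": float(quality)}
        for name, raw in _ATTRIBUTE.findall(attributes):
            descr[name.lower()] = raw.strip('"')
        alternates.append(descr)
    return alternates
```

For the example header above, this returns four dictionaries, the
first being {"uri": "TheProject.fr.html", "qs": 1.0, "type":
"text/html", "language": "fr"}.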
The type, language, encoding, and length attributes of an alternate
description refer to their Content-* header counterparts. Though all
attributes are optional, it is often desirable to include as many
attributes as possible as this will increase the quality of the
negotiation process. Servers must only generate extension-attributes
whose names start with "x-". Clients should ignore all extension
attributes they do not recognize. Proxies should not engage in
alternate selection calculations on behalf of the origin server if an
unrecognized attribute is present in the Alternates header.
The description attribute is meant to provide a textual description
of some properties of the alternate, to be displayed by a user agent
when showing the list of all alternates bound to a negotiable
resource (see Section 5). This attribute can be included if the URI
and normal attributes of an alternate are considered too opaque to
allow interpretation by the user.
The cache-control directive of the Alternates header field can be
used to restrict the cachability of the Alternates header, and, for
300 (multiple choices) and 406 (none acceptable) responses, the other
parts of the response. This directive duplicates the control
functionality offered for un-negotiated resources by the
Cache-Control header.
[## Issue to be resolved: Should there be a
{"user-agent-prefix" quoted-string}
attribute which could be used for user agent negotiation? The
matching rule could amount to: if you match a user-agent-prefix in an
alternate, exclude all other alternates with user-agent prefix
attributes that provide no, or shorter, matches from consideration.
Example:
Alternates: {"plan.html" 0.9
{type "text/html"} {user-agent-prefix ""}},
{"plan.wuxta.html" 0.6 {type "text/html"}
{user-agent-prefix "WuxtaWeb1."}
{description "Does not trigger bug in WuxtaWeb 1.x"}},
{"plan.dvi" 1.0 {type "text/x-dvi"}},
{cache-control max-age=1209600}
##]
[## Note: adding feature negotiation would add a "feature" attribute
in the alternate-descr syntax, and a corresponding Accept-feature
request header. The attribute would contain feature identifiers, which
are short codes for things like `user agent supports HTML 3.0 tables',
`user agent supports java', and maybe the negations of feature
identifiers. ##]
4.7 Alt-Header
The Alt-Header request-header can be used in requests to negotiable
resources to introduce new request headers in any derived requests on
alternate resources (see section 5.2).
Alt-Header = "Alt-Header" ":" <"> URI <"> Request-Header
The URI specified in this field can be absolute or relative to the
Request-URI. A typical example is
Alt-Header: "TheProject.en.html" If-Validator-Valid: 6a7bf
If it already has a copy of the "TheProject.en.html" alternate in
cache, a caching client can include this header in requests to allow
the server to shorten a 200 (OK) preemptive negotiation response to a
304 (not modified) response in case preemptive negotiation
yields "TheProject.en.html" as the best alternate.
Servers are always allowed to ignore Alt-Header request headers.
[##Note: Roy Fielding has proposed a Content-ID response header which
would carry validators guaranteed to be
1) different for different resources and
2) different for different resource versions.
If we have such a header, and it is generally used, then we can simplify
Rep-Header to
Unless-ID = "Unless-ID" ":" 1#cid
with the meaning: send me a normal response unless the Content-ID would
be one of the listed Content-IDs. If the Content-ID is one of the
listed ones, return a 4xx (Unless true) response instead. The same
Unless-ID would also serve as a simplification of the
"If-Validator-Valid" and "Variant-Set" proposed in the caching subgroup.
##]
5 Content negotiation (*)
Content negotiation is an optional feature of the HTTP/1.1 protocol:
resources may be negotiable, but they need not be. If a resource is
negotiable, this changes the semantics of GET and HEAD transactions
on the resource. Other transactions are not affected.
A negotiable resource has a number of alternates bound to it. The
HTTP content negotiation mechanism allows for automatic selection of
the preferred alternate bound to a negotiable resource based on the
properties of the alternates and on the user agent preferences for
the retrieval action on the negotiated resource.
[## Note: `retrieval action' is a new term I had to introduce because
`request' is not entirely accurate here: with reactive negotiation,
one retrieval action causes two requests. ##]
An alternate is a resource, identified by an alternate URI, that
provides one possible representation of the `contents' of the
negotiable resource. An alternate resource must never be a
negotiable resource itself. It is the responsibility of the author
of the negotiable resource, not the author of the alternate, to
ensure that this restriction is not violated.
The negotiability of a resource is expressed by the Alternates
response header. If a 2xx or 3xx class response does not include an
Alternates response header, then the resource is un-negotiable. If
any response does include an Alternates response header, then the
resource is negotiable.
When displaying an alternate as the end result of a retrieval action
on a negotiable resource, a user agent should allow the user to
review a list of all alternates bound to the negotiable resource, and
to initiate retrieval of another alternate if desired. The list can
be annotated with some or all of the properties of the alternates, as
given by the Alternates header in the negotiable resource response.
When displaying an alternate as the end result of a retrieval action
on a negotiable resource, a user agent should show the negotiable
resource URI, not the alternate resource URI, as being the URI the
contents of which were retrieved. If the user agent stores a
reference to the content displayed for future use, it is the
negotiable resource URI, not the alternate resource URI, which should
be stored.
HTTP/1.1 provides for two types of content negotiation: preemptive
and reactive. Preemptive negotiation is generally faster than
reactive negotiation, but it can only be used if sufficient
information about user agent capabilities and user preferences is
present in the request on the negotiable resource. Reactive
negotiation can always be used. Therefore, preemptive negotiation is
best seen as a mechanism that can sometimes optimize reactive
negotiation transactions.
5.1 Reactive negotiation
In reactive negotiation, the selection and retrieval of an alternate
bound to the negotiable resource spans two transactions. In the
first transaction, the client transmits a request on the negotiable
resource URI, and the server responds with a 300 (multiple choices)
or 406 (none acceptable) response, which includes an Alternates
header describing the alternates bound to the negotiable resource. A
406 response may always be generated; a 300 response may only be
generated if the specific conditions given in Section 5.2 are met. The
client can use the Alternates header in the 300 or 406 response to
select the alternate that matches best to the preferences for the
retrieval action.
In the second transaction, the user agent transmits a request on the
URI of the selected alternate resource, and the server will typically
respond with a 200 (OK) response, though other response codes like
302 (moved temporarily) are also possible. Only the user agent needs
to know that the second request is part of a reactive negotiation
process; all other parties can treat it as a normal request on an
un-negotiated resource.
User agents should use the reactive alternate selection algorithm
below when automatically selecting the best alternate listed in an
Alternates header. User agents are allowed to use other selection
algorithms, but this is not recommended, as preemptive negotiation is
defined to optimize the case in which the reactive alternate
selection algorithm below is used.
User agents that do not wish to implement an alternate selection
algorithm can, by only using Accept request headers of a certain
form, force servers to always include an entity when a reactive
negotiation response is sent. They can then use this entity to allow
the user to select an alternate manually, or use the reactive
response Location header, if present, to automatically fetch the
alternate recommended by the server.
[##Note: the possibility of doing the above is also important for
proxies that want to mediate between a 1.0 client and a 1.1 server.
1.0 clients will always use Accept headers of the certain form that
triggers a response suitable for a client which does not implement
negotiation.##]
In the first step of the reactive alternate selection algorithm, the
overall quality for every alternate listed in the Alternates header
of the negotiable resource is computed. The overall quality of an
alternate is a real number Q in the range 0 through 1, where 0 is the
minimum and 1 the maximum value, defined as
Q = qs * qe * qc * ql * q * qml
The values qs,qe,qc,ql,q,qml for a particular alternate are all
determined using the part of the received Alternates header
describing that alternate, called the alternate description below.
qs The source quality factor for the alternate is given by the
source-quality attribute in the alternate description.
qe The encoding quality factor is 1 if there is no encoding
attribute in the alternate description. If there is an
encoding attribute in the alternate description, the encoding
quality factor is 1 if the user agent can decode the given
content encoding, 0 otherwise.
[##Question to be resolved: do we really want to distinguish between
alternates that have an encoding and alternates that do not? This could
block a smooth transition to a scheme in which servers apply compression
on the fly if the client indicates it can handle decompression. Maybe
negotiation about en/decoding capabilities should be kept separate from
the main content negotiation mechanism. On the other hand, the
Transfer-Encoding header already seems to allow for a future
introduction of on the fly compression##]
qc The charset quality factor is 1 if there is no type attribute
in the alternate description, or if the media type given in
the type attribute of the alternate description does not have
a charset parameter. If there is a charset parameter, then
the charset quality factor is 1 if the user agent can process
a message with the given character set, 0 otherwise. User
agents must always be able to process a message with the
US-ASCII charset.
[## Question to be resolved: do recent discussions on the http-wg list
indicate that the US-ASCII above should be changed into ISO-8859-1? Or
should the text above be changed to say `US-ASCII or ISO-8859-1'? I
believe the consensus was 'no'.##]
ql The language quality factor is 1 if there is no language
attribute in the alternate description. If there is a
language attribute, then the language quality factor is the
highest quality factor assigned to any one of the listed
languages according to the user agent language preferences
for the retrieval action.
[## Note: the 1.1-01 draft says: `If at least one alternate has an
assigned content language, but the one currently under consideration
does not, then it should be assigned the value "ql=0.5".' I deleted
this requirement; service authors can more accurately use the qs
attribute to adjust things in situations where only some of the
alternates have languages##]
q The media type quality factor is 1 if there is no type
attribute in the alternate description. If there is a type
attribute, then the media type quality factor is the quality
factor assigned to the given media type in the user agent
media type preferences for the retrieval action.
qml The maximum length quality factor is 1 if there is no length
attribute in the alternate description. If there is a length
attribute in the alternate description, then the maximum
length quality factor is 1 if the length given is less than
or equal to the maximum acceptable length according to the
user agent maximum length preferences for the retrieval
action, 0 otherwise. Preferred maximum lengths are often
equal to `infinity'.
In the second step of the reactive alternate selection algorithm,
the overall qualities of all alternates are compared to select the
best alternate. If there is one alternate with the highest overall
quality value, then that alternate is the best alternate. If there
are multiple alternates that share the highest overall quality value,
then the alternate that is listed first in the received Alternates
header is the best alternate.
If all alternates have an overall quality value of zero, a user agent
should not automatically retrieve the first alternate, but stop the
reactive negotiation process, allowing the user to decide on the next
action.
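As an illustration only, the two steps above can be sketched in
Python. The dictionary layout of an alternate description, the `prefs`
structure, the separate "charset" key, and the rule that an unlisted
language or media type gets quality 0 are all assumptions made for
this sketch; they are not defined by this proposal. Media range
wildcards are not modelled.

```python
# Hypothetical sketch of the reactive alternate selection algorithm.
# All names and data layouts here are illustrative assumptions.

def overall_quality(alt, prefs):
    """Return Q = qs * qe * qc * ql * q * qml for one alternate description."""
    qs = alt["qs"]  # source-quality attribute

    enc = alt.get("encoding")
    qe = 1.0 if enc is None or enc in prefs["encodings"] else 0.0

    cs = alt.get("charset")  # charset parameter of the type attribute, if any
    # US-ASCII must always be processable by the user agent.
    known = cs is None or cs.upper() == "US-ASCII" or cs.upper() in prefs["charsets"]
    qc = 1.0 if known else 0.0

    langs = alt.get("languages")
    # Highest preference among the listed languages (assumed 0 if unlisted).
    ql = 1.0 if not langs else max(prefs["languages"].get(l, 0.0) for l in langs)

    mtype = alt.get("type")
    q = 1.0 if mtype is None else prefs["types"].get(mtype, 0.0)

    length = alt.get("length")
    qml = 1.0 if length is None or length <= prefs["max_length"] else 0.0

    return qs * qe * qc * ql * q * qml


def select_best(alternates, prefs):
    """Return the best alternate, or None if every alternate has Q == 0."""
    best, best_q = None, 0.0
    for alt in alternates:  # order of the Alternates header
        value = overall_quality(alt, prefs)
        if value > best_q:  # strict comparison: the first listed wins ties
            best, best_q = alt, value
    return best
```

A return value of None corresponds to the case in which the user agent
stops the negotiation process and lets the user decide.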
5.2 Preemptive negotiation (*)
In preemptive negotiation, the selection and retrieval of an
alternate bound to the negotiable resource is done in a single
transaction, saving one round trip time over reactive negotiation. A
preemptive negotiation response must only be generated by a server if
the request on the negotiable resource contains enough information
about user agent capabilities and user preferences to allow the
server to determine which alternate would be chosen if the reactive
alternate selection algorithm outlined above were used by the user
agent in reactive negotiation.
When engaging in preemptive negotiation, the server must use the
following algorithm, or any other algorithm that produces the same
result, to construct the preemptive response message.
1. Construct a request message on the best alternate resource by
modifying the received request message on the negotiable
resource in the following way. First, the Request-URI and the
Host request header must be rewritten to point to the best
alternate resource. Then, if there are any Alt-Header request
headers that match the best alternate resource URI, the headers
given in these matching Alt-Header request headers may be added
to the headers in the request message. Finally, the Alt-Header
request headers in the request message may be removed.
2. Generate a valid HTTP response message for the request message
constructed in step 1. If the server is a proxy, this may
involve sending the constructed request to the origin server.
3. Add two headers to the HTTP response message generated in step
2. These are an Alternates header describing the alternates
bound to the negotiable resource, and a Location header that
gives the URI of the best alternate resource.
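The three steps above might be sketched as follows. This is only a
rough illustration: the representation of messages as dictionaries,
the "alt_headers" list of (URI, headers) pairs standing in for parsed
Alt-Header request headers, the use of an absolute URI for the best
alternate, and the `handle` callback that produces the response in
step 2 are all assumptions of the sketch.

```python
# Hypothetical sketch of preemptive response construction.
from urllib.parse import urlsplit


def build_preemptive_response(request, best_uri, alternates_header, handle):
    # Step 1: rewrite the request so that it targets the best alternate.
    headers = dict(request["headers"])
    headers["Host"] = urlsplit(best_uri).netloc
    for uri, extra in request.get("alt_headers", []):
        if uri == best_uri:
            headers.update(extra)  # headers carried by a matching Alt-Header
    # The Alt-Header entries themselves are dropped from the new request.
    rewritten = {"uri": best_uri, "headers": headers}

    # Step 2: obtain a valid response for the rewritten request
    # (a proxy might forward it to the origin server here).
    response = handle(rewritten)

    # Step 3: add the Alternates and Location headers.
    out = dict(response)
    out["headers"] = dict(response["headers"])
    out["headers"]["Alternates"] = alternates_header
    out["headers"]["Location"] = best_uri
    return out
```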
A preemptive response message satisfies the origin server restriction
if and only if the full URI of the best alternate resource can be
obtained by adding a sequence of characters excluding "/" to the end
of the full URI of the negotiable resource, where the first character
added may not be a US-ASCII uppercase or lowercase letter.
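A check for the origin server restriction could look roughly like the
following sketch. The function name is hypothetical, the URIs are
compared as plain strings without any normalization, and the sketch
assumes that at least one character must be added; none of this is
prescribed by the text above.

```python
# Hypothetical sketch of the origin server restriction check.
import string


def satisfies_origin_server_restriction(negotiable_uri, alternate_uri):
    """True if alternate_uri is negotiable_uri plus an allowed suffix."""
    if not alternate_uri.startswith(negotiable_uri):
        return False
    suffix = alternate_uri[len(negotiable_uri):]
    if not suffix:  # assumption: something must be added
        return False
    if "/" in suffix:  # the added characters exclude "/"
        return False
    # The first added character may not be a US-ASCII letter.
    return suffix[0] not in string.ascii_letters
```

For example, with the negotiable resource "http://example.org/paper",
the alternate "http://example.org/paper.html" passes the check, while
"http://example.org/papers" and "http://example.org/paper/html" do not.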
[##Note: In version 2 of this text, the origin server restriction was
much weaker: it only said that the two URIs must be located on the same
server. I have changed this because a stronger restriction will make
the implementation and maintenance of origin servers simpler, while not
making life much more difficult for the authors of negotiable
resources.##]
[##Question to be resolved: should the origin server restriction be
weakened? Daniel DuBois proposes "The URLs must match up to the last
slash in the negotiable resource".##]
Origin servers should not generate a preemptive response message that
violates the origin server restriction. If a client receives a
preemptive response message that violates the origin server
restriction directly from an origin server, then that client must
reject the message as a probable spoofing attempt. If the client is
a proxy, it must not pass on the response; it can pass on a 502 (bad
gateway) response instead. Servers acting as proxies may generate
preemptive responses that do violate the origin server restriction,
and clients should not reject these responses.
[##Note: the origin server restriction does not imply that you can't
have alternates on other servers. You can: you just have to generate
reactive negotiation responses for those variants.##]
Clients, including caching proxies, may treat the HTTP response that
can be derived from a reactive negotiation response by deleting the
Alternates and Location headers as being controlled by the author of
the best alternate resource, not the author of the negotiable
resource on which the actual request was made. It is the
responsibility of the server to ensure that the best alternate
resource author indeed has this control. Section 6.1 discusses
the implications of this rule on server design and administration.
User agents can transmit information about their capabilities and
preferences for a retrieval action using the various accept request
headers. If the accept headers present in a request on a negotiable
resource contain enough information, a server may be able to generate
a preemptive negotiation response. As most resources will be
un-negotiable, user agents are encouraged to send empty or small
accept headers, or even omit some accept headers entirely, by
default. If a user agent knows or discovers that an origin server
provides negotiated resources, it is encouraged to use data from the
negotiated responses received so far to dynamically add or extend
accept headers sent in future requests on resources provided by that
origin server, in order to increase the probability that preemptive
negotiation can be used instead of the slower reactive negotiation.
Servers that want to support preemptive negotiation must use the
preemptive alternate selection algorithm below. This algorithm can
be applied to determine
o whether a preemptive negotiation response may be sent, and if so,
which alternate is the best alternate
o the appropriate response code, either 300 (Multiple Choices) or
406 (None Acceptable), when a reactive response is sent.
The algorithm uses the alternate descriptions for each of the
available alternates, as will be included in the Alternates header of
the response, and the Accept headers of the request on the negotiable
resource as input.
In the first step of the preemptive alternate selection algorithm,
the overall quality for every alternate bound to the negotiable
resource is computed. The overall quality is a real number Q in the
range 0 through 1, where 0 is the minimum and 1 the maximum value,
defined as
Q = qs * qe * qc * ql * q * qml
The overall quality values computed in the preemptive algorithm are
not necessarily equal to the overall quality values computed
in the reactive algorithm of Section 5.1.
The values qs,qe,qc,ql,q,qml for a particular alternate are all
determined using the alternate description of the particular
alternate and the Accept headers of the request.
qs The source quality factor for the alternate is given by the
source-quality attribute in the alternate description.
qe The encoding quality factor is 1 if there is no encoding
attribute in the alternate description. If there is an
encoding attribute in the alternate description, the encoding
quality factor is 1 if no Accept-Encoding header is present
in the request, 1 if an Accept-Encoding header present
indicates the ability to decode the given content encoding,
and 0 otherwise.
qc The charset quality factor is 1 if there is no type attribute
in the alternate description, or if the media type given in
the type attribute of the alternate description does not have
a charset parameter. If there is a charset parameter, then
the charset quality factor is 1 if the charset is US-ASCII, 1
if no Accept-Charset header is present in the request, 1 if
an Accept-Charset header present indicates the ability to
handle the given character set, and 0 otherwise.
ql The language quality factor is 1 if there is no language
attribute in the alternate description. If there is a
language attribute, then the language quality factor is the
highest quality factor assigned by the Accept-Language header
in the request to any one of the languages listed in the
attribute, 0 if none of the listed languages are assigned a
quality factor by the Accept-Language header in the request,
and 1 if there is no Accept-Language header in the request.
q The media type quality factor is 1 if there is no type
attribute in the alternate description. If there is a type
attribute, then the media type quality factor is the quality
factor assigned to the given media type by the Accept headers
in the request, 0 if the Accept headers do not assign a
quality factor to the media type, and 1 if there are no
Accept headers in the request.
qml The maximum length quality factor is 1 if there is no length
attribute or no type attribute in the alternate description.
If there is a length and a type attribute in the alternate
description, then the maximum length quality factor is 0 if
the "mxb" value assigned to the given media type by the
Accept headers in the request is less than the value given in
the length attribute, 1 if the "mxb" value is greater than or
equal to it, 1 if the Accept headers do not assign an "mxb" value
to the media type, and 1 if there are no Accept headers in
the request.
In the second step of the algorithm, the overall qualities of all
alternates are compared to select the best one. If there is one
alternate with the highest overall quality value, then this is the
best alternate. If there are multiple alternates that share the
highest overall quality value, then the alternate that is listed
first in the Alternates header is the best alternate.
If all alternates have an overall quality value of zero, then any
reactive negotiation response sent must use the 406 (None Acceptable)
response code. Else, any reactive negotiation response sent should
use the 300 (Multiple Choices) response code.
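The first two steps, and the choice between 300 and 406, can be
sketched as below. This is an illustration under assumptions: the
`accept` dictionary stands for already parsed Accept headers, with
None meaning that the corresponding header is absent; charset names in
it are assumed to be upper case; media types are matched exactly, so
wildcard media ranges are not modelled.

```python
# Hypothetical sketch of the preemptive quality computation.

def preemptive_quality(alt, accept):
    """Return Q = qs * qe * qc * ql * q * qml using the request's Accept headers."""
    qs = alt["qs"]

    enc, encs = alt.get("encoding"), accept.get("encodings")
    qe = 1.0 if enc is None or encs is None or enc in encs else 0.0

    cs, css = alt.get("charset"), accept.get("charsets")
    ok = cs is None or cs.upper() == "US-ASCII" or css is None or cs.upper() in css
    qc = 1.0 if ok else 0.0

    langs, al = alt.get("languages"), accept.get("languages")
    if not langs or al is None:
        ql = 1.0
    else:  # 0 if none of the listed languages is assigned a quality factor
        ql = max((al[l] for l in langs if l in al), default=0.0)

    mtype, at = alt.get("type"), accept.get("types")
    if mtype is None or at is None:
        q = 1.0
    elif mtype in at:
        q = at[mtype].get("q", 1.0)
    else:
        q = 0.0

    qml = 1.0
    length = alt.get("length")
    if length is not None and mtype is not None and at is not None:
        mxb = at.get(mtype, {}).get("mxb")
        if mxb is not None and mxb < length:
            qml = 0.0

    return qs * qe * qc * ql * q * qml


def reactive_status_code(alternates, accept):
    """406 if every alternate has Q == 0, otherwise 300."""
    if all(preemptive_quality(a, accept) == 0 for a in alternates):
        return 406
    return 300
```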
In the third step of the preemptive negotiation alternate selection
algorithm, it is determined whether a preemptive negotiation response
may be sent to return the best alternate found.
If the best alternate has an overall quality value of zero, then the
server must not generate a preemptive response; it should generate a
reactive response with the 406 (None Acceptable) response code.
If the best alternate has an overall quality value greater than
zero, and no Accept header in the request contains a
reactive-on-wildcard directive, then the server may generate a
preemptive response, provided that the origin server restriction, if
applicable, is met.
If the best alternate has an overall quality value greater than
zero, and an Accept header in the request contains a
reactive-on-wildcard directive, then the server may generate a
preemptive response, provided that the origin server restriction, if
applicable, is met, if
o the type quality factor (q) of the best alternate was not derived
from a match to a media range containing an asterisk "*" wildcard
character in an Accept header, and
o the language quality factor (ql) of the best alternate was not
derived from a match to a "*" language-range in the
Accept-Language header.
In all other cases, the server must generate a reactive response.
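The decision made in this third step can be summarized by the small
sketch below. The function and parameter names are hypothetical; the
two `*_from_wildcard` flags are assumed to have been recorded while
the quality factors were computed.

```python
# Hypothetical sketch of the third step: may a preemptive response be sent?

def may_send_preemptive(best_q, reactive_on_wildcard,
                        q_from_wildcard, ql_from_wildcard,
                        restriction_ok=True):
    """Return True if a preemptive response may be generated.

    best_q               overall quality value of the best alternate
    reactive_on_wildcard an Accept header carries the directive
    q_from_wildcard      q came from a media range containing "*"
    ql_from_wildcard     ql came from a "*" language-range
    restriction_ok       origin server restriction met (or not applicable)
    """
    if best_q <= 0:
        return False  # a 406 reactive response is called for
    if not restriction_ok:
        return False
    if not reactive_on_wildcard:
        return True
    # With reactive-on-wildcard, neither factor may stem from a wildcard.
    return not q_from_wildcard and not ql_from_wildcard
```

When the function returns False, the server falls back to a reactive
response.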
5.3 Caching issues
HTTP/1.1 does not provide a mechanism for conditional GET requests on
negotiable resources, but does provide a mechanism, the Alt-Header
request header, for conditional GET requests on alternate resources.
[## Question to be resolved: _should_ there be a special rule for
conditional GETS on negotiable resources? Some people have said that
they worry about superfluous transmission of long Alternates headers. A
conditional GET could presumably save retransmission of a large
Alternates header. We could define that preemptive and reactive
negotiation responses may omit the Alternates response header if it was
`not modified since'.##]
When generating a 300 (Multiple Choices) response, a 406 (None
Acceptable) response, or the Alternates header for a preemptive
response, a cache may re-use an Alternates header received earlier
from the negotiable resource, as long as the restrictions expressed
by any cache-control directive in the Alternates header are met. If
the presence of an entity is required in a 300 or 406 response,
caches may generate that entity on behalf of the origin server.
When relaying a preemptive response, a cache may infer the request
and response messages of the HTTP transaction on the best alternate
resource performed by the server that generated the preemptive
response, and may update its internal data structures to reflect the
occurrence of this HTTP transaction.
Caches are encouraged to perform such updates because they increase
efficiency and prevent strange (but otherwise allowed) effects if the
contents of an alternate resource are changed at the origin server
while there is still a non-expired version of these contents in
cache.
[##Note: earlier versions of the Alternates header had, besides the
{cache-control ...} directive, a {vary ...} directive. My idea was that
{vary user-agent} in the Alternates header would indicate that the
source quality values in the Alternates header would vary on the
User-Agent field, thus allowing service authors to mix content
negotiation with user agent negotiation. Varying the Alternates header
proved too controversial, so I threw the {vary ...} directive out. This
means (as far as I can see) that _efficient_ negotiation on tables
vs. no tables, which also gives the user the option to select another
alternate as in normal content negotiation, will only be possible after
we introduce feature negotiation.
The most efficient thing that works in one round trip for the normal
case and that still gives the user the option to select another
alternate is using
Alternates: {"plan.auto.html" 0.9 {type "text/html"}
{description "Automatic tables/no tables selection"}},
{"plan.tables.html" 0.8 {type "text/html"}},
{"plan.notables.html" 0.7 {type "text/html"}},
{"plan.dvi" 1.0 {type "text/x-dvi"}}
and making "plan.auto.html" an alternate resource that varies on user
agent. A typical preemptive response would look like
HTTP/1.1 200 OK
Alternates: {"plan.auto.html" 0.9 {type "text/html"}
{description "Automatic tables/no tables selection"}},
{"plan.tables.html" 0.8 {type "text/html"}},
{"plan.notables.html" 0.7 {type "text/html"}},
Location: plan.auto.html
Vary: user-agent
Content-length: ....
....
[contents of the plan.tables.html file on the server as the entity body]
The problem with this is that it leads to the storage of _four_ entity
bodies (instead of two) in a (full) cache:
1) the variant entity with the tables produced by plan.auto.html,
2) the variant entity without the tables produced by plan.auto.html,
3) the one entity bound to plan.tables.html,
4) the one entity bound to plan.notables.html.
So this doubles the traffic between the proxy and the origin server.
Note that this solution presupposes that the proxy cache can cache
varying resources efficiently, i.e. that we have a Variant-Set like
mechanism for preventing the unnecessary sending of variants already in
cache if a request from a previously unknown user agent is relayed.
Without that, even more traffic between the proxy and the origin server
is needed.
##]
6 Security and Privacy considerations
[##Note: This section could use some editing when it goes into the 1.1
draft. To provide some motivation of changes to the current 1.1 draft,
I am including more text than would be required in an RFC. Also, I have
not had the time to optimize readability of this section.##]
6.1 Spoofing using Location headers
Clients, including caching proxies, may treat the HTTP response that
can be derived from a reactive negotiation response by deleting the
Alternates and Location headers as being controlled by the author of
the best alternate resource, not the author of the negotiable
resource on which the actual request was made. It is the
responsibility of the server to ensure that the best alternate
resource author indeed has this control, because if this control is
lost, control over the responses generated by direct requests on the
best alternate resource is also lost. Origin servers are helped in
carrying this responsibility by the rule that clients must reject
preemptive responses that do not satisfy the origin server
restriction.
This paragraph discusses the implications of the above on server
design and administration. First, it is intended that any negotiable
resource authoring mechanism built into the server, and accessible to
authors of static content and CGI scripts, generates preemptive
responses by internally doing a request on the best variant resource,
and adding the required Alternates and Location headers to the
generated response. Second, it is intended that, if the CGI
interface has a feature that allows script authors to generate a
preemptive response directly, then a) two distrusting parties will
never be able to author CGI scripts in a shared directory, or b) use
of this feature is only enabled for a CGI script if the script author
is trusted by all other authors that use the same directory, or c)
the server filters the Location headers generated by the CGI script
to prevent spoofing that is not prevented by clients applying the
origin server restriction.
6.2 User tracking based on accept headers
If users fine-tune quality factors put into the default user agent
accept headers to the third decimal, these accept headers can be used
as relatively long-lived user identifiers, enabling content providers
(even if they do not provide negotiable resources) to tell apart
different users behind a proxy. This identification allows content
providers to do click-trail tracking, and allows collaborating
content providers to match cross-server click-trails or form
submissions of individual users. Thus, privacy reasons demand that
user agents be conservative both in the amount of quality factor
fine-tuning they allow users to do without giving a warning about
privacy, and in the sending of long accept headers by default in a
request. (See also the remarks on sending short accept headers for
performance reasons in Section 5.2.)
6.3 Accept headers revealing information of a private nature
without real need
[##Note: Brian Behlendorf has commented that the discussion in the two
paragraphs below is way too long for the draft 1.1 standard. I agree; I
made it this long to justify my new Accept-Language: "*" feature.##]
Preferences sent in accept headers, in particular language quality
factors sent in Accept-Language headers, may reveal information that
the user would rather keep private unless it will directly improve the
quality of the service. The content negotiation mechanism allows
users to leave some languages (e.g. languages the knowledge of which
strongly correlates with membership of a particular ethnic group) out
of the Accept-Language header without decreasing the quality of the
negotiation process if the request happens to be on a negotiable
resource. Note however that the speed of the negotiation process may
be affected.
No matter how much information is left out of the Accept headers,
automatic reactive negotiation by a user agent on a negotiable
resource will inevitably reveal some of the user preferences by the
generation of a request on the best alternate resource as partly
determined by the user preferences. Malicious service authors could
provide `fake' negotiable resources, which do not even bind to alternate
resources that are in fact different, whose only purpose is to get
information about (ethnicity correlated) languages understood by the
visiting users. Such plots would however be visible to alert
victims, as user agents will allow the user to review a list of all
alternates bound to the negotiable resource.
Maintainers of firewall proxies may want to process outgoing accept
headers to enhance privacy beyond the level provided by the user
agents behind the firewall.
7 Acknowledgments
This document builds on the content negotiation descriptions in [1],
and directly incorporates text from [1] in some places. Many members
of the HTTP working group have contributed to discussions that are
reflected in this document.
8 References
[1] Roy T. Fielding, Henrik Frystyk Nielsen, and Tim Berners-Lee.
Hypertext Transfer Protocol -- HTTP/1.1. Internet-Draft
draft-ietf-http-v11-spec-01.txt, HTTP Working Group, January,
1996.
[2] H. Alvestrand. "Tags for the identification of languages." RFC
1766, UNINETT, March 1995.
Received on Sunday, 25 February 1996 14:46:39 UTC