- From: Koen Holtman <koen@win.tue.nl>
- Date: Sun, 25 Feb 1996 23:41:30 +0100 (MET)
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
- Cc: Koen Holtman <koen@win.tue.nl>
New content negotiation sections
================================

Koen Holtman, koen@win.tue.nl

version 2, 31 Jan 1996
version 3, 22 Feb 1996
version 4, 25 Feb 1996

0 Purpose of this document

This document proposes a content negotiation mechanism for HTTP/1.1. It contains a number of sections that should be read as definitions in the context of the current draft HTTP/1.1 specification [1]. It is intended that these sections are merged into a future version of the draft HTTP/1.1 specification.

This document reflects the consensus of the content negotiation subgroup, as I perceive it now. (But note that the content negotiation subgroup also has consensus on some things not covered in this document.) It also contains some elements the content negotiation subgroup has not discussed, or for which only `consensus by the absence of replies' was reached. Issues that still need to be resolved are marked as such.

I am posting this document to the entire workgroup so that we can start converging on a version that reflects the consensus of the entire workgroup. Please send comments to the http-wg mailing list.

Version 3 of this document was submitted as an internet draft with the name draft-holtman-http-content-negotiation-00.txt. Changes with respect to version 3 are listed below. Earlier versions of this document can be found in the content negotiation subgroup mail archives: <URL:http://www.organic.com/public/conneg/mail/>.

1 Introduction

Content negotiation, as proposed in this document, is an optional feature for the HTTP/1.1 protocol: resources may be negotiable, but they need not be. If a resource is negotiable, this changes the semantics of GET and HEAD transactions on the resource. Other transactions are not affected.

A negotiable resource has a number of alternates bound to it.
The proposed content negotiation mechanism allows for automatic selection of the preferred alternate bound to a negotiable resource based on the properties of the alternates and on the user agent preferences for the retrieval action.

This document builds on the content negotiation descriptions in [1], and directly incorporates text from [1] in some places. A new directive, reactive-on-wildcard, is introduced to allow user agents to signal the capability of doing content negotiation. If this directive is absent, the proposed definitions produce server behavior that yields adequate results for (HTTP/1.0) user agents that do not support content negotiation.

2 Terminology and notation

This document uses the terminology and notational conventions defined in [1]. It sometimes refers directly to sections in [1], using the notation `Section (1.2[1])'.

If a (sub)section title below is marked with (*), it is intended as a replacement for the (sub)section with the same title in [1]. All other (sub)sections below, up to Section 7, contain new material intended as an addition to [1].

The text blocks marked with ## signs are comments. Some of them will be removed in later versions of this document; others may be kept until the last version, but should be removed when text is taken from this document and put into an HTTP/1.1 draft.

Some of the new response header and field names defined here are very long. I would be happy with alternative names that are shorter, and expect that the eventual 1.1 draft will indeed shorten some names defined here.

[## Changes from version 2 to version 3 (internet draft)
- Took out the [## ... ##] comments
- Added internet draft headers
- Re-numbered sections, added introduction
- Strengthened the restrictions that partially prevent spoofing using Location headers.
- Added rewrites of the Accept-* header sections based on consensus in the content negotiation sub-wg.
- Added q, ql, .. factor computation rules in the preemptive negotiation section.
- Changed the word `variant' to `alternate' to eliminate a terminology clash with the Vary header. - Changed the word `virtual' to `derived'. - Split the 1.1-00 URI header into a smaller URI header with only "mirror" and "name", and a new Alternates header. - Changed Variant-If-Modified-Since mechanism into the more general Rep-Header mechanism, to account for If-validator-valid, Variant-Set, Cache-Control, and whatever we come up with next. - Changed the rules on when to include an entity in a 300 or 406 response, simplified the rules on when such responses may be generated. - Rewrote the caching section - Rewrote the security section and added text about privacy issues - Split the 1.1-00 406 (None acceptable) response code into a new 406 (None acceptable) response code and a new 408 (Not acceptable) response code. - Several minor edits ##] [## Changes from version 3 (internet draft) to version 4: - Put back most of the comments from version 2 - Took out internet draft headers - Added more comments - Added rule that proxies may not negotiate based on Alternates headers with attributes they do not understand. - Several minor edits ##] [## Question to be resolved: Should a rudimentary feature negotiation facilities that work for 90% of the cases be added as a stopgap?? I wonder if we won't be doing the web community a disservice if we delay a 90% solution in order to construct a 99% solution for HTTP 1.2. After all, most negotiation that happens now is on tables vs. no tables, not on language or MIME type. ##] 3 Status code definitions 3.1 Redirection 3xx 300 Multiple Choices (*) The requested resource is a negotiable resource and the server is engaging in reactive content negotiation (Section 5). The server has determined that multiple alternates are acceptable, but is not able to determine which alternate is the best alternate. This response may only be generated if specific conditions given in Section 5.2 are met. 
The response must include an Alternates header describing the alternates bound to the resource, allowing a user agent to automatically select and retrieve an alternate if appropriate. This response is cachable, subject to the restrictions specified in the cache-control directive, if present, of the included Alternates header. If no Accept header in the request contains a reactive-on-wildcard directive, and it was not a HEAD request, the response must include an entity that gives the user the option to select the most appropriate alternate manually. The suggested entity media type as given in the Content-Type response header is "text/html". If there is a reactive-on-wildcard directive, no entity should be included. [## Note: This `no entity should be included' rule is for saving bandwidth. It is expected that clients that add reactive-on-wildcard directives are always able to give the user the option to select the most appropriate alternate manually, using only the Alternates header.##] If the service author finds it appropriate for any user agent that does not implement an alternate selection algorithm to automatically retrieve a certain alternate, then a Location response header giving the URI of that alternate may be included in the response. 9.4 Client Error 4xx 406 None Acceptable (*) The requested resource is a negotiable resource and the server is engaging in reactive content negotiation (Section 5). Usually, this response indicates that the server was not able to positively determine that at least one of the available alternates would be acceptable. The response must include an Alternates header describing the alternates bound to the resource, allowing a user agent to automatically select and retrieve an alternate if appropriate. This response is cachable, subject to the restrictions specified in the cache-control directive, if present, of the included Alternates header. 
If no Accept header in the request contains a reactive-on-wildcard directive, and it was not a HEAD request, the response must include an entity that gives the user the option to select the most appropriate alternate manually. The suggested entity media type as given in the Content-Type response header is "text/html". If there is a reactive-on-wildcard directive, no entity should be included. If the service author finds it appropriate for any user agent that does not implement an alternate selection algorithm to automatically retrieve a certain alternate, then a Location response header giving the URI of that alternate may be included in the response. 408 Not Acceptable (*) The resource identified by the Request-URI has content characteristics that are not acceptable according to the accept headers sent in the request. This response code must only be generated by un-negotiable resources. 3 Protocol parameter descriptions 3.1 Language Tags (*) [##Note: I deleted the language tag matching discussion that used to be in this Section to Section 10.4 (Accept-Language) No other edits were made.##] A language tag identifies a natural language spoken, written, or otherwise conveyed by human beings for communication of information to other human beings. Computer languages are explicitly excluded. HTTP uses language tags within the Accept-Language, Content-Language, and Alternates fields. The syntax and registry of HTTP language tags is the same as that defined by RFC 1766 [2]. In summary, a language tag is composed of 1 or more parts: A primary language tag and a possibly empty series of subtags: language-tag = primary-tag *( "-" subtag ) primary-tag = 1*8ALPHA subtag = 1*8ALPHA Whitespace is not allowed within the tag and all tags are case-insensitive. The namespace of language tags is administered by the IANA. 
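The language-tag grammar above can be checked mechanically. The following is only a sketch in Python; the function names are illustrative and are not part of this proposal. It checks syntax only and does not consult the IANA registry.

```python
import re

# language-tag = primary-tag *( "-" subtag ), where each part is 1*8ALPHA.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(tag: str) -> bool:
    """Return True if tag is syntactically a language-tag (no whitespace allowed)."""
    return LANGUAGE_TAG.fullmatch(tag) is not None


def same_language_tag(a: str, b: str) -> bool:
    """Tags are case-insensitive, so compare them after lowercasing."""
    return a.lower() == b.lower()
```

For instance, is_language_tag("en-US") holds, while a tag containing a space or a part longer than eight letters is rejected.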
Example tags include: en, en-US, en-cockney, i-cherokee, x-pig-latin where any two-letter primary-tag is an ISO 639 language abbreviation and any two-letter initial subtag is an ISO 3166 country code. 4 Header field definitions 4.1 Accept (*) [## Note: I did a rewrite of this section, which also involved deleting some remarks about things that are better said in Section 5.##] The Accept request-header field can be used to specify certain media types which are acceptable for the response. Accept headers can be used to guide content negotiation (Section 5), and can also be used to indicate that the request is specifically limited to a small set of desired types, as in the case of a request for an in-line image. In general, it is not efficient to send long Accept headers in every request. See Section 5.2 for a discussion of Accept header efficiency considerations. The field may be folded onto several lines and more than one occurrence of the field is allowed, with the semantics being the same as if all the entries had been in one field value. Accept = "Accept" ":" #( ( media-range [ ";" "q" "=" qvalue ] [ ";" "mxb" "=" 1*DIGIT ] ) | reactive-on-wildcard ) media-range = ( "*/*" | ( type "/" "*" ) | ( type "/" subtype ) ) *( ";" parameter ) reactive-on-wildcard = "reactive-on-wildcard" | "r-o-w" The asterisk "*" character is used to group media types into ranges, with "*/*" indicating all media types and "type/*" indicating all subtypes of that type. The parameter q is used to indicate the media type quality factor, which represents the user's preference for that range of media types. The parameter mxb gives the maximum acceptable size of the Entity-Body, in decimal number of octets, for that range of media types. The default values are: q=1 and mxb=undefined (i.e., infinity). Section 5 describes the content negotiation algorithm which makes use of these values. 
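The Accept grammar and the defaults q=1 and mxb=undefined can be illustrated with a small parser. This is a sketch under simplifying assumptions, not a definitive implementation: the names AcceptEntry and parse_accept are hypothetical, the input is one already unfolded field value, and quoted parameter values containing "," or ";" are not handled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AcceptEntry:
    media_range: str                            # e.g. "text/html" or "audio/*"
    params: dict = field(default_factory=dict)  # media type parameters
    q: float = 1.0                              # default quality factor
    mxb: Optional[int] = None                   # None stands for undefined (no limit)


def parse_accept(value: str):
    """Return (entries, reactive_on_wildcard) for one Accept field value."""
    entries, reactive_on_wildcard = [], False
    for element in value.split(","):
        element = element.strip()
        if not element:
            continue
        if element.lower() in ("reactive-on-wildcard", "r-o-w"):
            reactive_on_wildcard = True
            continue
        parts = [p.strip() for p in element.split(";")]
        entry = AcceptEntry(media_range=parts[0].lower())
        for part in parts[1:]:
            if not part:
                continue
            name, _, val = part.partition("=")
            name, val = name.strip().lower(), val.strip()
            if name == "q":
                entry.q = float(val)
            elif name == "mxb":
                entry.mxb = int(val)
            else:
                entry.params[name] = val        # ordinary media type parameter
        entries.append(entry)
    return entries, reactive_on_wildcard
```

Called on "text/plain; q=0.5, text/html, r-o-w", this yields two entries (the second with the default q of 1.0 and no mxb limit) and a true reactive-on-wildcard flag.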
The example

    Accept: audio/*; q=0.2, audio/basic

should be interpreted as "I prefer audio/basic, but send me any audio type if it is the best available after an 80% mark-down in quality."

If no Accept header is present, then it is assumed that the client accepts all media types. If Accept headers are present, and if the resource is an un-negotiable resource which cannot generate a response which is acceptable according to the Accept headers, then the server should generate an error response with the 408 (not acceptable) status code.

A more elaborate example is

    Accept: text/plain; q=0.5, text/html,
            text/x-dvi; q=0.8; mxb=100000, text/x-c

Verbally, this would be interpreted as "text/html and text/x-c are the preferred media types, but if they do not exist, then send the text/x-dvi entity if it is less than 100000 bytes, otherwise send the text/plain entity."

Media ranges can be overridden by more specific media ranges or specific media types. If more than one media range applies to a given type, the most specific reference has precedence. For example, the media ranges in

    Accept: text/*, text/html, text/html;version=2.0, */*

have the following precedence:

    1) text/html;version=2.0
    2) text/html
    3) text/*
    4) */*

The media type quality factor and maximum acceptable size associated with a given type are determined by finding the media range with the highest precedence which matches that type. For example,

    Accept: text/*;q=0.3, text/html;q=0.7, text/html;version=2.0,
            */*;q=0.5

would cause the following type quality factors to be associated:

    text/html;version=2.0 = 1
    text/html             = 0.7
    text/plain            = 0.3
    image/jpeg            = 0.5
    text/html;level=3     = 0.7

The inclusion of a reactive-on-wildcard directive in an Accept header will change the rules for the sending of reactive negotiation responses (Section 5). The example

    Accept: text/html, */*;q=0.95, r-o-w

should be interpreted as "text/html is my preferred media type, and I assign media type quality factors in the range 0 - 0.95 to all other media types.
Send me a reactive negotiation response, so that I can pick the best alternate myself, if you have any non-text/html alternate which might give me a higher overall quality than any text/html alternate." Note: A user agent may be provided with a default set of quality values for certain media ranges. However, unless the user agent is a closed system which cannot interact with other rendering agents, this default set should be configurable by the user. 4.2 Accept-Charset (*) The Accept-Charset request-header field can be used to indicate what character sets are acceptable for the response. This field allows clients capable of understanding more comprehensive or special-purpose character sets to signal that capability to a server which is capable of representing documents in those character sets. The US-ASCII character set can be assumed to be acceptable to all user agents. Accept-Charset = "Accept-Charset" ":" 1#charset Character set values are described in Section (3.4[1]). An example is Accept-Charset: iso-8859-1, unicode-1-1 If no Accept-Charset header is present, the default is that any character set is acceptable. If an Accept-Charset header is present, and if the resource is an un-negotiable resource which cannot generate a response which is acceptable according to the Accept-Charset header, then the server should generate an error response with the 408 (not acceptable) status code. 4.3 Accept-Encoding (*) The Accept-Encoding request-header field is similar to Accept, but restricts the content-coding values (Section (3.5[1])) which are acceptable in the response. Accept-Encoding = "Accept-Encoding" ":" #( content-coding ) An example of its use is Accept-Encoding: compress, gzip If no Accept-Encoding header is present in a request, the server may assume that the client will accept any content coding. 
If an Accept-Encoding header is present, and if the resource is an un-negotiable resource which cannot generate a response which is acceptable according to the Accept-Encoding header, then the server should generate an error response with the 408 (not acceptable) status code. 4.4 Accept-Language (*) The Accept-Language request-header field is similar to Accept, but restricts the set of natural languages that are preferred as a response to the request. Accept-Language = "Accept-Language" ":" 1#( language-range [ ";" "q" "=" qvalue ] ) language-range = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" ) Each language-range may be given an associated quality value which represents an estimate of the user's comprehension of the languages specified by that range. The quality value defaults to "q=1" (100% comprehension). This value may be used in the server's content negotiation algorithm (Section 5). For example, Accept-Language: da, en-gb;q=0.8, en;q=0.7 would mean: "I prefer Danish, but will accept British English (with 80% comprehension) and other types of English (with 70% comprehension)." A language-range matches a language-tag if it exactly equals the tag, or if it is a prefix of the tag such that the first tag character following the prefix is "-". The special range "*", if present in the Accept-Language field, matches every tag not matched by any other ranges present in the Accept-Language field. Note: This use of a prefix matching rule does not imply that language tags are assigned to languages in such a way that it is always true that if a user understands a language with a certain tag, then this user will also understand all languages with tags for which this tag is a prefix. The prefix rule simply allows the use of prefix tags if this is the case. The language quality factor assigned to a language-tag by the Accept-Language field is the quality value of the longest language-range in the field that matches the language-tag. 
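The prefix matching rule, the special range "*", and the longest-range rule can be sketched as follows. This Python fragment is illustrative only: the function names are hypothetical, and it assumes the Accept-Language field has already been parsed into a mapping from language-range to quality value. A tag matched by no range is given the quality factor 0.

```python
def range_matches(language_range: str, tag: str) -> bool:
    """A range matches a tag if it equals the tag or is a prefix followed by "-"."""
    r, t = language_range.lower(), tag.lower()
    return t == r or t.startswith(r + "-")


def language_quality(accept_language: dict, tag: str) -> float:
    """Quality factor assigned to tag by a parsed Accept-Language field."""
    matching = [r for r in accept_language
                if r != "*" and range_matches(r, tag)]
    if matching:
        # The longest matching language-range decides the quality value.
        return accept_language[max(matching, key=len)]
    # "*" matches every tag not matched by any other range; otherwise 0.
    return accept_language.get("*", 0.0)
```

With the preferences {"da": 1.0, "en-gb": 0.8, "en": 0.7}, the tag en-GB is assigned 0.8, en-US is assigned 0.7, and fr is assigned 0.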
If no language-range in the field matches the tag, the language quality factor assigned is 0.

If no Accept-Language header is present in a request, the server should assume that all languages are equally acceptable. If an Accept-Language header is present, then all languages which are assigned a quality factor greater than 0 are acceptable. If the resource is an un-negotiable resource which cannot generate a response for an audience capable of understanding at least one acceptable language, it is acceptable to serve a response that uses other languages.

It may be contrary to the privacy expectations of the user to send an Accept-Language header with the complete linguistic preferences of the user in every request. For a complete discussion of this issue, see Section 6.3. If a reactive-on-wildcard directive is present in an Accept header, the user agent can safely omit certain languages intelligible to the user from the Accept-Language header, without affecting the quality of the negotiation process in requests on negotiated resources, if the language-range "*" is included with an appropriate language quality factor.

Note: As intelligibility is highly dependent on the individual user, it is recommended that client applications make the choice of linguistic preference available to the user. If the choice is not made available, then the Accept-Language header field must not be given in the request.

[#### Issue to be resolved: the 1.1-00 spec has a sentence in this section that says:

    "If the server cannot fulfill the request with one or more of the
    languages given, or if the languages only represent a subset of a
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    multi-linguistic Entity-Body, [....]"
    ^^^^^^^^^^^^^^^^

According to this sentence, an entity body can use multiple languages, all of which need to be understood by the sender of the Accept-Language header, so the document would in fact be for a multi-linguistic audience.

But in Section 10.11 (Content-Language) the 1.1-00 spec states:

    Multiple languages may be listed for content that is intended for
    multiple audiences. For example, a rendition of the "Treaty of
    ^^^^^^^^^^^^^^^^^^
    Waitangi," presented simultaneously in the original Maori and
    English versions, would call for

        Content-Language: mi, en

    However, just because multiple languages are present within an
    entity does not mean that it is intended for multiple linguistic
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    audiences. An example would be a beginner's language primer, such
    ^^^^^^^^^
    as "A First Lesson in Latin," which is clearly intended to be used
    by an English-literate audience. In this case, the
    Content-Language should only include "en".

There seems to be an internal contradiction here: the text above states that content can never be designated as being for a multi-linguistic audience; it can only be designated as being for multiple linguistic-audiences. So should HTTP use "multi-linguistic audiences" or "multiple linguistic-audiences"? In this Accept-Language section, I use "multiple linguistic-audiences". ####]

4.5 URI (*)

The URI entity-header field is used to inform the recipient of other Uniform Resource Identifiers (Section (3.2[1])) by which the resource can be identified.

    URI-header = "URI" ":" 1#( uri-mirror | uri-name )
    uri-mirror = "{" "mirror" <"> URI <"> "}"
    uri-name   = "{" "name" <"> URI <"> "}"

Any URI specified in this field can be absolute or relative to the Request-URI. The "mirror" form of URI refers to a location which is a mirror copy of the Request-URI. The "name" form refers to a location-independent name corresponding to the Request-URI.

[## Side issue: I find that the "mirror" and "name" descriptions above do not give enough information to let me know what they are supposed to mean. I understand that the semantics come from current practice in the CERN server.
Anyone care to expand these descriptions?##] 4.6 Alternates The Alternates entity-header field is used to describe the alternate resources bound to a negotiable resource. Alternates = "Alternates" ":" 1#( alternate-descr | caching-directive ) alternate-descr = "{" <"> URI <"> source-quality [ "{" "type" <"> media-type <"> "}" ] [ "{" "language" <"> 1#language-tag <"> "}" ] [ "{" "encoding" <"> 1#content-coding <"> "}" ] [ "{" "length" 1*DIGIT "}" ] [ "{" "description" quoted-string "}" ] [ extension-attribute ] "}" source-quality = qvalue extension-attribute = "{" extension-name extension-value "}" extension-name = token extension-value = #( token | quoted-string | <any element of tspecials except "}"> ) Note: the extension-attribute is included because it is expected that HTTP/1.2 will define new attributes for use in the Alternates header. Also, this attribute eases content negotiation experiments under HTTP/1.1. caching-directive = "{" "cache-control" 1#cache-directive "}" Cache-directives are defined in Section (10.8[1]). [##Issue to be resolved: Would just having the max-age cache-directive here be sufficient?##] [##Note: If Age: goes into HTTP/1.1 for caching of normal responses, we need to add optional age field to the URI header##] Any URI specified in this field can be absolute or relative to the Request-URI. For each of the alternates bound to the negotiable resource, the alternates header must include an alternate-descr form describing that alternate. [##Note: If the resource author cannot or does not want to list all the alternates, Vary header based negotiation can be used##] [## Question to be resolved: should text below up to the example be moved to Section (3.9[1]) (Quality Values)?##] The source-quality attribute given in an alternate description is measured by the content provider as representing the amount of degradation from the original source. 
For example, a picture originally in JPEG form would have a lower source quality when translated to the XBM format, and much lower source quality when translated to an ASCII-art alternate. Note, however, that this is a function of the source -- an original piece of ASCII-art may degrade in quality if it is captured in JPEG form. Content providers should use the following table as a guide when assigning source quality values: 1.000 no degradation 0.999-0.900 no noticeable degradation 0.899-0.700 noticeable, but acceptable degradation 0.699-0.500 barely acceptable degradation 0.499-0.000 unacceptable degradation [##Question to be resolved: can we come up with a word other than `degradation' that also covers the case of alternates not converted from one source?##] It is important that content providers do not assign very low source quality values without good reason, as this will limit the ability of users to influence the negotiation process with their own preference settings. If alternates are not converted from one source, but constructed separately to represent the same abstract information in different ways, then the source quality attributes can be used to express differences in quality between the alternates. An example Alternates header for a negotiable resource with the URI http://www.w3.org/pub/WWW/TheProject is: Alternates: {"TheProject.fr.html" 1.0 {type "text/html"} {language "fr"}}, {"TheProject.en.html" 1.0 {type "text/html"} {language "en"}}, {"TheProject.fr.txt" 0.7 {type "text/plain"} {language "fr"}}, {"TheProject.en.txt" 0.8 {type "text/plain"} {language "en"}} which indicates that the negotiable resource binds to four alternate resources that differ in media type and natural language. The type, language, encoding, and length attributes of an alternate description refer to their Content-* header counterparts. 
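To make the brace syntax of the example above concrete, here is a rough Python sketch that reads an Alternates field value into a list of dictionaries. It is an illustration under stated assumptions, not a complete parser: the names parse_alternates and _parse_group are hypothetical, the input is assumed to be well-formed, and quoted strings are assumed to contain no double quotes.

```python
import re

# Tokens: a brace, a quoted string, or a bare token. Commas and whitespace
# between tokens are simply skipped by findall.
_TOKEN = re.compile(r'[{}]|"[^"]*"|[^\s{}",]+')


def _parse_group(tokens, i):
    """Parse one {...} group starting at tokens[i] == "{"; return (items, next_i)."""
    items = []
    i += 1
    while tokens[i] != "}":
        if tokens[i] == "{":
            sub, i = _parse_group(tokens, i)
            items.append(sub)
        else:
            items.append(tokens[i])
            i += 1
    return items, i + 1


def parse_alternates(value):
    """Return a list of dicts, one per alternate-descr in the field value."""
    tokens = _TOKEN.findall(value)
    i, alternates = 0, []
    while i < len(tokens):
        group, i = _parse_group(tokens, i)
        if group and group[0] == "cache-control":
            continue  # caching-directive, not an alternate description
        uri, source_quality, *attrs = group
        alternates.append({
            "uri": uri.strip('"'),
            "qs": float(source_quality),
            # attribute name -> list of values, quotes removed
            "attrs": {a[0]: [v.strip('"') for v in a[1:]] for a in attrs},
        })
    return alternates
```

Applied to the example header above, this gives four entries; the first has the URI "TheProject.fr.html", a source quality of 1.0, and the attributes type "text/html" and language "fr".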
Though all attributes are optional, it is often desirable to include as many attributes as possible as this will increase the quality of the negotiation process.

Servers must only generate extension-attributes whose names start with "x-". Clients should ignore all extension attributes they do not recognize. Proxies should not engage in alternate selection calculations on behalf of the origin server if an unrecognized attribute is present in the Alternates header.

The description attribute is meant to provide a textual description of some properties of the alternate, to be displayed by a user agent when showing the list of all alternates bound to a negotiable resource (see Section 5). This attribute can be included if the URI and normal attributes of an alternate are considered too opaque to allow interpretation by the user.

The cache-control directive of the Alternates header field can be used to restrict the cachability of the Alternates header, and, for 300 (multiple choices) and 406 (none acceptable) responses, the other parts of the response. This directive duplicates the control functionality offered for un-negotiated resources by the Cache-Control header.

[## Issue to be resolved: Should there be a {"user-agent-prefix" quoted-string} attribute which could be used for user agent negotiation? The matching rule could amount to: if you match a user-agent-prefix in an alternate, exclude all other alternates with user-agent prefix attributes that provide no, or shorter, matches from consideration. Example:

    Alternates: {"plan.html" 0.9 {type "text/html"}
                  {user-agent-prefix ""}},
                {"plan.wuxta.html" 0.6 {type "text/html"}
                  {user-agent-prefix "WuxtaWeb1."}
                  {description "Does not trigger bug in WuxtaWeb 1.x"}},
                {"plan.dvi" 1.0 {type "text/x-dvi"}},
                {cache-control max-age=1209600}
##]

[## Note: adding feature negotiation would add a "feature" attribute in the alternates-descr syntax, and a corresponding Accept-feature request header.
The attribute would contain feature identifiers, which are short codes for things like `user agent supports HTML 3.0 tables', `user agent supports java', and maybe the negations of feature identifiers. ##] 4.7 Alt-Header The Alt-Header request-header can be used in requests to negotiable resources to introduce new request headers in any derived requests on alternate resources (see section 5.2). Alt-Header = "Alt-Header" ":" <"> URI <"> Request-Header The URI specified in this field can be absolute or relative to the Request-URI. A typical example is Alt-Header: "TheProject.en.html" If-Validator-Valid: 6a7bf If it already has a copy of the "TheProject.en.html" alternate in cache, a caching client can include this header in requests to allow the server to shorten a 200 (OK) preemptive negotiation response to a 304 (not Modified) response in case that preemptive negotiation yields "TheProject.en.html" as the best alternate. Servers are always allowed to ignore Alt-Header request headers. [##Note: Roy Fielding has proposed a Content-ID response header which would carry validators guaranteed to be 1) different for different resources and 2) different for different resource versions. If we have such a header, and it is generally used, then we can simplify Rep-Header to Unless-ID = "Unless-ID" ":" 1#cid with the meaning: send me a normal response unless the Content-ID would be one of the listed Content-IDs. If the Content-ID is one of the listed ones, return a 4xx (Unless true) response instead. The same Unless-ID would also serve as a simplification of the "If-Validator-Valid" and "Variant-Set" proposed in the caching subgroup. ##] 5 Content negotiation (*) Content negotiation is an optional feature of the HTTP/1.1 protocol: resources may be negotiable, but they need not be. If a resource is negotiable, this changes the semantics of GET and HEAD transactions on the resource. Other transactions are not affected. 
A negotiable resource has a number of alternates bound to it. The HTTP content negotiation mechanism allows for automatic selection of the preferred alternate bound to a negotiable resource based on the properties of the alternates and on the user agent preferences for the retrieval action on the negotiated resource.

[## Note: `retrieval action' is a new term I had to introduce because `request' is not entirely accurate here: with reactive negotiation, one retrieval action causes two requests. ##]

An alternate is a resource, identified by an alternate URI, that provides one possible representation of the `contents' of the negotiable resource. An alternate resource must never be a negotiable resource itself. It is the responsibility of the author of the negotiable resource, not the author of the alternate, to ensure that this restriction is not violated.

The negotiability of a resource is expressed by the Alternates response header. If a 2xx or 3xx class response does not include an Alternates response header, then the resource is un-negotiable. If any response does include an Alternates response header, then the resource is negotiable.

When displaying an alternate as the end result of a retrieval action on a negotiable resource, a user agent should allow the user to review a list of all alternates bound to the negotiable resource, and to initiate retrieval of another alternate if desired. The list can be annotated with some or all of the properties of the alternates, as given by the Alternates header in the negotiable resource response.

When displaying an alternate as the end result of a retrieval action on a negotiable resource, a user agent should show the negotiable resource URI, not the alternate resource URI, as being the URI the contents of which were retrieved. If the user agent stores a reference to the content displayed for future use, it is the negotiable resource URI, not the alternate resource URI, which should be stored.
HTTP/1.1 provides for two types of content negotiation: preemptive and reactive. Preemptive negotiation is generally faster than reactive negotiation, but it can only be used if sufficient information about user agent capabilities and user preferences is present in the request on the negotiable resource. Reactive negotiation can always be used. Therefore, preemptive negotiation is best seen as a mechanism that can sometimes optimize reactive negotiation transactions.

5.1 Reactive negotiation

In reactive negotiation, the selection and retrieval of an alternate bound to the negotiable resource spans two transactions.

In the first transaction, the client transmits a request on the negotiable resource URI, and the server responds with a 300 (multiple choices) or 406 (none acceptable) response, which includes an Alternates header describing the alternates bound to the negotiable resource. A 406 response may always be generated; a 300 response may only be generated if specific conditions given in Section 5.2 are met. The client can use the Alternates header in the 300 or 406 response to select the alternate that best matches the preferences for the retrieval action.

In the second transaction, the user agent transmits a request on the URI of the selected alternate resource, and the server will typically respond with a 200 (OK) response, though other response codes like 302 (moved temporarily) are also possible. Only the user agent needs to know that the second request is part of a reactive negotiation process; all other parties can treat it as a normal request on an un-negotiated resource.

User agents should use the reactive alternate selection algorithm below when automatically selecting the best alternate listed in an Alternates header. User agents are allowed to use other selection algorithms, but this is not recommended, as preemptive negotiation is defined to optimize the case in which the reactive alternate selection algorithm below is used.
User agents that do not wish to implement an alternate selection algorithm can, by only using Accept request headers of a certain form, force servers to always include an entity when a reactive negotiation response is sent. They can then use this entity to allow the user to select an alternate manually, or use the reactive response Location header, if present, to automatically fetch the alternate recommended by the server.

[## Note: the possibility of doing the above is also important for proxies that want to mediate between a 1.0 client and a 1.1 server. 1.0 clients will always use Accept headers of the certain form that triggers a response suitable for a client which does not implement negotiation. ##]

In the first step of the reactive alternate selection algorithm, the overall quality for every alternate listed in the Alternates header of the negotiable resource is computed. The overall quality of an alternate is a real number Q in the range 0 through 1, where 0 is the minimum and 1 the maximum value, defined as

   Q = qs * qe * qc * ql * q * qml

The values qs, qe, qc, ql, q, qml for a particular alternate are all determined using the part of the received Alternates header describing that alternate, called the alternate description below.

qs   The source quality factor for the alternate is given by the source-quality attribute in the alternate description.

qe   The encoding quality factor is 1 if there is no encoding attribute in the alternate description. If there is an encoding attribute in the alternate description, the encoding quality factor is 1 if the user agent can decode the given content encoding, 0 otherwise.

[## Question to be resolved: do we really want to distinguish between alternates that have an encoding and alternates that do not? This could block a smooth transition to a scheme in which servers apply compression on the fly if the client indicates it can handle decompression.
Maybe negotiation about en/decoding capabilities should be kept separate from the main content negotiation mechanism. On the other hand, the Transfer-Encoding header already seems to allow for a future introduction of on-the-fly compression. ##]

qc   The charset quality factor is 1 if there is no type attribute in the alternate description, or if the media type given in the type attribute of the alternate description does not have a charset parameter. If there is a charset parameter, then the charset quality factor is 1 if the user agent can process a message with the given character set, 0 otherwise. User agents must always be able to process a message with the US-ASCII charset.

[## Question to be resolved: do recent discussions on the http-wg list indicate that the US-ASCII above should be changed into ISO-8859-1? Or should the text above be changed to say `US-ASCII or ISO-8859-1'? I believe the consensus was 'no'. ##]

ql   The language quality factor is 1 if there is no language attribute in the alternate description. If there is a language attribute, then the language quality factor is the highest quality factor assigned to any one of the listed languages according to the user agent language preferences for the retrieval action.

[## Note: the 1.1-01 draft says: `If at least one alternate has an assigned content language, but the one currently under consideration does not, then it should be assigned the value "ql=0.5".' I deleted this requirement; service authors can more accurately use the qs attribute to adjust things in situations where only some of the alternates have languages. ##]

q    The media type quality factor is 1 if there is no type attribute in the alternate description. If there is a type attribute, then the media type quality factor is the quality factor assigned to the given media type in the user agent media type preferences for the retrieval action.

qml  The maximum length quality factor is 1 if there is no length attribute in the alternate description.
If there is a length attribute in the alternate description, then the maximum length quality factor is 1 if the length given is less than or equal to the maximum acceptable length according to the user agent maximum length preferences for the retrieval action, 0 otherwise. Preferred maximum lengths are often equal to `infinity'.

In the second step of the reactive alternate selection algorithm, the overall qualities of all alternates are compared to select the best alternate. If there is one alternate with the highest overall quality value, then that alternate is the best alternate. If there are multiple alternates that share the highest overall quality value, then the alternate that is listed first in the received Alternates header is the best alternate. If all alternates have an overall quality value of zero, a user agent should not automatically retrieve the first alternate, but stop the reactive negotiation process, allowing the user to decide on the next action.

5.2 Preemptive negotiation (*)

In preemptive negotiation, the selection and retrieval of an alternate bound to the negotiable resource is done in a single transaction, saving one round trip time over reactive negotiation.

A preemptive negotiation response must only be generated by a server if the request on the negotiable resource contains enough information about user agent capabilities and user preferences to allow the server to determine which alternate would be chosen if the reactive alternate selection algorithm outlined above were used by the user agent in reactive negotiation.

When engaging in preemptive negotiation, the server must use the following algorithm, or any other algorithm that produces the same result, to construct the preemptive response message.

1. Construct a request message on the best alternate resource by modifying the received request message on the negotiable resource in the following way.
First, the Request-URI and the Host request header must be rewritten to point to the best alternate resource. Then, if there are any Alt-Header request headers that match the best alternate resource URI, the headers given in these matching Alt-Header request headers may be added to the headers in the request message. Finally, the Alt-Header request headers in the request message may be removed.

2. Generate a valid HTTP response message for the request message constructed in step 1. If the server is a proxy, this may involve sending the constructed request to the origin server.

3. Add two headers to the HTTP response message generated in step 2. These are an Alternates header describing the alternates bound to the negotiable resource, and a Location header that gives the URI of the best alternate resource.

A preemptive response message satisfies the origin server restriction if and only if the full URI of the best alternate resource can be obtained by adding a sequence of characters excluding "/" to the end of the full URI of the negotiable resource, where the first character added may not be a US-ASCII uppercase or lowercase letter.

[## Note: In version 2 of this text, the origin server restriction was much weaker: it only said that the two URIs must be located on the same server. I have changed this because a stronger restriction will make the implementation and maintenance of origin servers simpler, while not making life much more difficult for the authors of negotiable resources. ##]

[## Question to be resolved: should the origin server restriction be weakened? Daniel DuBois proposes "The URLs must match up to the last slash in the negotiable resource". ##]

Origin servers should not generate a preemptive response message that violates the origin server restriction. If a client receives a preemptive response message that violates the origin server restriction directly from an origin server, then that client must reject the message as a probable spoofing attempt.
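The origin server restriction above is a pure string test on two URIs, so a client-side check can be sketched in a few lines of Python. The function name is hypothetical, both arguments are assumed to be full (already resolved) URIs, and rejecting an empty suffix is an assumption of this sketch, based on the rule that an alternate must never be the negotiable resource itself.

```python
import string


def satisfies_origin_server_restriction(negotiable_uri: str,
                                         alternate_uri: str) -> bool:
    """Check the origin server restriction (illustrative sketch only)."""
    # The alternate URI must extend the negotiable resource URI.
    if not alternate_uri.startswith(negotiable_uri):
        return False
    suffix = alternate_uri[len(negotiable_uri):]
    # Assumption: an empty suffix is rejected, since the alternate would
    # then be the negotiable resource itself.
    if not suffix:
        return False
    # The added characters may not include "/".
    if "/" in suffix:
        return False
    # The first added character may not be a US-ASCII letter.
    if suffix[0] in string.ascii_letters:
        return False
    return True
```

Under these assumptions, `http://example.org/plan` and `http://example.org/plan.tables.html` pass the check, while `http://example.org/plans` and `http://example.org/plan/fr.html` do not.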
If the client is a proxy, it must not pass on the response; it can pass on a 502 (bad gateway) response instead. Servers acting as proxies may generate preemptive responses that do violate the origin server restriction, and clients should not reject these responses.

[## Note: the origin server restriction does not imply that you can't have alternates on other servers. You can: you just have to generate reactive negotiation responses for those variants. ##]

Clients, including caching proxies, may treat the HTTP response that can be derived from a reactive negotiation response by deleting the Alternates and Location headers as being controlled by the author of the best alternate resource, not the author of the negotiable resource on which the actual request was made. It is the responsibility of the server to ensure that the best alternate resource author indeed has this control. Section 6.1 discusses the implications of this rule for server design and administration.

User agents can transmit information about their capabilities and preferences for a retrieval action using the various accept request headers. If the accept headers present in a request on a negotiable resource contain enough information, a server may be able to generate a preemptive negotiation response.

As most resources will be un-negotiable, user agents are encouraged to send empty or small accept headers, or even omit some accept headers entirely, by default. If a user agent knows or discovers that an origin server provides negotiated resources, it is encouraged to use data from the negotiated responses received so far to dynamically add or extend accept headers sent in future requests on resources provided by that origin server, in order to increase the probability that preemptive negotiation can be used instead of the slower reactive negotiation.

Servers that want to support preemptive negotiation must use the preemptive alternate selection algorithm below.
This algorithm can be applied to determine

   o whether a preemptive negotiation response may be sent, and if so, which alternate is the best alternate

   o the appropriate response code, either 300 (Multiple Choices) or 406 (None Acceptable), when a reactive response is sent.

The algorithm uses the alternate descriptions for each of the available alternates, as will be included in the Alternates header of the response, and the Accept headers of the request on the negotiable resource as input.

In the first step of the preemptive alternate selection algorithm, the overall quality for every alternate bound to the negotiable resource is computed. The overall quality is a real number Q in the range 0 through 1, where 0 is the minimum and 1 the maximum value, defined as

   Q = qs * qe * qc * ql * q * qml

The overall quality values computed in the preemptive algorithm are not necessarily equal to the overall quality values computed in the reactive algorithm of Section 5.1.

The values qs, qe, qc, ql, q, qml for a particular alternate are all determined using the alternate description of the particular alternate and the Accept headers of the request.

qs   The source quality factor for the alternate is given by the source-quality attribute in the alternate description.

qe   The encoding quality factor is 1 if there is no encoding attribute in the alternate description. If there is an encoding attribute in the alternate description, the encoding quality factor is 1 if no Accept-Encoding header is present in the request, 1 if an Accept-Encoding header present indicates the ability to decode the given content encoding, and 0 otherwise.

qc   The charset quality factor is 1 if there is no type attribute in the alternate description, or if the media type given in the type attribute of the alternate description does not have a charset parameter.
If there is a charset parameter, then the charset quality factor is 1 if the charset is US-ASCII, 1 if no Accept-Charset header is present in the request, 1 if an Accept-Charset header present indicates the ability to handle the given character set, and 0 otherwise.

ql   The language quality factor is 1 if there is no language attribute in the alternate description. If there is a language attribute, then the language quality factor is the highest quality factor assigned by the Accept-Language header in the request to any one of the languages listed in the attribute, 0 if none of the listed languages are assigned a quality factor by the Accept-Language header in the request, and 1 if there is no Accept-Language header in the request.

q    The media type quality factor is 1 if there is no type attribute in the alternate description. If there is a type attribute, then the media type quality factor is the quality factor assigned to the given media type by the Accept headers in the request, 0 if the Accept headers do not assign a quality factor to the media type, and 1 if there are no Accept headers in the request.

qml  The maximum length quality factor is 1 if there is no length attribute or no type attribute in the alternate description. If there is a length and a type attribute in the alternate description, then the maximum length quality factor is 0 if the "mxb" value assigned to the given media type by the Accept headers in the request is less than the value given in the length attribute, 1 if the "mxb" value is greater than or equal to it, 1 if the Accept headers do not assign an "mxb" value to the media type, and 1 if there are no Accept headers in the request.

In the second step of the algorithm, the overall qualities of all alternates are compared to select the best one. If there is one alternate with the highest overall quality value, then this is the best alternate.
If there are multiple alternates that share the highest overall quality value, then the alternate that is listed first in the Alternates header is the best alternate. If all alternates have an overall quality value of zero, then any reactive negotiation response sent must use the 406 (None Acceptable) response code. Otherwise, any reactive negotiation response sent should use the 300 (Multiple Choices) response code.

In the third step of the preemptive negotiation alternate selection algorithm, it is determined whether a preemptive negotiation response may be sent to return the best alternate found.

If the best alternate has an overall quality value of zero, then the server must not generate a preemptive response; it should generate a reactive response with the 406 (None Acceptable) response code.

If the best alternate has an overall quality factor greater than zero, and no Accept header in the request contains a reactive-on-wildcard directive, then the server may generate a preemptive response, provided that the origin server restriction, if applicable, is met.

If the best alternate has an overall quality factor greater than zero, and an Accept header in the request contains a reactive-on-wildcard directive, then the server may generate a preemptive response, provided that the origin server restriction, if applicable, is met, if

   o the type quality factor (q) of the best alternate was not derived from a match to a media range containing an asterisk "*" wildcard character in an Accept header, and

   o the language quality factor (ql) of the best alternate was not derived from a match to a "*" language-range in the Accept-Language header.

In all other cases, the server must generate a reactive response.

5.3 Caching issues

HTTP/1.1 does not provide a mechanism for conditional GET requests on negotiable resources, but does provide a mechanism, the Alt-Header request header, for conditional GET requests on alternate resources.
[## Question to be resolved: _should_ there be a special rule for conditional GETs on negotiable resources? Some people have said that they worry about superfluous transmission of long Alternates headers. A conditional GET could presumably save retransmission of a large Alternates header. We could define that preemptive and reactive negotiation responses may omit the Alternates response header if it was `not modified since'. ##]

When generating a 300 (Multiple Choices) response, a 406 (None Acceptable) response, or the Alternates headers for a preemptive response, a cache may re-use an Alternates header received earlier from the negotiable resource, as long as the restrictions expressed by any cache-control directive in the Alternates header are met. If the presence of an entity is required in a 300 or 406 response, caches may generate that entity on behalf of the origin server.

When relaying a preemptive response, a cache may infer the request and response messages of the HTTP transaction on the best alternate resource performed by the server that generated the preemptive response, and may update its internal data structures to reflect the occurrence of this HTTP transaction. Caches are encouraged to perform such updates because they increase efficiency and prevent strange (but otherwise allowed) effects if the contents of an alternate resource are changed at the origin server while there is still a non-expired version of these contents in cache.

[## Note: earlier versions of the Alternates header had, besides the {cache-control ...} directive, a {vary ...} directive. My idea was that {vary user-agent} in the Alternates header would indicate that the source quality values in the Alternates header would vary on the User-Agent field, thus allowing service authors to mix content negotiation with user agent negotiation. Varying the Alternates header proved too controversial, so I threw the {vary ...} directive out.
This means (as far as I can see) that _efficient_ negotiation on tables vs. no tables, which also gives the user the option to select another alternate as in normal content negotiation, will only be possible after we introduce feature negotiation. The most efficient thing that works in one round trip for the normal case and that still gives the user the option to select another alternate is using

   Alternates: {"plan.auto.html" 0.9 {type "text/html"}
                 {description "Automatic tables/no tables selection"}},
               {"plan.tables.html" 0.8 {type "text/html"}},
               {"plan.notables.html" 0.7 {type "text/html"}},
               {"plan.dvi" 1.0 {type "text/x-dvi"}}

and making "plan.auto.html" an alternate resource that varies on user agent. A typical preemptive response would look like

   HTTP/1.1 200 OK
   Alternates: {"plan.auto.html" 0.9 {type "text/html"}
                 {description "Automatic tables/no tables selection"}},
               {"plan.tables.html" 0.8 {type "text/html"}},
               {"plan.notables.html" 0.7 {type "text/html"}},
   Location: plan.auto.html
   Vary: user-agent
   Content-length: ....
   ....

   [contents of the plan.tables.html file on the server as the entity body]

The problem with this is that it leads to the storage of _four_ entity bodies (instead of two) in a (full) cache:

   1) the variant entity with the tables produced by plan.auto.html,
   2) the variant entity without the tables produced by plan.auto.html,
   3) the one entity bound to plan.tables.html,
   4) the one entity bound to plan.notables.html.

So this doubles the traffic between the proxy and the origin server.

Note that this solution presupposes that the proxy cache can cache varying resources efficiently, i.e. that we have a Variant-Set like mechanism for preventing the unnecessary sending of variants already in cache if a request from a previously unknown user agent is relayed. Without that, even more traffic between the proxy and the origin server is needed.
##]

6 Security and Privacy considerations

[## Note: This section could use some editing when it goes into the 1.1 draft. To provide some motivation of changes to the current 1.1 draft, I am including more text than would be required in an RFC. Also, I have not had the time to optimize readability of this section. ##]

6.1 Spoofing using Location headers

Clients, including caching proxies, may treat the HTTP response that can be derived from a reactive negotiation response by deleting the Alternates and Location headers as being controlled by the author of the best alternate resource, not the author of the negotiable resource on which the actual request was made. It is the responsibility of the server to ensure that the best alternate resource author indeed has this control, because if this control is lost, control over the responses generated by direct requests on the best alternate resource is also lost.

Origin servers are helped in carrying this responsibility by the rule that clients must reject preemptive responses that do not satisfy the origin server restriction. This paragraph discusses the implications of the above for server design and administration.

First, it is intended that any negotiable resource authoring mechanism built into the server, and accessible to authors of static content and CGI scripts, generates preemptive responses by internally doing a request on the best variant resource, and adding the required Alternates and Location headers to the generated response.
Second, it is intended that, if the CGI interface has a feature that allows script authors to generate a preemptive response directly, then

   a) two distrusting parties will never be able to author CGI scripts in a shared directory, or

   b) use of this feature is only enabled for a CGI script if the script author is trusted by all other authors that use the same directory, or

   c) the server filters the Location headers generated by the CGI script to prevent spoofing that is not prevented by clients applying the origin server restriction.

6.2 User tracking based on accept headers

If users fine-tune quality factors put into the default user agent accept headers to the third decimal, these accept headers can be used as relatively long-lived user identifiers, enabling content providers (even if they do not provide negotiable resources) to tell apart different users behind a proxy. This identification allows content providers to do click-trail tracking, and allows collaborating content providers to match cross-server click-trails or form submissions of individual users.

Thus, privacy considerations demand that user agents be conservative both in the amount of quality factor fine-tuning they allow users without giving a warning about privacy, and in sending long accept headers by default in a request. (See also the remarks on sending short accept headers for performance reasons in Section 5.2.)

6.3 Accept headers revealing information of private nature without real need

[## Note: Brian Behlendorf has commented that the discussion in the two paragraphs below is way too long for the draft 1.1 standard. I agree; I made it this long to justify my new Accept-Language: "*" feature. ##]

Preferences sent in accept headers, in particular language quality factors sent in Accept-Language headers, may reveal information that the user would rather keep private unless it will directly improve the quality of the service. The content negotiation mechanism allows users to leave some languages (e.g.
languages the knowledge of which strongly correlates with membership of a particular ethnic group) out of the Accept-Language header without decreasing the quality of the negotiation process if the request happens to be on a negotiable resource. Note however that the speed of the negotiation process may be affected.

No matter how much information is left out of the Accept headers, automatic reactive negotiation by a user agent on a negotiable resource will inevitably reveal some of the user preferences by the generation of a request on the best alternate resource as partly determined by the user preferences. Malicious service authors could provide `fake' negotiable resources, which do not even bind to alternate resources that are in fact different, and whose only purpose is to get information about (ethnicity correlated) languages understood by the visiting users. Such plots would however be visible to alert victims, as user agents will allow the user to review a list of all alternates bound to the negotiable resource.

Maintainers of firewall proxies may want to process outgoing accept headers to enhance privacy beyond the level provided by the user agents behind the firewall.

7 Acknowledgments

This document builds on the content negotiation descriptions in [1], and directly incorporates text from [1] in some places. Many members of the HTTP working group have contributed to discussions that are reflected in this document.

8 References

[1] Roy T. Fielding, Henrik Frystyk Nielsen, and Tim Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1. Internet-Draft draft-ietf-http-v11-spec-01.txt, HTTP Working Group, January, 1996.

[2] H. Alvestrand. "Tags for the identification of languages." RFC 1766, UNINETT, March 1995.
Received on Sunday, 25 February 1996 14:46:39 UTC