- From: Koen Holtman <koen@win.tue.nl>
- Date: Sat, 13 Apr 1996 20:22:09 +0200 (MET DST)
- To: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
- Cc: Koen Holtman <koen@win.tue.nl>
I am pleased to announce that I believe there to be consensus on the
text for the sections:

    3.10 Language Tags
    (9. Status Code Definitions) 416 Not Acceptable
    10.1 Accept
    10.2 Accept-Charset
    10.3 Accept-Encoding
    10.4 Accept-Language
    14.7 Privacy issues connected to Accept headers

The only open issue connected to these sections is the issue on
whether the specification should use the term `charset' or `character
set'.

This closes the following issues on the HTTP/1.1 issues list:

    QMXB NOTACCEPT LANGUAGETAGS ACCEPT ACCEPTCHARSET ACCEPTENCODING
    ACCEPTLANGUAGE BOTH ACCEPT-PRIVACY

If you believe that there is no consensus on one of these issues,
please announce this as soon as possible.

Below is the consensus text. The change bars are computer-generated,
and indicate changes with respect to the text posted on Friday,
April 5. Changed words behind the changebars are typeset in capital
letters. See the end of this message for diffs between
draft-ietf-http-v11-spec-01.txt and the new consensus text.

=====================================================================

3. Protocol Parameters

3.10 Language Tags

[##Note: I moved the language tag matching discussion that used to be
in this Section to Section 10.4 (Accept-Language). Some other minor
edits were made.##]

A language tag identifies a natural language spoken, written, or
otherwise conveyed by human beings for communication of information to
| other human beings. Computer languages are explicitly excluded. HTTP
uses language tags within the Accept-Language and Content-Language
fields.

The syntax and registry of HTTP language tags is the same as that
defined by RFC 1766 [1]. In summary, a language tag is composed of 1
or more parts: A primary language tag and a possibly empty series of
subtags:

       language-tag  = primary-tag *( "-" subtag )

       primary-tag   = 1*8ALPHA

       subtag        = 1*8ALPHA

Whitespace is not allowed within the tag and all tags are
case-insensitive. The namespace of language tags is administered by
the IANA.
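For illustration only (this is not part of the consensus text), the
language-tag grammar above can be checked mechanically. The Python
sketch below assumes that ALPHA means the ASCII letters A-Z and a-z;
the names is_language_tag and normalize_language_tag are hypothetical.

```python
import re

# Grammar quoted from RFC 1766, as summarized above:
#   language-tag = primary-tag *( "-" subtag )
#   primary-tag  = 1*8ALPHA
#   subtag       = 1*8ALPHA
# Assumption: ALPHA means the ASCII letters A-Z and a-z.
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_language_tag(text: str) -> bool:
    """Return True if text is syntactically a language tag.

    Whitespace is not allowed anywhere in a tag, so the input is not
    stripped before matching.
    """
    return _LANGUAGE_TAG.fullmatch(text) is not None


def normalize_language_tag(tag: str) -> str:
    """Tags are case-insensitive; fold to lower case for comparisons."""
    if not is_language_tag(tag):
        raise ValueError(f"not a language tag: {tag!r}")
    return tag.lower()
```

With this sketch, tags such as en-US or x-pig-latin are accepted, while
"en US" (contains whitespace) or a nine-letter primary tag are
rejected. Lower-casing is just one simple way to honour the
case-insensitivity rule when comparing tags.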
Example tags include:

       en, en-US, en-cockney, i-cherokee, x-pig-latin

where any two-letter primary-tag is an ISO 639 language abbreviation
and any two-letter initial subtag is an ISO 3166 country code.

9. Status Code Definitions

| 416 Not Acceptable

| [##Note: The previous version had the 413, not 416 above, but
| 413 has since been taken by another new error type.##]

| [##Note: the new 416 is similar to the 406 response code in the old
draft. 406 cannot be used for content negotiation compatibility
reasons##]

| The resource identified by the Request-URI and Host request header
(present if the request-URI is not an absoluteURI) is only capable of
generating response entities which have content characteristics not
acceptable according to the accept headers sent in the request.

HTTP/1.1 servers are allowed to return responses which are not
acceptable according to the accept headers sent in the request. In
| some cases, this may even be preferable over sending a 416
response. User agents are encouraged to inspect the headers of an
| incoming response to determine if it is acceptable. If THE RESPONSE is
| not ACCEPTABLE, user agents should interrupt the receipt of the
| response if doing so would save network resources. If it IS unknown
whether an incoming response would be acceptable, a user agent should
temporarily stop receipt of more data and query the user for a
decision on further actions.

[## Note: the paragraph above could be moved to a more convenient
location in the 1.1 document if the editor finds one. Note that the
above rule was discussed extensively on the content negotiation
mailing list. A short summary of the main reason behind this rule:
20 line HTTP servers.##]

10 Header Field Definitions

10.1 Accept

The Accept request-header field can be used to specify certain media
types which are acceptable for the response. Accept headers can be
used to indicate that the request is specifically limited to a small
set of desired types, as in the case of a request for an in-line
image.

The field may be folded onto several lines and more than one
occurrence of the field is allowed, with the semantics being the same
as if all the entries had been in one field value.

       Accept         = "Accept" ":" #(
|                        media-range
                         [ ( ":" | ";" )
                           range-parameter
                           *( ";" range-parameter ) ]
|                      | extension-token )

       media-range    = ( "*/*"
                        | ( type "/" "*" )
                        | ( type "/" subtype )
                        ) *( ";" parameter )

       range-parameter = ( "q" "=" qvalue )
                       | extension-range-parameter

       extension-range-parameter = ( token "=" token )

       extension-token = token

The asterisk "*" character is used to group media types into ranges,
with "*/*" indicating all media types and "type/*" indicating all
subtypes of that type. The range-parameter q is used to indicate the
media type quality factor for the range, which represents the user's
preference for that range of media types. The default value is q=1. In
Accept headers generated by HTTP/1.1 clients, the character separating
media-ranges from range-parameters should be a ":". HTTP/1.1 servers
should be tolerant of use of the ";" separator by HTTP/1.0 clients.

The example

|      Accept: audio/*: q=0.2, audio/basic

should be interpreted as "I prefer audio/basic, but send me any audio
type if it is the best available after an 80% mark-down in quality."

If no Accept header is present, then it is assumed that the client
accepts all media types. If Accept headers are present, and if the
| SERVER cannot send a response which is acceptable according to the
Accept headers, then the server should send an error response with the
| 416 (not acceptable) status code, though the sending of an
| UNACCEPTABLE response is also allowed.
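As a rough sketch of how a server might apply these rules (not part of
the consensus text), the Python code below parses an Accept field
value, accepts both the ":" separator and the HTTP/1.0 ";" separator,
and looks up a media type's quality factor with the
most-specific-range precedence rule given later in this section.
Several simplifications are assumptions of the sketch, not of the
specification: list elements are split on commas without handling
quoted strings, extension tokens and extension range-parameters are
ignored, a type matched by no range gets quality 0, and a quality
factor of 0 is treated as "not acceptable". The names MediaRange,
parse_accept, quality and best_variant are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MediaRange:
    type: str       # e.g. "text", or "*"
    subtype: str    # e.g. "html", or "*"
    params: dict = field(default_factory=dict)  # media-type parameters, e.g. {"level": "1"}
    q: float = 1.0  # range-parameter q; the default value is q=1

    def matches(self, mtype, msubtype, mparams):
        """True if the range's type/subtype equal the media type's (or are
        "*") and every parameter the range names has the same value."""
        if self.type not in ("*", mtype) or self.subtype not in ("*", msubtype):
            return False
        return all(mparams.get(name) == value for name, value in self.params.items())

    def specificity(self):
        # Orders text/html;level=1 > text/html > text/* > */*.
        return (self.type != "*", self.subtype != "*", len(self.params))


def _pairs(items):
    """Turn ["a=b", " c = d "] into {"a": "b", "c": "d"}, skipping empty items."""
    pairs = {}
    for item in items:
        name, _, value = item.partition("=")
        if name.strip():
            pairs[name.strip().lower()] = value.strip()
    return pairs


def _split_type(text):
    """Split "text/html;level=1" into ("text", "html", {"level": "1"})."""
    items = text.split(";")
    mtype, _, msubtype = items[0].strip().lower().partition("/")
    return mtype, msubtype, _pairs(items[1:])


def parse_accept(value):
    """Parse an Accept field value into a list of MediaRange objects."""
    ranges = []
    for element in value.split(","):
        if "/" not in element:
            continue  # empty element or extension-token: ignored by this sketch
        # HTTP/1.1 clients separate range-parameters from the media-range with ":".
        media_part, _, range_part = element.partition(":")
        mtype, msubtype, params = _split_type(media_part)
        range_params = _pairs(range_part.split(";"))
        # Tolerate the HTTP/1.0 ";" separator: read a "q" parameter as the
        # quality factor rather than as a media-type parameter.
        if "q" in params:
            range_params.setdefault("q", params.pop("q"))
        ranges.append(MediaRange(mtype, msubtype, params, float(range_params.get("q", "1"))))
    return ranges


def quality(ranges, media_type):
    """Quality factor of e.g. "text/html;level=3": the q of the most specific
    matching range, or 0 if no range matches (an assumption of this sketch)."""
    mtype, msubtype, mparams = _split_type(media_type)
    matching = [r for r in ranges if r.matches(mtype, msubtype, mparams)]
    if not matching:
        return 0.0
    return max(matching, key=MediaRange.specificity).q


def best_variant(ranges, variants):
    """Return the variant with the highest quality factor, or None when no
    variant has q > 0. None is the case where a server could answer 416
    (not acceptable), or, as the text also allows, send a variant anyway."""
    best = max(variants, key=lambda v: quality(ranges, v), default=None)
    if best is None or quality(ranges, best) <= 0:
        return None
    return best
```

For the example field text/*:q=0.3, text/html:q=0.7, text/html;level=1,
*/*:q=0.5, this sketch reproduces the values listed in this section;
for instance text/html;level=3 gets 0.7, because text/html;level=1 does
not match a level=3 type but text/html does.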
A more elaborate example is

|      Accept: text/plain: q=0.5, text/html,
|              text/x-dvi: q=0.8, text/x-c

Verbally, this would be interpreted as "text/html and text/x-c are the
preferred media types, but if they do not exist, then send the
text/x-dvi entity, and if that does not exist, send the text/plain
entity."

Media ranges can be overridden by more specific media ranges or
specific media types. If more than one media range applies to a given
type, the most specific reference has precedence. For example,

       Accept: text/*, text/html, text/html;level=1, */*

have the following precedence:

       1) text/html;level=1
       2) text/html
       3) text/*
       4) */*

The media type quality factor associated with a given type is
determined by finding the media range with the highest precedence
which matches that type. For example,

       Accept: text/*:q=0.3, text/html:q=0.7, text/html;level=1,
               */*:q=0.5

| would cause the following VALUES to be associated:

       text/html;level=1         = 1
       text/html                 = 0.7
       text/plain                = 0.3
       image/jpeg                = 0.5
       text/html;level=3         = 0.7

   Note: A user agent may be provided with a default set of quality
   values for certain media ranges. However, unless the user agent is
   a closed system which cannot interact with other rendering agents,
   this default set should be configurable by the user.

10.2 Accept-Charset

The Accept-Charset request-header field can be used to indicate what
character sets are acceptable for the response. This field allows
clients capable of understanding more comprehensive or special-purpose
character sets to signal that capability to a server which is capable
of representing documents in those character sets. The ISO-8859-1
character set can be assumed to be acceptable to all user agents.

       Accept-Charset = "Accept-Charset" ":"
                        1#( charset [ ";" "q" "=" qvalue ] )

Character set values are described in Section 3.4. Each charset may be
given an associated quality value which represents the user's
preference for that charset. The default value is q=1. An example is

       Accept-Charset: iso-8859-5, unicode-1-1;q=0.8

If no Accept-Charset header is present, the default is that any
character set is acceptable. If an Accept-Charset header is present,
| and if the SERVER cannot send a response which is acceptable according
to the Accept-Charset header, then the server should send an error
| response with the 416 (not acceptable) status code, though the sending
| of an UNACCEPTABLE response is also allowed.

10.3 Accept-Encoding

The Accept-Encoding request-header field is similar to Accept, but
restricts the content-coding values (Section 3.5) which are acceptable
in the response.

       Accept-Encoding = "Accept-Encoding" ":"
                         #( content-coding )

An example of its use is

       Accept-Encoding: compress, gzip

If no Accept-Encoding header is present in a request, the server may
assume that the client will accept any content coding. If an
| Accept-Encoding header is present, and if the SERVER cannot send a
response which is acceptable according to the Accept-Encoding header,
| then the server should send an error response with the 416 (not
acceptable) status code.

10.4 Accept-Language

The Accept-Language request-header field is similar to Accept, but
restricts the set of natural languages that are preferred as a
response to the request.

       Accept-Language = "Accept-Language" ":"
                         1#( language-range [ ";" "q" "=" qvalue ] )

       language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) )
                         | "*" )

Each language-range may be given an associated quality value which
represents an estimate of the user's comprehension of the languages
specified by that range. The quality value defaults to "q=1" (100%
comprehension). For example,

       Accept-Language: da, en-gb;q=0.8, en;q=0.7

would mean: "I prefer Danish, but will accept British English (with
80% comprehension) and other types of English (with 70%
comprehension)."

A language-range matches a language-tag if it exactly equals the tag,
| or if it EXACTLY EQUALS a prefix (A SUB-SEQUENCE STARTING AT THE FIRST
| CHARACTER) of the tag such that the first tag character following the
prefix is "-". The special range "*", if present in the
Accept-Language field, matches every tag not matched by any other
ranges present in the Accept-Language field.

   Note: This use of a prefix matching rule does not imply that
   language tags are assigned to languages in such a way that it is
   always true that if a user understands a language with a certain
   tag, then this user will also understand all languages with tags
   for which this tag is a prefix. The prefix rule simply allows the
   use of prefix tags if this is the case.

The language quality factor assigned to a language-tag by the
Accept-Language field is the quality value of the longest
language-range in the field that matches the language-tag. If no
language-range in the field matches the tag, the language quality
factor assigned is 0. If no Accept-Language header is present in a
request, the server should assume that all languages are equally
| acceptable. If an Accept-Language header is present, THEN ALL
| LANGUAGES WHICH ARE ASSIGNED A QUALITY FACTOR GREATER THAN 0 are
| ACCEPTABLE. IF the SERVER cannot GENERATE a response FOR an audience
| capable of understanding at least one ACCEPTABLE LANGUAGE, it CAN send
a response that uses one or more un-accepted languages.

It may be contrary to the privacy expectations of the user to send an
Accept-Language header with the complete linguistic preferences of the
user in every request. For a discussion of this issue, see Section
| 14.7.

   Note: As intelligibility is highly dependent on the individual
   user, it is recommended that client applications make the choice
   of linguistic preference available to the user. If the choice is
   not made available, then the Accept-Language header field must not
   be given in the request.
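The Accept-Language matching rules can likewise be sketched in Python
(again illustrative, not part of the consensus text; the names
parse_accept_language, range_matches, language_quality and
is_acceptable are hypothetical). The sketch lower-cases ranges and
tags because language tags are case-insensitive, lets "*" apply only
to tags that no other range matches, and otherwise takes the q value
of the longest matching range, defaulting to 0. It assumes well-formed
q values and no quoted strings in the field.

```python
def parse_accept_language(value):
    """Parse "da, en-gb;q=0.8, en;q=0.7" into [("da", 1.0), ("en-gb", 0.8), ("en", 0.7)]."""
    result = []
    for element in value.split(","):
        range_part, _, param_part = element.partition(";")
        lang_range = range_part.strip().lower()  # language tags are case-insensitive
        if not lang_range:
            continue
        name, _, q_value = param_part.partition("=")
        q = float(q_value) if name.strip().lower() == "q" else 1.0  # default q=1
        result.append((lang_range, q))
    return result


def range_matches(lang_range, tag):
    """Exact match, or the range is a prefix of the tag followed by "-"."""
    return tag == lang_range or tag.startswith(lang_range + "-")


def language_quality(ranges, tag):
    """Language quality factor assigned to tag by a parsed Accept-Language field."""
    tag = tag.lower()
    matching = [(r, q) for r, q in ranges if r != "*" and range_matches(r, tag)]
    if matching:
        # The longest matching language-range determines the quality factor.
        return max(matching, key=lambda pair: len(pair[0]))[1]
    # "*" matches every tag not matched by any other range in the field.
    for r, q in ranges:
        if r == "*":
            return q
    return 0.0  # no range matches: quality factor 0


def is_acceptable(ranges, tag):
    """With an Accept-Language header present, a language is acceptable if q > 0."""
    return language_quality(ranges, tag) > 0
```

With the example field da, en-gb;q=0.8, en;q=0.7, this sketch gives
en-GB the value 0.8 and en-US the value 0.7, while de gets 0 and is
not acceptable. The range en does not match a tag such as eng, because
the character following the prefix must be "-".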
14 Security Considerations

| 14.7 Privacy issues connected to Accept headers

[## Note: I believe someone else (Brian Behlendorf?) was also writing
text about this, so I only include some concerns about
Accept-Language important from a European viewpoint. The concern of
user tracking through Accept headers is not covered below, see Section
6.2 of draft-holtman for a discussion of this concern##]

| [## Note: update in the above note: Brian Behlendorf does not seem
| to be responding, so I will take over writing text about user
| tracking.##]

Accept request headers can reveal information about the user to all
servers which are accessed. The Accept-Language header in particular
can reveal information the user would consider to be of a private
nature, because the understanding of particular languages is often
strongly correlated to the membership of a particular ethnic
group. User agents which offer the option to configure the contents of
an Accept-Language header to be sent in every request are strongly
encouraged to let the configuration process include a message which
makes the user aware of the loss of privacy involved.

An approach that limits the loss of privacy would be for a user agent
to omit the sending of Accept-Language headers by default, and to ask
| the user whether it should start sending Accept-Language headers to a
| server if it detects, by looking for any Vary or Alternates response
headers generated by the server, that such sending could improve the
quality of service.

=====================================================================

Below is a diff listing between draft-ietf-http-v11-spec-01.txt and
the new consensus text. Lines preceded by - were in
draft-ietf-http-v11-spec-01.txt. Lines preceded by + are the new
consensus wording. The diff listing below was computer generated and
edited by hand to improve readability.

=====================================================================

3. Protocol Parameters

3.10 Language Tags

A language tag identifies a natural language spoken, written, or
otherwise conveyed by human beings for communication of information to
other human beings. Computer languages are explicitly excluded. HTTP
-uses language tags within the Accept-Language, Content-Language, and
-URI-header fields.
+uses language tags within the Accept-Language and Content-Language
+fields.

The syntax and registry of HTTP language tags is the same as that
defined by RFC 1766 [1]. In summary, a language tag is composed of 1
or more parts: A primary language tag and a possibly empty series of
subtags:

       language-tag  = primary-tag *( "-" subtag )

       primary-tag   = 1*8ALPHA

       subtag        = 1*8ALPHA

Whitespace is not allowed within the tag and all tags are
case-insensitive. The namespace of language tags is administered by
the IANA.

Example tags include:

       en, en-US, en-cockney, i-cherokee, x-pig-latin

where any two-letter primary-tag is an ISO 639 language abbreviation
and any two-letter initial subtag is an ISO 3166 country code.

-In the context of the Accept-Language header (Section 10.4), a
-language tag is not to be interpreted as a single token, as per RFC
-1766, but as a hierarchy. A server should consider that it has a match
-when a language tag received in an Accept-Language header matches the
-initial portion of the language tag of a document. An exact match
-should be preferred. This interpretation allows a browser to send, for
-example:
-
-       Accept-Language: en-US, en; ql=0.95
-
-when the intent is to access, in order of preference, documents in
-US-English ("en-US"), 'plain' or 'international' English ("en"), and
-any other variant of English (initial "en-").
-
-   Note: Using the language tag as a hierarchy does not imply
-   that all languages with a common prefix will be understood
-   by those fluent in one or more of those languages; it simply
-   allows the user to request this commonality when it is true
-   for that user.

9. Status Code Definitions

-406 None Acceptable
-
-The server has found a resource matching the Request-URI, but not one
-that satisfies the conditions identified by the Accept and
-Accept-Encoding request headers. Unless it was a HEAD request, the
-response should include an entity containing a list of resource
-characteristics and locations from which the user or user agent can
-choose the one most appropriate. The entity format is specified by the
-media type given in the Content-Type header field. Depending upon the
-format and the capabilities of the user agent, selection of the most
-appropriate choice may be performed automatically.
+416 Not Acceptable
+
+The resource identified by the Request-URI and Host request header
+(present if the request-URI is not an absoluteURI) is only capable of
+generating response entities which have content characteristics not
+acceptable according to the accept headers sent in the request.
+
+HTTP/1.1 servers are allowed to return responses which are not
+acceptable according to the accept headers sent in the request. In
+some cases, this may even be preferable over sending a 416
+response. User agents are encouraged to inspect the headers of an
+incoming response to determine if it is acceptable. If the response is
+not acceptable, user agents should interrupt the receipt of the
+response if doing so would save network resources. If it is unknown
+whether an incoming response would be acceptable, a user agent should
+temporarily stop receipt of more data and query the user for a
+decision on further actions.

10. Header Field Definitions

10.1 Accept

-The Accept response-header field can be used to indicate a list of
-media ranges which are acceptable as a response to the request. The
-asterisk "*" character is used to group media types into ranges, with
-"*/*" indicating all media types and "type/*" indicating all subtypes
-of that type. The set of ranges given by the client should represent
-what types are acceptable given the context of the request. The Accept
-field should only be used when the request is specifically limited to
-a set of desired types, as in the case of a request for an in-line
-image, or to indicate qualitative preferences for specific media
-types.
+The Accept request-header field can be used to specify certain media
+types which are acceptable for the response. Accept headers can be
+used to indicate that the request is specifically limited to a small
+set of desired types, as in the case of a request for an in-line
+image.

The field may be folded onto several lines and more than one
occurrence of the field is allowed, with the semantics being the same
as if all the entries had been in one field value.

       Accept         = "Accept" ":" #( media-range
-                        [ ";" "q" "=" qvalue ]
-                        [ ";" "mxb" "=" 1*DIGIT ] )
+                        [ ( ":" | ";" )
+                          range-parameter
+                          *( ";" range-parameter ) ]
+                      | extension-token )

       media-range    = ( "*/*"
                        | ( type "/" "*" )
                        | ( type "/" subtype )
                        ) *( ";" parameter )

+      range-parameter = ( "q" "=" qvalue )
+                      | extension-range-parameter
+      extension-range-parameter = ( token "=" token )
+      extension-token = token
+
-The parameter q is used to indicate the quality factor, which
-represents the user's preference for that range of media types. The
-parameter mxb gives the maximum acceptable size of the Entity-Body, in
-decimal number of octets, for that range of media types. Section 12
-describes the content negotiation algorithm which makes use of these
-values. The default values are: q=1 and mxb=undefined (i.e.,
-infinity).
+The asterisk "*" character is used to group media types into ranges,
+with "*/*" indicating all media types and "type/*" indicating all
+subtypes of that type. The range-parameter q is used to indicate the
+media type quality factor for the range, which represents the user's
+preference for that range of media types. The default value is q=1. In
+Accept headers generated by HTTP/1.1 clients, the character separating
+media-ranges from range-parameters should be a ":". HTTP/1.1 servers
+should be tolerant of use of the ";" separator by HTTP/1.0 clients.

The example

-      Accept: audio/*; q=0.2, audio/basic
+      Accept: audio/*: q=0.2, audio/basic

should be interpreted as "I prefer audio/basic, but send me any audio
type if it is the best available after an 80% mark-down in quality."

-If no Accept header is present, then it is assumed that the client
-accepts all media types with quality factor 1. This is equivalent to
-the client sending the following accept header field:
-
-      Accept: */*; q=1
-
-or
-
-      Accept: */*
-
-If a single Accept header is provided and it contains no field value,
-then the server must interpret it as a request to not perform any
-preemptive content negotiation (Section 12) and instead return a 406
-(none acceptable) response if there are variants available for the
-Request-URI.
+If no Accept header is present, then it is assumed that the client
+accepts all media types. If Accept headers are present, and if the
+server cannot send a response which is acceptable according to the
+Accept headers, then the server should send an error response with the
+416 (not acceptable) status code, though the sending of an
+unacceptable response is also allowed.

A more elaborate example is

-      Accept: text/plain; q=0.5, text/html,
-              text/x-dvi; q=0.8; mxb=100000, text/x-c
+      Accept: text/plain: q=0.5, text/html,
+              text/x-dvi: q=0.8, text/x-c

Verbally, this would be interpreted as "text/html and text/x-c are the
preferred media types, but if they do not exist, then send the
-text/x-dvi entity if it is less than 100000 bytes, otherwise send the
-text/plain entity."
+text/x-dvi entity, and if that does not exist, send the text/plain
+entity."

Media ranges can be overridden by more specific media ranges or
specific media types. If more than one media range applies to a given
type, the most specific reference has precedence. For example,

-      Accept: text/*, text/html, text/html;version=2.0, */*
+      Accept: text/*, text/html, text/html;level=1, */*

have the following precedence:

-      1) text/html;version=2.0
+      1) text/html;level=1
       2) text/html
       3) text/*
       4) */*

-The quality value associated with a given type is determined by
-finding the media range with the highest precedence which matches that
-type. For example,
+The media type quality factor associated with a given type is
+determined by finding the media range with the highest precedence
+which matches that type. For example,

-      Accept: text/*;q=0.3, text/html;q=0.7, text/html;version=2.0,
-              */*;q=0.5
+      Accept: text/*:q=0.3, text/html:q=0.7, text/html;level=1,
+              */*:q=0.5

would cause the following values to be associated:

-      text/html;version=2.0     = 1
+      text/html;level=1         = 1
       text/html                 = 0.7
       text/plain                = 0.3
       image/jpeg                = 0.5
       text/html;level=3         = 0.7

-It must be emphasized that the Accept field should only be used when
-it is necessary to restrict the response media types to a subset of
-those possible or when the user has been permitted to specify
-qualitative values for ranges of media types. If no quality factors
-have been set by the user, and the context of the request is such that
-the user agent is capable of saving the entity to a file if the
-received media type is unknown, then the only appropriate value for
-Accept is "*/*", or an empty value if the user desires reactive
-negotiation.
-
   Note: A user agent may be provided with a default set of quality
   values for certain media ranges. However, unless the user agent is
   a closed system which cannot interact with other rendering agents,
   this default set should be configurable by the user.

10.2 Accept-Charset

The Accept-Charset request-header field can be used to indicate what
character sets are acceptable for the response. This field allows
clients capable of understanding more comprehensive or special-purpose
character sets to signal that capability to a server which is capable
-of representing documents in those character sets. The US-ASCII
+of representing documents in those character sets. The ISO-8859-1
character set can be assumed to be acceptable to all user agents.

-      Accept-Charset = "Accept-Charset" ":" 1#charset
+      Accept-Charset = "Accept-Charset" ":"
+                       1#( charset [ ";" "q" "=" qvalue ] )

-Character set values are described in Section 3.4. An example is
-
-      Accept-Charset: iso-8859-1, unicode-1-1
-
-If no Accept-Charset field is given, the default is that any character
-set is acceptable. If the Accept-Charset field is given and the
-requested resource is not available in one of the listed character
-sets, then the server should respond with the 406 (none acceptable)
-status code.
+Character set values are described in Section 3.4. Each charset may be
+given an associated quality value which represents the user's
+preference for that charset. The default value is q=1. An example is
+
+      Accept-Charset: iso-8859-5, unicode-1-1;q=0.8
+
+If no Accept-Charset header is present, the default is that any
+character set is acceptable. If an Accept-Charset header is present,
+and if the server cannot send a response which is acceptable according
+to the Accept-Charset header, then the server should send an error
+response with the 416 (not acceptable) status code, though the sending
+of an unacceptable response is also allowed.

10.3 Accept-Encoding

The Accept-Encoding request-header field is similar to Accept, but
restricts the content-coding values (Section 3.5) which are acceptable
in the response.
       Accept-Encoding = "Accept-Encoding" ":"
                         #( content-coding )

An example of its use is

       Accept-Encoding: compress, gzip

-If no Accept-Encoding field is present in a request, the server may
+If no Accept-Encoding header is present in a request, the server may
assume that the client will accept any content coding. If an
-Accept-Encoding field is present, but contains an empty field value,
-then the user agent is refusing to accept any content coding.
+Accept-Encoding header is present, and if the server cannot send a
+response which is acceptable according to the Accept-Encoding header,
+then the server should send an error response with the 416 (not
+acceptable) status code.

10.4 Accept-Language

The Accept-Language request-header field is similar to Accept, but
restricts the set of natural languages that are preferred as a
response to the request.

       Accept-Language = "Accept-Language" ":"
-                        1#( language-tag [ ";" "q" "=" qvalue ] )
+                        1#( language-range [ ";" "q" "=" qvalue ] )
+      language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) )
+                        | "*" )

-The language-tag is described in Section 3.10. Each language may be
-given an associated quality value which represents an estimate of the
-user's comprehension of that language. The quality value defaults to
-"q=1" (100% comprehension) for listed languages. This value may be
-used in the server's content negotiation algorithm (Section 12). For
-example,
-
-      Accept-Language: da, en-gb;q=0.8, de;q=0.55
-
-would mean: "I prefer Danish, but will accept British English (with
-80% comprehension) or German (with a 55% comprehension)."
+Each language-range may be given an associated quality value which
+represents an estimate of the user's comprehension of the languages
+specified by that range. The quality value defaults to "q=1" (100%
+comprehension). For example,

+      Accept-Language: da, en-gb;q=0.8, en;q=0.7

+would mean: "I prefer Danish, but will accept British English (with
+80% comprehension) and other types of English (with 70%
+comprehension)."

+A language-range matches a language-tag if it exactly equals the tag,
+or if it exactly equals a prefix (a sub-sequence starting at the first
+character) of the tag such that the first tag character following the
+prefix is "-". The special range "*", if present in the
+Accept-Language field, matches every tag not matched by any other
+ranges present in the Accept-Language field.
+
+   Note: This use of a prefix matching rule does not imply that
+   language tags are assigned to languages in such a way that it is
+   always true that if a user understands a language with a certain
+   tag, then this user will also understand all languages with tags
+   for which this tag is a prefix. The prefix rule simply allows the
+   use of prefix tags if this is the case.

-If the server cannot fulfill the request with one or more of the
-languages given, or if the languages only represent a subset of a
-multi-linguistic Entity-Body, it is acceptable to serve the request in
-an unspecified language. This is equivalent to assigning a quality
-value of "q=0.001" to any unlisted language.
-
-If no Accept-Language header is present in the request, the server
-should assume that all languages are equally acceptable.
+The language quality factor assigned to a language-tag by the
+Accept-Language field is the quality value of the longest
+language-range in the field that matches the language-tag. If no
+language-range in the field matches the tag, the language quality
+factor assigned is 0. If no Accept-Language header is present in a
+request, the server should assume that all languages are equally
+acceptable. If an Accept-Language header is present, then all
+languages which are assigned a quality factor greater than 0 are
+acceptable. If the server cannot generate a response for an audience
+capable of understanding at least one acceptable language, it can send
+a response that uses one or more un-accepted languages.
+
+It may be contrary to the privacy expectations of the user to send an
+Accept-Language header with the complete linguistic preferences of the
+user in every request. For a discussion of this issue, see Section
+14.7.
+
   Note: As intelligibility is highly dependent on the individual
   user, it is recommended that client applications make the choice
   of linguistic preference available to the user. If the choice is
   not made available, then the Accept-Language header field must not
   be given in the request.

+14 Security Considerations
+
+14.7 Privacy issues connected to Accept headers
+
+Accept request headers can reveal information about the user to all
+servers which are accessed. The Accept-Language header in particular
+can reveal information the user would consider to be of a private
+nature, because the understanding of particular languages is often
+strongly correlated to the membership of a particular ethnic
+group. User agents which offer the option to configure the contents of
+an Accept-Language header to be sent in every request are strongly
+encouraged to let the configuration process include a message which
+makes the user aware of the loss of privacy involved.
+
+An approach that limits the loss of privacy would be for a user agent
+to omit the sending of Accept-Language headers by default, and to ask
+the user whether it should start sending Accept-Language headers to a
+server if it detects, by looking for any Vary or Alternates response
+headers generated by the server, that such sending could improve the
+quality of service.

[end of text.]
Received on Saturday, 13 April 1996 11:28:39 UTC