Re: Changes to Content Negotiation, Entity Tags, and If-* from Roy T. Fielding on 1996-05-28 (ietf-http-wg@w3.org from April to June 1996)

From: Roy T. Fielding <fielding@avron.ICS.UCI.EDU>
Date: Tue, 28 May 1996 02:47:47 -0700
To: Koen Holtman <koen@win.tue.nl>
Cc: http-wg@cuckoo.hpl.hp.com
Message-Id: <9605280247.aa15902@paris.ics.uci.edu>

Koen writes:

> Perhaps paradoxically, though the diffs below simplify many aspects of
> the protocol, they also add language which _overspecifies_ the future
> transparent content negotiation mechanism, and which sometimes
> directly contradicts things the content negotiation subgroup has
> consensus on.

I checked my diffs, and then read the entire conneg mailing list
again article-by-article, and neither statement is true.  My diffs
have no negative impact on future content negotiation other than
requiring Vary to be sent even when Alternates is present (which is
the right thing to do anyway and only costs a few bytes).

> Technically, this overspecification is unnecessary.  Process-wise, this
> overspecification is *utterly unacceptable*: the wg has consensus to
> postpone the transparent content negotiation discussion until after 1.1
> is finished, and to put only minimal `hooks' for transparent content
> negotiation in 1.1.  The overspecifications unnecessarily constrain the
> outcome of the future content negotiation discussion.  Below, I will
> indicate changes which remove all unacceptable overspecifications of the
> content negotiation mechanism.

"Hooks" meant a correct specification of existing practice, the Vary
header field, and what it means when a cache receives Vary.  Draft 03
incorrectly describes the consensus on Vary and contradicts all existing
implementations of negotiation (CERN, Apache, and Spyglass servers)
regarding the negotiation of error responses.

If this is truly what the conneg subgroup intended, then the conneg
subgroup failed.  However, having just read the mailing list archive
at <http://www.organic.com/public/conneg/mail/date.html>, I can tell
you quite frankly that your interpretation of conneg consensus on these
issues is bogus -- you just didn't make the right changes to draft 01.

It has taken so long for me to propose these changes because I have
been extremely busy with ALL aspects of the protocol, and this part
wasn't my responsibility any more.  That does not mean that it is okay
to go forward with a description that is clearly wrong.

>>Some major changes relate to fixing an incorrect description of
>>what can be done in terms of negotiation (and when negotiation applies),
> 
> Roy, I would like you to specify in which way the draft-03 description
> is incorrect.

See prior message.

>>a reworking of the whole notion of generic vs plain resources (none of
>>that was needed, and was wrong in any case because it ignored error
>>responses),  and reduction of duplicate descriptions of the entity
>>tag.
> 
>>  Some of the extensible aspects of the Vary header have been
>>removed because it was clearly impossible for any such extensions
>>to be deployed without breaking Vary (and thus they would be better
>>done via a separate header field anyway).
> 
> As far as I know, the extension stuff in the draft-03 Vary header
> section is *not* broken.  Please back up this `clearly impossible to
> deploy' claim.  I can live with the extension stuff going, but the only
> valid reason I see for removing it is the desire to reduce complexity.

See prior message.

>>The editorial group has
>>also decided that caches can be almost always as efficient, and much
>>simpler, without variant-ids.
> 
> Ugh.  _The members of the editorial group who met last week_ may have
> decided that, but I have not.  My opinion is that removal of
> variant-ids will make caches less efficient and more complex.  I could
> live with variant-ids being removed, but do not agree with your claim
> that this makes caches simpler.

The editorial group (those people doing the editing of the draft) agreed
to the change -- it was not a unanimous decision.  In actuality, the
agreement was that it should not have been added in the first place.

> [....]  
>>***************
>>*** 2731,2744 ****
>>  
>>  
>>  12.3.1.1 300 Multiple Choices
>>- This status code is reserved for future use by a planned content
>>- negotiation mechanism.  HTTP/1.1 user agents receiving a 300 response
>>- which includes a Location header field can treat this response as they
>>- would treat a 303 (See Other) response.  If no Location header field is
>>- included, the appropriate action is to display the entity enclosed in
>>- the response to the user.
>>  
>>  
>>  12.3.1.2 301 Moved Permanently
>>  The requested resource has been assigned a new permanent URI and any
>>  future references to this resource SHOULD be done using one of the
>>--- 2646,2671 ----
>>  
>>  
>>  12.3.1.1 300 Multiple Choices
> 
> The new version of `300 Multiple Choices' below is an unnecessary and
> unacceptable overspecification.  It should be replaced with the old
> version above.

No, as described in a prior message.  It should never have been castrated
in the first place, and there is no overspecification below.

>>+ The requested resource is available in one or more representations,
>>+ each with their own specific location, and agent-driven negotiation
>>+ information (section 15) is being provided so that the user (or user
>>+ agent) can select a preferred representation.
>>  
>>+ Unless it was a HEAD request, the response SHOULD include an entity
>>+ containing a list of resource characteristics and locations from which
>>+ the user or user agent can choose the one most appropriate. The entity
>>+ format is specified by the media type given in the Content-Type header
>>+ field. Depending upon the format and the capabilities of the user agent,
>>+ selection of the most appropriate choice may be performed automatically.
>>+ However, this specification does not define any standard format for
>>+ such automatic selection.
> 
> The paragraph above contradicts the consensus of the content negotiation
> subgroup.

No it does not -- I suggest you re-read the mail.

>>+ If the server has a preferred choice of representation, it SHOULD
>>+ include the specific URL for that representation in the Location field;
>>+ user agents MAY use the Location field value for automatic redirection.
>>
>>+ This response is cachable unless indicated otherwise.
>>+ 
>>  12.3.1.2 301 Moved Permanently
>>  The requested resource has been assigned a new permanent URI and any
>>  future references to this resource SHOULD be done using one of the
>>***************
>>*** 2924,2929 ****
>>--- 2851,2865 ----
> 
> [Note: this part of the diff concerns the 406 response:
> 
> 12.4.1.7 406 Not Acceptable
> The resource identified by the request is only capable of generating
> response entities which have content characteristics not acceptable
> according to the accept headers sent in the request.
> ]
> 
>>  response entities which have content characteristics not acceptable
>>  according to the accept headers sent in the request.
>>  
>>+ Unless it was a HEAD request, the response SHOULD include an entity
>>+ containing a list of available entity characteristics and locations
>>+ from which the user or user agent can choose the one most appropriate.
>>+ The entity format is specified by the media type given in the
>>+ Content-Type header field. Depending upon the format and the
>>+ capabilities of the user agent, selection of the most appropriate
>>+ choice may be performed automatically.  However, this specification
>>+ does not define any standard format for such automatic selection.
>>+ 
> 
> I closed the 406 response issue a long time ago.  The addition of the
> text above is _completely unacceptable_.  This text must be removed;
> the draft-03 text is correct as it is.
> 
> The 406 response is meant to be used by resources which have only one
> representation available.  The 416 response, which was in draft-02,
> *did* require such lists to be sent, and this response has already
> been removed in draft-03 as an overspecification of the future content
> negotiation mechanism.

Your interpretation of the 406 response is incorrect.  The description
above is applicable to HTTP current practice and necessary for future
practice, and has been in the protocol spec for over a year.  You did
not provide any justification for deleting it, nor was that justification
represented on the conneg mail list.

>>  HTTP/1.1 servers are allowed to return responses which are not
>>  acceptable according to the accept headers sent in the request. In some
>>  cases, this may even be preferable to sending a 406 response.  User
>>***************
> 
> [...]
> 
> The new version of section 15 below contains several
> overspecifications, some of which even directly contradict the
> consensus of the content negotiation subgroup.  I will include a
> rewritten section 15 after my comments.

I checked -- both claims are false.

>>  15 Content Negotiation
>>  
>>! Most HTTP responses include an entity which contains information
>>! for interpretation by a human user. Naturally, it is desirable to
>>! supply the user with the "best available" entity corresponding to the
>>! request. Unfortunately for servers and caches, not all users have the
>>! same preferences for what is "best", and not all user agents are
>>! equally capable of rendering all entity types. For that reason, HTTP
>>! supports various mechanisms for "content negotiation" -- the process
>>! of selecting the best representation of a resource for a given response
>>! when there are multiple representations available.
>>  
>>!    Note: This is not called "format negotiation" because the alternate
>>!    representations may be of the same media type, but use different
>>!    capabilities of that type, be in different languages, etc.
>>  
>>! Any response containing an entity-body MAY be subject to negotiation,
>>! including error responses.
>>  
>>! There are two kinds of content negotiation which are possible in HTTP:
>>! server-driven negotiation (also known as preemptive or opaque negotiation)
>>! and agent-driven negotiation (also known as reactive negotiation).
> 
> The `also known as' comments above should be deleted.  What you mean
> to say here is
> 
>  There are two kinds of content negotiation which are possible in
>  HTTP: server-driven negotiation (which is a generalized form of the
>  preemptive part of transparent negotiation and of opaque negotiation)
>  and agent-driven negotiation (which is a generalized form of the
>  reactive part of transparent negotiation).
> 
> and while these remarks are informative for most members of the
> http-wg _now_, they do not belong in the 1.1 spec.

No, that is not what I mean.  Transparent negotiation is an aberration --
a subset of server-driven negotiation for the sole purpose of it being
interpreted by intermediaries on behalf of the origin server.  Both
reactive and preemptive negotiation pre-date any notion of transparent
negotiation by over three years.  I added those remarks in order to
avoid confusing the hell out of people due to the name change, and
because draft 03 was completely wrong in its description of what forms
negotiation can take.

>>! These two kinds of negotiation are orthogonal and thus may be used
>>! separately or in combination without affecting the interpretation of
>>! a response.
> 
>>      One method of combination, referred to as transparent
>>! negotiation, occurs when a cache uses the agent-driven negotiation
>>! information provided by the origin server in order to provide
>>! server-driven negotiation for subsequent requests.
> 
> The above sentence is an overspecification which will needlessly
> constrain the result of future discussions about transparent
> negotiation.  It should be removed.  The overspecification also subtly
> mismatches the consensus of the content negotiation subgroup.

Nonsense.  First, it doesn't constrain anything. Second, it exactly
describes your own stated intentions for the Alternates header and
the purpose of transparent negotiation.  See
<http://www.organic.com/public/conneg/mail/0032.html> for background.

>>  
>>+ 15.1 Server-driven Negotiation
>>  
>>+ With server-driven negotiation, the selection of the best representation
>>+ for a response is done by an algorithm located at the origin server and
>>+ unknown to the client(s) receiving the response. 
> 
> This `and unknown to...' is no longer true if server-driven negotiation
> also encompasses preemptive responses.  The `and unknown to...'  part
> must thus be deleted.

Okay -- I agree that it isn't necessary.

>>   Selection is based on
>>+ the available representations of the response (the dimensions over which
>>+ it can vary)
> 
> `the available representations' are not the same as `the dimensions over
> which it can vary'.  The part between parentheses should be deleted.

The parenthetical comment is an elaboration of how the available
representations constrain how the response varies.  It isn't necessary.

>>     and either the contents of particular header fields in the
>>+ request message or on other information pertaining to the request
>>+ (such as the network address of the client).
> 
> The use of `Either .... or' is incorrect: both header fields and other
> information could be used at the same time.  See below for a rewrite.

Then it should be "or on any information pertaining".

>>+ Server-driven negotiation is advantageous when the algorithm for
>>+ selecting from among the available representations is difficult to
>>+ describe to the user agent, or when the server desires to send its
>>+ "best guess" to the client along with the first response (hoping to
>>+ avoid the round-trip delay of a subsequent request if the "best guess"
>>+ is good enough for the user).  In order to improve the server's guess,
>>+ the user agent MAY include request header fields (Accept, Accept-Language,
>>+ Accept-Encoding, etc.) which describe its preferences for such a
>>+ response.
>>  
>>! Server-driven negotiation has several disadvantages. First, it is
>>! impossible for the server to accurately determine what might be "best"
>>! for any given user, since that would require complete knowledge of
>>! both the capabilities of the user agent and the intended use for the
>>! response (e.g., does the user want to view it on screen or print it
>>! on paper?). Second, having the user describe its capabilities in every
>>! request is both horrendously inefficient (given that only a small
>>! percentage of responses have multiple representations) and a potential
>>! violation of the user's privacy. Third, it significantly complicates
>>! the implementation of an origin server and the algorithms for generating
>>! responses to a request. 
> 
>                             Finally, it could interfere with a public
>>! cache's ability to use the same response for multiple users' requests.
> 
> This `Finally...' part is not true, as far as I can see.

"Could" is completely accurate -- I don't expect all caches to implement
caching of negotiated responses, and therefore it could have an adverse
effect on caching.

> Also, two paragraphs above are much too long.  There is no need to
> describe advantages and disadvantages in such detail.  Such descriptions
> also overspecify tradeoffs with respect to preemptive negotiation we
> still need to discuss.  I will supply much shorter text below.

You are wrong -- John Klensin specifically asked us to defend, within
the specification, why more than one type of negotiation is allowed,
and suggested that such a description would be necessary to avoid later
push-back by the IESG.  I agreed.

>>...

>>! 15.2 Agent-driven Negotiation
>>  
>>! With agent-driven negotiation, selection of the best representation
>>! for a response is performed by the user agent after receiving an initial
>>! response from the origin server. Selection is based on a list of the
>>! available representations of the response included within the header
>>! fields (Alternates, appendix 23.2.5.1) or entity-body of the initial
>>! response, with each alternative identified by its own URI. Selection
>                        ^^^^^^^^^^^
>>! from among the alternatives may be performed automatically (if the
>                   ^^^^^^^^^^^^ 
> 
> These should be `representation', `representations'.  The term
> `Alternatives' is not defined.

It is defined by the English language -- that is sufficient.

>>! user agent is capable of doing so) or manually by the user selecting
>>! from a generated (possibly hypertext) menu.
>>  
>>+ Agent-driven negotiation is advantageous when the response would vary
>>+ over commonly-used dimensions (such as type, language, or encoding),
>         ^^^^^^^^^^^^^ 
> Commonly-used should be `common'.

Yes.

>>+ when the origin server is unable to determine a user agent's capabilities
>>+ from examining the request, and at any time when public caches are used
>>+ to distribute server load and reduce network usage.
> 
> `At any time' is too strong a claim to make before we have even
> defined a mechanism for agent-driven negotiation.  In most cases, *no*
> negotiation will be better for caches than agent-driven negotiation.

Okay, delete "at any time ".

>>! Agent-driven negotiation suffers from the disadvantage of needing a
>>! second request to obtain the best alternate representation.  This second
>>! request is only efficient when hierarchical caching is used.
>                                   ^^^^^^^^^^^^
> Huh?  _Hierarchical_ caching?  This claim is completely bogus.

Okay, delete that sentence.

>>  In addition,
>>! this specification does not define any mechanism for supporting
>>! automatic selection, though it also does not prevent any such mechanism
>>! from being developed as an extension and used within HTTP/1.1.
>>! 
>>! HTTP/1.1 defines the 300 (multiple choices) and 406 (not acceptable)
>>! status codes for enabling agent-driven negotiation when the server is
>>! unwilling or unable to provide a varying response using server-driven
>>! negotiation.
> 
> 406 has nothing to do with agent-driven negotiation, see above.

You are wrong -- "unwilling or unable to provide a varying response using
server-driven negotiation" does encompass 406, and the reason we invented
that status code was explicitly to support negotiation in the case where
no available representation matches the user's Accept requirements.
By including the available representations in the 406 response entity,
we enabled agent-driven negotiation when server-driven negotiation fails.
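
The fallback described above can be sketched in Python.  This is only an
illustration under stated assumptions: the function name negotiate, the
(media type, URL) pairs, and the plain-text list format are hypothetical
(the draft defines no standard format for the list), and Accept matching
is reduced to exact media types plus "*/*", ignoring q-values and
partial wildcards.

```python
def negotiate(accept_types, representations):
    """Pick a representation by Accept, or fall back to a 406 list.

    accept_types: media types taken from the request's Accept header
                  (already split; q-values are ignored in this sketch).
    representations: list of (media_type, url) pairs the server holds.
    Returns (status, headers, body); body is None for the 200 case,
    where the caller would attach the selected entity itself.
    """
    for media_type, url in representations:
        if media_type in accept_types or "*/*" in accept_types:
            # Server-driven negotiation succeeded: send the best guess
            # and name the dimension it was selected on.
            return 200, {"Content-Type": media_type, "Vary": "Accept"}, None

    # No acceptable representation: list what is available so the user
    # or user agent can choose (agent-driven negotiation).
    # Hypothetical list format: one "type url" pair per line.
    body = "\n".join(f"{media_type} {url}" for media_type, url in representations)
    return 406, {"Content-Type": "text/plain", "Vary": "Accept"}, body
```

With representations text/html at /doc.html and application/pdf at
/doc.pdf, a request accepting only image/png gets a 406 whose entity
lists both locations, so the second request can be made by choice
rather than by guess.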

>>! 15.3 Transparent Negotiation
> 
> This section contains no requirements relevant to implementers of
> plain 1.1 implementations.  Also, it is an overspecification which
> will needlessly constrain the result of future discussions about
> transparent negotiation.  15.3 must be removed.

No, it expresses requirements on any HTTP/1.1 application.  It exists
to explicitly place those requirements on any future implementation
of transparent negotiation because those requirements are ALREADY
known to be true, whether you like them or not.  It is necessary to
understand that the Vary header field must be generated by the cache
if the cache is doing server-driven negotiation based on only
agent-driven negotiation information.

>>! Transparent negotiation is a combination of both server-driven and
>>! agent-driven negotiation.  When a cache is supplied with an automated
>>! form of the list of available representations of the response
>>! (as in agent-driven negotiation) and the dimensions of variance are
>>! completely understood by the cache, then the cache becomes capable of
>>! performing server-driven negotiation on behalf of the origin server
>>! for subsequent requests on that resource.
>>! 
>>! Transparent negotiation has the advantage of distributing the
>>! negotiation work that would otherwise be required of the origin server
>>! and also removing the second request delay of agent-driven negotiation
>>! when the cache is able to correctly guess the right response and
>>! already has that response cached.
>>! 
>>! A cache performing transparent negotiation MUST include the agent-driven
>>! negotiation information along with the response, and MUST add a Vary
>>! header field to the response (defining the dimensions of its variance)
>>! if a Vary field was not already assigned by the origin server.
>>! 
>>! These requirements apply to HTTP/1.1 applications even though this
>>! specification does not include a means for accomplishing transparent
>>! negotiation, since an understanding of these requirements is a
>>! necessary prerequisite for any future implementation of these features.
>>! 
>>! 
> [...]
> 
>>--- 4700,4736 ----
>>  missing), and must discard the other partial information.
>>  
>>  
>>! 16.5 Caching Negotiated Responses
>>  
>>! Use of server-driven content negotiation (section 15), as indicated by
>>! the presence of a Vary header field in a response, alters the conditions
>>! and procedure by which a cache can use the response for subsequent requests.
>>  
>>! A server MUST use the Vary header field (section 18.46) to inform a cache
>>! of what header field dimensions are used to select among multiple
>>! representations of a response. A cache can use the selected
>>! representation (the entity included with that particular response) 
> 
> Add here:
> 
>    , if it is still fresh,

No, that would not be accurate without elaborating all of the extraneous
conditions, and is covered by another section anyway.

>>   for
>>! replying to subsequent requests on that resource only when the subsequent
>>! requests have the same or equivalent value for all header fields
>>! specified in the Vary response-header.  Requests with a different value
>>! for one or more of those header fields SHOULD be forwarded toward the
> 
>>! origin server; if an entity tag was assigned to the representation,
>>! the forwarded request SHOULD be conditional and include the entity tag
>>! in an If-None-Match header field.
> 
> This is confusing.  A rewrite:
> 
>    origin server.  If the cache has accumulated one or more
>    representations belonging to the resource, it SHOULD add an
>    If-None-Match header listing the entity tags of these representations
>    to the forwarded request.

Okay.
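
A minimal Python sketch of the cache behaviour just agreed on, under
stated assumptions: the function names and the dict-of-lower-cased-
headers representation are hypothetical, and exact string equality
stands in for "same or equivalent value".

```python
def can_reuse(vary_names, stored_request, new_request):
    """Return True if the stored response may answer new_request.

    vary_names: field names listed in the stored response's Vary header.
    stored_request / new_request: dicts mapping lower-cased header names
    to values.  Exact equality is an assumption; a real cache might
    treat differently formatted but equivalent values as equal.
    """
    for name in vary_names:
        key = name.lower()
        if stored_request.get(key) != new_request.get(key):
            return False
    return True


def forwarded_headers(new_request, cached_entity_tags):
    """Build the headers for a request forwarded toward the origin.

    If the cache holds representations of the resource, their entity
    tags are listed in If-None-Match so the server can answer 304.
    """
    headers = dict(new_request)
    if cached_entity_tags:
        headers["if-none-match"] = ", ".join(cached_entity_tags)
    return headers
```

For a response stored with "Vary: Accept-Language" from a request
carrying "Accept-Language: en", a later request with "fr" fails
can_reuse and is forwarded with the cached entity tags attached.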

>>! The Vary header field may also inform the cache that the representation
>>! was selected using criteria not limited to the request headers; in this
>>! case, a cache SHOULD NOT use the response in a reply to a subsequent
>>! request unless the cache relays the new request to the origin server in
>>! a conditional request and the server responds with 304 (Not Modified),
>>! including an entity tag or Content-Location that indicates which entity
>                            ^^^^^^^^^^^^^^^^^^^
>>! should be used.
> 
> This `or Content-Location' must be removed.  Draft-03 did not allow
> Content-Location to be used this way.

And Draft 04 will.  You went too far in disallowing uses of Content-Location
in areas not having to do with spoofing.  I corrected that.
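
As an illustration of the rule in the quoted paragraph, here is a
Python sketch of how a cache might pick the stored entity after such a
revalidation.  The function name and the two lookup tables are
hypothetical, and trying the entity tag before Content-Location is an
assumption of this sketch, not something the draft text specifies.

```python
def entity_after_revalidation(status, headers, by_etag, by_location):
    """Pick the stored entity named by a 304 response, or None.

    headers: dict of lower-cased response header names to values.
    by_etag / by_location: the cache's stored entities for this
    resource, keyed by entity tag and by Content-Location.
    """
    if status != 304:
        return None  # not a 304: the caller handles the full response
    etag = headers.get("etag")
    if etag is not None and etag in by_etag:
        return by_etag[etag]
    location = headers.get("content-location")
    if location is not None:
        return by_location.get(location)
    return None
```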

> [...]
> 
>>*** 4993,5061 ****
>>  updates and the problems arising from server, cache, or network failure
>>  prior to write-back.
>>  
>>- 
>>- 16.12  Generic Resources and HTTP/1.0 Proxy Caches
>>- If the correct handling of responses from a generic resource (Section
>>- 15) by HTTP/1.0 proxy caches in the response chain is important,
>>- HTTP/1.1 origin servers can include the following Expires (Section
>>- 18.22) response header in all responses from the generic resource:
>>- 
>>-      Expires: Thu, 01 Jan 1980 00:00:00 GMT
>>- 
>>- If this Expires header is included, the server should usually also
>>- include a Cache-Control header for the benefit of HTTP/1.1 caches, for example
>>- 
>>-      Cache-Control: max-age=604800
>>- 
>>- which overrides the freshness lifetime of zero seconds specified by the
>>- included Expires header.
>>- 
> 
> I see that the diff removes this section.  I consider this to be a bad
> idea, and would at least like to see an explanation of why it was
> removed.

As explained in a prior message, HTTP's caching features are not version
dependent.  This section is therefore rubbish.

> [...]
> 
>>--- 7082,7130 ----
>>  
>>  
>>  18.46 Vary
>>  
>>! The Vary response-header field is used by a server to signal that the
>>! response entity was selected from the available representations of the
>>! response using server-driven negotiation (section 15).  The Vary field
>>! value indicates either that the given set of header fields encompass
>>! the dimensions over which the representation might vary, or that the
>>! dimensions of variance are unspecified ("*") and thus may vary over
>>! any aspect of future requests.
>>  
>>!        Vary  = "Vary" ":" ( "*" | 1#field-name )
>>  
>>! An HTTP/1.1 server MUST include an appropriate Vary header field with
>>! any response that is subject to server-driven negotiation.  Doing so
>>! allows a cache to properly interpret future requests on that resource
>>! and informs the user agent about the presence of negotiation on that
>>! resource.
>>  
> 
>>! A Vary field value of "*" signals that parameters other than the
>>! contents of request-header fields (e.g., the network address of the
>>! client) play a role in the selection of the response representation.
> 
> This contradicts the end of the first paragraph.  A rewrite:
> 
>    A Vary field value of "*" signals that unspecified parameters,
>    possibly other than the contents of request-header fields (e.g., the
>    network address of the client), play a role in the selection of the
>    response representation.

Okay.

>>! Subsequent requests on that resource can only be properly interpreted
>>! by the origin server, and thus a cache SHOULD forward a (possibly
>>! conditional) request even when it has a fresh response cached from
>>! a prior request on the resource.
>>  
>>! A Vary field value consisting of a list of field-names signals that
>>! the representation selected for the response is based on a selection
>>! algorithm which considers ONLY the listed request-header field values in
>>! selecting the most appropriate representation.  This selection algorithm
>>! MAY be assumed to remain unchanged (and thus apply to future requests)
>>! for the duration of time in which the response is fresh.
> 
> This last sentence is not an entirely correct summary of the more
> explicit rules in draft-03.  A rewrite:
> 
>    A cache MAY assume that the same selection will be made for future
>    requests with the same values for the listed field names, for the
>    duration of time in which the response is fresh.  

Ummm, okay -- mine actually covers more ground than that, but it isn't
necessary.
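
For reference, the Vary grammar quoted above ( "*" | 1#field-name ) can
be parsed with a few lines of Python.  This is a sketch, not a
normative parser: parse_vary is a hypothetical name, and the token
character set in the regular expression is an assumption about what a
field-name may contain.

```python
import re

# Assumed set of characters allowed in a field-name token.
_TOKEN = re.compile(r"^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$")


def parse_vary(value):
    """Parse a Vary field value.

    Returns the string "*" when the dimensions are unspecified, or a
    frozenset of lower-cased field names (field names are
    case-insensitive).  Raises ValueError for a malformed value.
    """
    value = value.strip()
    if value == "*":
        return "*"
    # The "#" list rule: comma-separated elements, empty elements skipped.
    names = [part.strip() for part in value.split(",")]
    names = [name for name in names if name]
    if not names:
        raise ValueError("Vary needs '*' or at least one field name")
    for name in names:
        if name == "*" or not _TOKEN.match(name):
            raise ValueError(f"not a valid field name: {name!r}")
    return frozenset(name.lower() for name in names)
```

A cache would treat a result of "*" as "always forward", and a set of
names as the request-header fields to compare before reusing a fresh
response.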

>>+ The field-names given are not limited to the set of standard
>>+ request-header fields defined by this specification. Field names are
>>+ case-insensitive.
> 
> Some important details from the old draft-03 vary section are missing
> here.  They must be added back.  The specific text is:
> 
>  The field name "Host" MUST never be included in a Vary header; clients
>  MUST ignore it if it is present.  The names of fields which change the
>  semantics of a GET request, like "Range" and "If-Match" MUST also never
>  be included, and MUST be ignored when present.

Why?  Vary is controlled by the server, so why is this NECESSARY?

>  Servers which use access authentication are not obliged to send "Vary:
>  Authorization" headers in responses.  It MUST be assumed that requests
>  on authenticated resources can always produce different responses for
>  different users.  Note that servers can signal the absence of
>  authentication by including "Cache-Control: public" header in the
>  response.

This is just wrong, as I described earlier.

>>+ The value of the Vary field itself MUST NOT vary on any aspect of the
>>+ request other than the requested resource (Request-URI and Host).
>>+ In other words, two simultaneous requests on the same resource must 
>>+ result in identical Vary fields (if any), regardless of the header
>>+ fields in those requests, if the response status codes are identical.
> 
> The above paragraph specifies a new requirement that contradicts a part
> of draft-03 I detected rough consensus on.  This paragraph must be
> removed.

Please fix your detector -- Vary cannot be implemented without it,
and this was the clear and overwhelming consensus on conneg, and it
is how Vary was defined by David Robinson's draft, and I have mentioned
it to you at least three times (twice by phone, once in person).


 ...Roy T. Fielding
    Department of Information & Computer Science    (fielding@ics.uci.edu)
    University of California, Irvine, CA 92717-3425    fax:+1(714)824-4056
    http://www.ics.uci.edu/~fielding/
Received on Tuesday, 28 May 1996 02:47:47 UTC