Re: Cache validators

Roy T. Fielding:
>Jeff writes:
>> We really have two different kinds of conditionals to consider:
>> 
>>       (1) cache-validation conditions: "does what I currently
>>       have in my cache match what you would currently send
>>       me in response to this request sufficiently well to
>>       maintain semantic transparency from your point of view?"
>> 
>>       (2) utility conditions: "will what you send me be useful
>>       to me according to some criteria (because if it is not,
>>       I don't want you to send it)?"
>> 
>> An example of (2) might be "is what I have asked for more than
>> 1000000 bytes, because if it is, I don't want to wait for it,
>> so you shouldn't send it".  I think this is an orthogonal concept
>> to cache validation, and we ought to be thinking about it separately.
>
>I don't -- coming up with separate solutions for the same problem is
>less efficient than a single solution for all such problems.

Roy, I agree with you that any solution to (2) also solves (1).

The question is: would solving (2), thereby automatically solving (1),
be too expensive?  I think it is, and I will explain why I think so
below.

[...]
>It is my understanding that many people have complained that the
>current Unless syntax and semantics will be too much of a burden
>to implement.

I am among these people; see below.

The biggest problem I have with If and logic bags is that you want to
_require_ servers to implement them.  The implementation of all
proposed conditional modifiers (If-modified-since, If-validator-valid,
Variant-Set, Alt-Header) is _optional_, because they are only there as
optimization mechanisms.  Server authors could implement none of them
and still comply with HTTP/1.1.

I have yet to see how we can require logic bag implementations in all
servers and still claim that 'it should be possible to implement a
minimal server in a single day's work'.

>  I would have a lot more faith in their opinion if they
>were to actually attempt to implement it FIRST, but I can't force
>people to do that.

I guess this is a good time for me to restate some comments on the
implementation of Unless that I sent to this list shortly before the
Dallas IETF, and to which I have had no reply since then.  I would
have a lot more faith in your opinion if you were to actually attempt
to address these comments, but I can't force you to do that.

1) The relational operators like 'gt' in logic bags have big problems
when used with undefined headers.  Your default does not make these
problems go away.  There are three important ways of comparing:

  numeric  (2 < 10)
  lexical  (ba < c)
  date     (Mon, 04 Dec 1995 01:23:45 GMT < Mon, 04 Dec 1995 01:23:55 GMT)

and the defined default, 'numeric comparison (for values consisting of
1*DIGIT) and lexical comparison (for all others)', would break on
undefined headers that carry a date.  I therefore propose to define
three `greater than` operators:

  gtn   greater than, numeric
  gtl   greater than, lexical
  gtd   greater than, date.

and maybe, but not required:

  gt   greater than, with field-name dependent function (use only with
       field-names defined in HTTP/1.1).

(of course, the same for ge, lt, le).  Another question: what is the
`appropriate comparison function' server authors are supposed to use
for `Content-Version'?  Do we really want to open this can of worms?
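
To make the ambiguity concrete, here is a minimal sketch in Python (my
choice of language for illustration, not anything from the draft).  The
names gtn, gtl and gtd are the operators proposed above; gt_default is
a hypothetical rendering of the draft's default rule, and parsing dates
with the standard library's email.utils is an assumption about how a
server might read an HTTP date.

```python
from email.utils import parsedate_to_datetime


def gtn(a: str, b: str) -> bool:
    """Greater than, numeric: both values must be 1*DIGIT."""
    return int(a) > int(b)


def gtl(a: str, b: str) -> bool:
    """Greater than, lexical: plain string comparison."""
    return a > b


def gtd(a: str, b: str) -> bool:
    """Greater than, date: both values must be parseable dates."""
    return parsedate_to_datetime(a) > parsedate_to_datetime(b)


def _is_digits(value: str) -> bool:
    # ASCII digits only, so int() cannot be surprised by other scripts.
    return value.isascii() and value.isdigit()


def gt_default(a: str, b: str) -> bool:
    """Hypothetical rendering of the draft default: numeric comparison
    for 1*DIGIT values, lexical comparison for everything else."""
    if _is_digits(a) and _is_digits(b):
        return gtn(a, b)
    return gtl(a, b)
```

With these definitions, gtn("10", "2") is true while gtl("10", "2") is
false, and gt_default("Sun, 03 Dec 1995 01:23:45 GMT", "Mon, 04 Dec
1995 01:23:45 GMT") is true, because "S" sorts after "M", even though
the Sunday date is the earlier one.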


2) The biggest problem with Unless [If] is not the parsing of the
logic bag, but causality.

Quoting from the Unless section in the draft 1.1 spec:

|   When a request containing an Unless header field is received, the 
|   server must evaluate the expression defined by the listed 
|   logic-bags (Section 3.11). If the expression evaluates to false, 
|   then no change is made to the semantics of the request. If it 
|   evaluates true and the request is not a conditional GET 
|   (If-Modified-Since, Section 10.23) or a partial GET (Range, 
|   Section 10.33), then the server must abort the request and respond
                                    ^^^^^^^^^^^^^^^^^^^^^^
|   with the 412 (unless true) status code.

There is a big problem here. Consider the following request:

  POST /bin/send_mail HTTP/1.1
  Unless: {gt {Content-length 123}}
  Accept: */*
  <more request headers and the encoding of a form containing a mail
  message follow>

Now, the draft says that `the server must abort the request' if the
Content-length of the response would be bigger than 123.  The problem
is: abort when?  Before or after the send_mail CGI script sends the
mail?  As your use of examples in which Unless prevents operations
that cost money shows, you seem to require `before'.

Implementation and portability considerations dictate that the abort
can only happen _after_ the send_mail script has sent the mail.  Thus,
I propose that the text

  `the server must abort the request and respond
   with the 412 (unless true) status code.'

in Section 10.40 be replaced with 

   `the server must not send the response resulting from the normal
    semantics of the request, but respond with a 412 (unless true)
    status code.'

and that the following text be added at the end of the Section:

    `If a request has the significance of taking an action other than
    retrieval, the truth of the expression in the Unless field may,
    but is not required to, prevent the action from being taken.'

Without this change, we must basically put a logic bag parser and,
worse, speculative execution or `undo' code in each existing and
future CGI script that takes an action other than retrieval.  This is
not an option, especially not for CGI scripts with functionality like
`delete some files and report the cpu time used'.

My conclusions about implementability: having Unless will only be
viable if Unless processing can be done transparently to CGI scripts.
What you can implement under my proposed Unless semantics is the
following sequence of server actions:

  1. Decode request
  2. Run CGI script
  3. Prepare response headers using CGI script output
  4. Calculate value of logic bag in `unless' header
  5a. If true, send a 412 (unless true) response
  5b. If false, send a 200 (or whatever the CGI script produced), 
     response headers and entity body.
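
The sequence above can be sketched roughly as follows.  This is only
an illustration in Python under stated assumptions: the Response
class, handle(), send_mail_script() and too_long() are hypothetical
names, the script is modelled as a plain function, and the logic bag
is modelled as an already-parsed predicate over the response headers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def handle(script: Callable[[], bytes],
           unless: Callable[[Dict[str, str]], bool]) -> Response:
    # Step 1 (decoding the request) is assumed to have produced
    # `script` and `unless` already.
    body = script()                                # 2. run the script
    headers = {"Content-Length": str(len(body))}  # 3. prepare headers
    if unless(headers):                            # 4. evaluate logic bag
        return Response(412)                       # 5a. suppress response
    return Response(200, headers, body)            # 5b. send it normally


sent_mail: List[str] = []


def send_mail_script() -> bytes:
    # Hypothetical script with a side effect other than retrieval.
    sent_mail.append("mail")
    return b"x" * 200


def too_long(headers: Dict[str, str]) -> bool:
    # Stands in for: Unless: {gt {Content-length 123}}
    return int(headers["Content-Length"]) > 123
```

Calling handle(send_mail_script, too_long) yields a 412 response with
no body, yet sent_mail has grown by one entry: the response is
suppressed, the action is not.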

3) I observe that the above changes to make Unless implementable
basically make it a bandwidth-saving device, not a device for
preventing actions like sending mail from happening.

The proposed use of Unless as saying "don't do this if it will cost me
more than $.50" is not possible in the general case, but only for GET
and HEAD requests.

We may want to define another logic-bag based request header (for
example `No-action-if') for POST requests.  The difference from Unless
would be that a `No-action-if' header used on a standard CGI script
would cause the server to immediately send a `not implemented'
response.  Only special CGI scripts registered as capable of
`No-action-if' processing would be run by the server.
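
A rough sketch of that dispatch rule, in Python and purely
illustrative: the registry NO_ACTION_IF_CAPABLE, the script path in
it, and the dispatch() function are all hypothetical, and I am
assuming `not implemented' maps to status 501.

```python
from typing import Callable, Dict, Set

NOT_IMPLEMENTED = 501

# Hypothetical registry of scripts that declared themselves capable of
# evaluating No-action-if before taking any action.
NO_ACTION_IF_CAPABLE: Set[str] = {"/bin/send_mail_v2"}


def dispatch(path: str,
             headers: Dict[str, str],
             run_script: Callable[[str], int]) -> int:
    """Return the status code for a request; run the script only if allowed."""
    # Header field names are case-insensitive.
    has_condition = any(name.lower() == "no-action-if" for name in headers)
    if has_condition and path not in NO_ACTION_IF_CAPABLE:
        # Standard script: refuse immediately, without running it.
        return NOT_IMPLEMENTED
    return run_script(path)
```

The point of the check is ordering: an unregistered script is never
started, so no action can be taken before the condition is known.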


>  However, I won't allow six additional precondition
>syntaxes to be added to the protocol willy-nilly -- people will have to
>prove that they are more efficient in total than a single extensible
>syntax.

It would require at least Star Trek physics to implement the general
Unless in the 1.1 draft for the operation `delete some files and
report the cpu time used'.

>  My knowledge of HTTP applications allows me enough foresight
>to know that a single syntax is more efficient than any two additional
>preconditions, so proving it against six will be trivial.

While we are counting...  Your `single syntax' of logic bags says:

|   Server implementors must use an appropriate 
|   comparison function for each type of field-value given in this 
|   specification.

so you are requiring them to think up and implement `appropriate
comparison functions' (including `equals' and `less-than') for 21
different headers, if I count correctly.  Exactly how is this supposed
to be provably less work than optionally implementing 6 specialized
precondition headers?  Not to mention that the number won't be 6, but
fewer.

>On the other hand, I would also like to see HTTP/1.1 complete sometime
>this century.

:-)

>So, if people would like a simple precondition syntax that is useful
>for all of the currently identified protocol needs, including cache
>validation, byte ranges, and content negotiation, then I have the following
>suggestion:
>
>   1) Require Content-ID in HTTP/1.1 responses
>
>     Content-ID  =  "Content-ID" ":" cid
>            cid  =  <a content-id as defined in RFC 1521>

Why require Content-ID to be present in all responses?

>   2) Implement the following precondition syntax:
>
>     If-ID  =  "If-ID" ":" 1#cid
>
>      wherein the condition evaluates to true if the response to the
>      request would have had a Content-ID equal to one of the ones
>      given in the If-ID header field value.  Like the current definition
>      of Unless in draft 01, the response to a "false" evaluation
>      depends on whether or not Range or IMS is also present.

I think this should be "Unless-ID" (you do a normal GET _unless_ the
response will have one of these Content-IDs).

These two comments aside, I think this is a good proposal. By
requiring that Content-IDs of different resources are always
different, you eliminate a lot of the complexity that is in the URI
and Vary negotiation preconditions proposed now.
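
For comparison with the logic bag machinery, the whole check reduces
to a set membership test.  A minimal sketch in Python, with
hypothetical function names, assuming the 1#cid list can be split on
commas (which ignores content-ids that themselves contain a quoted
comma) and leaving open what the server sends once the condition
matches:

```python
from typing import Set


def parse_cid_list(value: str) -> Set[str]:
    """Split a comma-separated 1#cid header value into a set of ids."""
    return {item.strip() for item in value.split(",") if item.strip()}


def unless_id_matches(header_value: str, response_cid: str) -> bool:
    """True if the response would carry one of the listed Content-IDs,
    i.e. the normal GET response should not be sent."""
    return response_cid in parse_cid_list(header_value)
```

No comparison function per header type is needed here; equality of
opaque strings is enough.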

>That should make a sufficient number of people happy to make the
>overhead of doing it worthwhile.  If not, then the only reasonable
>solution is to use an IF header field with a generic syntax.

Requiring If and logic bags is not a reasonable solution: they have
far too many problems that need to be addressed first.  I'd rather have
6 specialized headers, though it seems that having 2 (Unless-ID and
If-Modified-Since for compatibility) is also sufficient.

> ...Roy T. Fielding

Koen.

Received on Saturday, 24 February 1996 07:57:03 UTC