Caching dynamically generated documents

Maybe this does not belong exactly here, but it is related to the
problem of caching. I have (partly) followed the discussion on
caching the results of GET/POST, and have some opinions on the subject.

To start with an example, I would like to point out a typical use of
/cgi-bin at our site, which I believe is not uncommon.
Very often, we use /cgi-bin scripts to do the following:

1) browse through a (possibly small) database, and return the results
   to the user. Examples include returning the schedule of classes
   or exams in our faculty: the database is of the order of 500 records,
   which can be highly compressed. The browsing code is just a few
   lines written in awk or perl. The typical request returns from one
   to a few pages of data which, when nicely formatted with tables and
   anchors, has a size comparable to the database.

2) present a nicely formatted version of a file. Examples: the
   occupancy status of terminals or classrooms. In these cases, the
   databases are a few hundred bytes. Again, formatting is done in
   awk or perl, and the code is very compact. The formatted output,
   though, produces an anchor and a table entry for every item (which
   is encoded in a single bit) of the original database. As a result,
   the output may easily be 100 to 1000 times bigger than the
   original (a small sketch of such a script follows this list).
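
To make the second case concrete, here is a minimal sketch of the kind
of script described in item 2 (the file name, data format and URLs are
invented for illustration, not taken from our actual setup): each
character of a tiny status file becomes a table row with its own
anchor, which is where the 100-1000x expansion comes from.

    #!/usr/bin/perl
    # Hypothetical sketch: each character of "terminals.dat" encodes
    # the status of one terminal (0 = free, 1 = busy).  Every single
    # bit of input turns into roughly 80 bytes of HTML (a row, a cell,
    # an anchor), hence the large expansion factor.
    open(DB, "terminals.dat") || die "cannot open database: $!";
    $bits = <DB>;                  # e.g. "0110100010..."
    close(DB);

    print "Content-type: text/html\n\n";
    print "<html><body><table border>\n";
    $i = 0;
    foreach $b (split(//, $bits)) {
        next if $b !~ /[01]/;      # skip newline and other junk
        $state = $b ? "busy" : "free";
        print "<tr><td><a href=\"/cgi-bin/terminal?$i\">terminal $i</a>",
              "</td><td>$state</td></tr>\n";
        $i++;
    }
    print "</table></body></html>\n";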

Obviously the problem is that the code is run on the server, instead
of as close as possible to the client. There are several drawbacks
to this approach:

* many more bytes than necessary are transferred;
* the server is unnecessarily kept busy generating and transferring
  all the above traffic;
* the data is essentially uncacheable because of the variety of
  possible requests.

This is certainly a problem for servers that supply services of the
kind I mentioned. Things are going to get worse and worse as WWW services
become more widespread and are possibly used in wireless, low-bandwidth
environments [example: queries for all the trains or planes to a
given destination from a station or airport].

It is also a problem for caches, which must either give up or
develop complex and memory-consuming techniques essentially to
try to reconstruct the behaviour of the server from its responses.
This applies both to the GET method (where no side effects can be
assumed, but requests with different parameters may yield different
results) and to POST.
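
Just to spell out why the variety of requests defeats a cache: a
conventional cache can only key an entry on the full request URL
(method plus path plus query part), so every distinct query against
the same small database becomes a separate cached copy of overlapping
data. A toy sketch of that keying (all names invented):

    # Toy sketch of conventional cache keying (names are invented).
    # Each distinct query string is a distinct entry, so thousands of
    # different queries against the same 500-record database can store
    # far more bytes in the cache than the database itself contains.
    %cache = ();

    sub cache_store {
        local($method, $url, $body) = @_;
        $cache{"$method $url"} = $body;  # "GET /cgi-bin/schedule?course=nets"
    }

    sub cache_lookup {
        local($method, $url) = @_;
        $cache{"$method $url"};          # undef means a miss
    }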

And there is also a terrible [:)] side effect: your cache statistics
are negatively affected by these large, uncacheable items. More
seriously, these "uncacheable" items could, in many cases, easily be
made cacheable.

I believe this problem should be dealt with *before* trying to develop
solutions for caching POST or other dynamically generated data.
If somebody is aware of ongoing work on this subject, I would like
to know more about it.

A (possibly not too hard) way to approach the problem, without
requiring an upgrade of all the clients in the world, could be the use
of an embedded language (JavaScript, awk, perl, tcl, whatever) at
least on the caches. The server could then feed the cache with the
raw data and the code, for local processing.
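
To make the idea a bit more concrete, here is a purely hypothetical
sketch (the headers, the data-plus-code convention and the record
format below are all invented for illustration; nothing like this
exists today): the server ships the raw schedule records together with
a few lines of perl, and a cache with an embedded perl interpreter
runs that code against its single cached copy of the data for every
request, instead of fetching a freshly formatted page each time.

    # Purely hypothetical reply the server might give to a cache,
    # carrying raw data plus formatting code in one cacheable object
    # (invented syntax, for illustration only):
    #
    #   Content-type: application/x-data-plus-code
    #   X-Processing-Language: perl
    #
    #   <the ~500 compressed schedule records>
    #   --
    #   <the few lines of formatting code, e.g. the sub below>
    #
    # The cache would then answer each query locally, along these lines:

    sub answer_query {
        local($query, @records) = @_;   # query string + cached raw data
        local($out) = "<table border>\n";
        # invented record format: "course:room:time"
        foreach $r (grep(/$query/i, @records)) {
            ($course, $room, $time) = split(/:/, $r);
            $out .= "<tr><td>$course</td><td>$room</td><td>$time</td></tr>\n";
        }
        $out . "</table>\n";
    }

    # example use, with made-up data:
    @db = ("Networks:Room 5:Mon 9-11", "Databases:Room 2:Tue 14-16");
    print &answer_query("networks", @db);

The point is that the cache would store the few hundred records and
the formatting code once, and every query would be answered locally,
with only the raw data ever crossing the network.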

Once this is done, non-compliant services would slowly die, because
they are intrinsically less efficient than the others: caches could
even refuse to cache them!

The main question is: how long would it take to reach agreement on a
language?

	Luigi
====================================================================
Luigi Rizzo                     Dip. di Ingegneria dell'Informazione
email: luigi@iet.unipi.it       Universita' di Pisa
tel: +39-50-568533              via Diotisalvi 2, 56126 PISA (Italy)
fax: +39-50-568522              http://www.iet.unipi.it/~luigi/
====================================================================

Received on Friday, 5 January 1996 15:38:03 UTC