Some potentially interesting information about Expires:

Koen writes:
    On a related note, I recently discovered that the Netscape client
    cache, if configured to `verify document: every time', will indeed do
    a conditional GET for every new request on a resource that lacks an
    Expires header.  Eek.  I thought that `verify document' applied to
    conditional GETs on expired documents only, so I had enabled this
    option on my Netscape copy.
    
    I am a bit disturbed by Netscape having this cache configuration
    option at all.  If only 10% of Netscape users enable it,
    they will cause an enormous increase in the number of conditional GETs
    going over the net.

I think this (Netscape's "verify document always") feature may be
a symptom of the relatively poor coverage of "Expires:" in the
current Web.

Most of you know about http://altavista.digital.com, the Web
crawler and search engine developed by several of my colleagues
in Digital's research labs.  It turns out that the crawler logs
all of the response headers for all of the pages it has retrieved.
So I decided to survey those headers to see how Expires: is
currently used.  Actually, I looked at the results of a test
crawl that was done several months ago, not the one that was used
to populate the existing database.

For various reasons (such as the enormous amount of data involved,
other loads on the machine in question, and a power failure during
my log analysis), I was only able to analyze about 3 million headers.
It's possible that these are not a representative sample of the entire
crawlable Web, but I have no a priori reason to believe they are skewed.
However, I suspect that some parts of the Web not accessible to
crawlers (for example, stock quote services) are more dynamic and
may make more use of Expires: headers.

Anyway, the results: of 3094665 responses that I analyzed,
7031 had Expires: headers.  That's about 0.23%.  Since the
logs are broken down into chunks of about 90K responses, I
was able to determine that in no group of 90K responses were
Expires: headers used more than about 0.35% of the time, or
less than about 0.13% of the time.  In other words, the fraction
seems relatively stable across large numbers of URLs.
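
For anyone who wants to repeat this kind of tally on their own logs,
here is a minimal Python sketch of the arithmetic.  The function names
and the list-of-booleans input are assumptions made for illustration;
they do not describe how the crawler logs are actually stored or how
the original analysis was run.

```python
from typing import List


def expires_percentage(with_expires: int, total: int) -> float:
    """Share of responses carrying an Expires: header, in percent."""
    return 100.0 * with_expires / total


def per_chunk_percentages(has_expires: List[bool], chunk_size: int) -> List[float]:
    """Percentage of responses with Expires: in each consecutive chunk.

    has_expires holds one flag per logged response (hypothetical input
    format); chunk_size would be about 90000 for the logs described above.
    """
    percentages = []
    for start in range(0, len(has_expires), chunk_size):
        chunk = has_expires[start:start + chunk_size]
        percentages.append(expires_percentage(sum(chunk), len(chunk)))
    return percentages


# Overall figure from the counts reported above.
print(f"{expires_percentage(7031, 3094665):.2f}%")  # prints 0.23%
```

Taking min() and max() over the per-chunk list gives the kind of
spread quoted above.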

I also looked at the individual Expires: values, and found
some interesting things.  First of all, servers are not consistent
about the date format they use.  I found:
	Mon Sep 18 19:11:16 1995
	Mon, 18 Sep 1995 00:31:15 GMT
	Mon, 18-Sep-95 04:22:18 GMT
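
A recipient that wants to cope with all three layouts has to try each
in turn.  The following is a rough Python sketch of that idea, not a
complete HTTP date parser.  The function name parse_expires is made up
for this example, the asctime form is assumed to be GMT because it
carries no zone, and the weekday and month names are matched according
to the current locale (assumed here to be English).

```python
from datetime import datetime, timezone
from typing import Optional

# One strptime pattern per layout seen in the survey.
_EXPIRES_FORMATS = (
    "%a %b %d %H:%M:%S %Y",       # Mon Sep 18 19:11:16 1995
    "%a, %d %b %Y %H:%M:%S GMT",  # Mon, 18 Sep 1995 00:31:15 GMT
    "%a, %d-%b-%y %H:%M:%S GMT",  # Mon, 18-Sep-95 04:22:18 GMT
)


def parse_expires(value: str) -> Optional[datetime]:
    """Return the Expires: value as a UTC datetime, or None if no layout fits."""
    text = value.strip()
    for fmt in _EXPIRES_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this layout does not match; try the next one
        return parsed.replace(tzinfo=timezone.utc)
    return None


for sample in ("Mon Sep 18 19:11:16 1995",
               "Mon, 18 Sep 1995 00:31:15 GMT",
               "Mon, 18-Sep-95 04:22:18 GMT"):
    print(parse_expires(sample))
```

Anything that matches none of the three patterns comes back as None,
leaving the caller to decide how to treat it.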

I also found these values:
	0
	1 Jan 1970 00:00:00 UT
	now
	Mon, 01 Jan 1900 00:00:00 GMT
	Mon, 01-Jan-1990 00:00:00 GMT
which are different ways of encoding "already expired".
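
One way a cache could fold all of these into a single "already
expired" answer is sketched below in Python.  The policy is an
assumption chosen for the example: any value that does not parse as a
date (such as "0" or "now") is treated as already expired, and a
parsed date counts as expired when it is not later than the reference
time.  The function name and the short format list are hypothetical
and cover only the values listed here.

```python
from datetime import datetime, timezone

# Layouts needed for the dated values listed above (English locale assumed).
_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # Mon, 01 Jan 1900 00:00:00 GMT
    "%a, %d-%b-%Y %H:%M:%S GMT",  # Mon, 01-Jan-1990 00:00:00 GMT
    "%d %b %Y %H:%M:%S UT",       # 1 Jan 1970 00:00:00 UT
)


def is_already_expired(value: str, now: datetime) -> bool:
    """True if the Expires: value means 'already expired' as of now."""
    text = value.strip()
    for fmt in _DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this layout
        return parsed.replace(tzinfo=timezone.utc) <= now
    # Assumed policy: "0", "now", and anything unparseable count as expired.
    return True


reference = datetime(1995, 12, 29, tzinfo=timezone.utc)
for sample in ("0", "1 Jan 1970 00:00:00 UT", "now",
               "Mon, 01 Jan 1900 00:00:00 GMT",
               "Mon, 01-Jan-1990 00:00:00 GMT"):
    print(sample, is_already_expired(sample, reference))
```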

I found a few values far in the future:
	Fri, 31 Dec 1999 23:59:59 GMT
(someone still thinks the world will end before 1/1/2000).

The following value looks a little dubious, both because the 1.1 draft
is quite specific about using GMT only, and because the asctime
date format is not supposed to include a timezone anyway:
	Mon Sep 18 00:30:00 EDT 1995

Finally, I found these definitely bogus values:

	,     GMT
	, 16--95 16:13:58 GMT 16:08:57 GMT/3.0
	, 16--95 16:58:14 GMT 16:53:14 GMT
	180unday, 17-Sep-95 17:38:22 GMT
	Mon, 18 Sep 1995 0-18:08:00 GMT

So in summary I would say that it might well be sadly reasonable
to ignore "Expires:" today, since it's almost never used, and when
it is used, it is often clearly bogus.

-Jeff

Received on Friday, 29 December 1995 18:57:32 UTC