save everything

Larry Masinter (masinter@parc.xerox.com)
Fri, 18 Aug 1995 12:09:51 PDT


To: uri@bunyip.com
Subject: save everything
From: Larry Masinter <masinter@parc.xerox.com>
Message-Id: <95Aug18.120953pdt.2763@golden.parc.xerox.com>
Date: Fri, 18 Aug 1995 12:09:51 PDT

> Larry - Do you mean that it should be the user, and not the author,
> who decides? Kind of the "don't let that go away, I'll need it" button?
> Looks like the only chance you have is to make a copy for
> yourself (or yourself and others). The author (or anybody else)
> may not be ready to pay for keeping the stuff on a server
> for eternity. Also, the author may not want the document
> to exist forever, or to be copied, and may get pretty angry
> at your intentions, probably with quite some justification.

Well, can authors really disable the 'save' button?

I've been wondering how much it costs to save something forever.  If
we're paying $0.30/megabyte now, and disk prices go down by 1/2 every
5 years, then we'll pay $0.15/megabyte in 5 years, so the
present value of 'disk forever' is probably under $1/megabyte. Are
these numbers a serious underestimate? I've not accounted for system
administration costs, of course.
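
One rough way to check the 'under $1/megabyte' figure is to sum the cost of
re-buying the disk at each price halving. The sketch below assumes the disk
is replaced once every 5 years, that the price halves at each replacement,
and that future payments are not discounted; the function name and the
replacement schedule are illustrative assumptions, not measured data.

```python
def perpetual_cost_per_mb(price_now=0.30, price_factor=0.5, replacements=50):
    """Total spent on one megabyte, bought now and re-bought each cycle.

    Assumption: one purchase per 5-year cycle, with the price multiplied
    by price_factor each cycle. No interest or admin costs are included.
    """
    total = 0.0
    price = price_now
    for _ in range(replacements):
        total += price          # pay for this cycle's disk
        price *= price_factor   # next cycle's disk is cheaper
    return total


# The geometric series has the closed form price_now / (1 - price_factor).
closed_form = 0.30 / (1 - 0.5)

print(round(perpetual_cost_per_mb(), 2), round(closed_form, 2))
```

Under these assumptions the total converges to twice the current price, which
is consistent with the estimate of under $1/megabyte.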

Perhaps we could gather some informal data: what's the average size of
the data behind the (still-valid) links in your '.netscape-history',
what's average * total-entries, and what time period of browsing does
it represent? Mine has under 10K entries for 60 days, and the average
size of data in our CERN proxy cache seems to be under 2K.  That is
about 20 megabytes, or at most $20 at $1/megabyte, for a perpetual
private copy of everything I've browsed in the last 2 months.
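
The arithmetic behind that figure can be written out as a small sketch, so
that others can plug in their own history sizes. The function name is
hypothetical, and it assumes 1 megabyte = 1000 kilobytes and an upper-bound
price of $1/megabyte for perpetual storage.

```python
def archive_cost(entries, avg_kilobytes, dollars_per_megabyte=1.0):
    """Cost of keeping every browsed item forever.

    Assumptions: sizes are averages, 1 MB = 1000 KB, and
    dollars_per_megabyte is the perpetual-storage price.
    """
    megabytes = entries * avg_kilobytes / 1000
    return megabytes * dollars_per_megabyte


# 10K history entries at about 2K each, at $1/megabyte.
print(archive_cost(10_000, 2))
```

Because the entry count and the average size are both stated as upper bounds,
the result is an upper bound as well.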

If this cost could be amortized over a workgroup, company, or the
world, we could probably make perpetual care of everything that
anyone has actually read quite affordable.  We
don't need a copy for everyone, we just need some way of making sure
there are enough replicas around that everything that we want to keep
is kept around.

As for whether the author or the reader pays, well, the costs here are
pretty low.

Of course, material gets updated; the 'saved' archival copy is for
those cases where the new/updated version can't be found or associated
with the original. From this point of view, an archival infrastructure
for the net is the complement of a permanent naming system; the names
account for how new data gets associated with old names, while the
archival system keeps around the old data for the old names.

My proxy cache stores data by timestamp and URL; that is, the 'name'
of the material is URL + 'last modified'. This is a bit
cumbersome to use as a universal naming system, but it does meet most
of the requirements and the 'running code' criterion.
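
To make the URL + 'last modified' naming concrete, here is a minimal sketch
of a store keyed by that pair. It is not how the CERN proxy cache is
implemented; the class and method names are hypothetical, and the example
URL and dates are made up for illustration.

```python
from datetime import datetime, timezone


class VersionedStore:
    """Toy archive in which the name of an item is (URL, last-modified)."""

    def __init__(self):
        self._items = {}

    def put(self, url, last_modified, body):
        # A new last-modified time creates a new name, so old data is kept.
        self._items[(url, last_modified)] = body

    def get(self, url, last_modified):
        # Returns None when that exact version was never archived.
        return self._items.get((url, last_modified))

    def versions(self, url):
        # All archived timestamps for one URL, oldest first.
        return sorted(ts for (u, ts) in self._items if u == url)


store = VersionedStore()
url = "http://example.com/page.html"  # made-up URL
old = datetime(1995, 6, 1, tzinfo=timezone.utc)
new = datetime(1995, 8, 18, tzinfo=timezone.utc)
store.put(url, old, b"original text")
store.put(url, new, b"updated text")

print(store.get(url, old))
print(len(store.versions(url)))
```

Updating the page adds an entry rather than overwriting one, so the old data
stays reachable under its old name.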