All of the WWW Available **Forever**

Edward Cherlin
Mon, 19 May 1997 10:20:43 -0700

Message-Id: <v03007806afa63bb6549b@[]>
Date: Mon, 19 May 1997 10:20:43 -0700
From: Edward Cherlin
Subject: All of the WWW Available **Forever**

This suggests a new URL scheme: traditional URL plus date, directed to this
archive. Something similar for Usenet, also, directed to Deja News.
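
A rough sketch of what "traditional URL plus date" could look like in
practice. The host archive.example, the YYYYMMDD path layout, and both
function names are assumptions made up for illustration; nothing here is
an announced Archive or Alexa interface.

```python
from datetime import date, datetime

# Hypothetical archive address; the post does not name one.
ARCHIVE_PREFIX = "http://archive.example/"


def dated_url(url, when):
    """Combine an ordinary URL and a date into one archive-directed URL."""
    return f"{ARCHIVE_PREFIX}{when:%Y%m%d}/{url}"


def parse_dated_url(dated):
    """Split a dated URL back into (original URL, date)."""
    if not dated.startswith(ARCHIVE_PREFIX):
        raise ValueError("not a dated archive URL")
    stamp, sep, original = dated[len(ARCHIVE_PREFIX):].partition("/")
    if not sep or not original:
        raise ValueError("missing original URL")
    return original, datetime.strptime(stamp, "%Y%m%d").date()


if __name__ == "__main__":
    link = dated_url("http://www.example.com/page.html", date(1997, 5, 19))
    print(link)
    print(parse_dated_url(link))
```

Keeping the original URL intact at the end means an old link could be
redirected to the archive just by prepending the prefix and a date.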

>Subject:  All of the WWW Available **Forever**
>From: ____Textpert Alert____ <>
>Mime-Version:  1.0
>Precedence: list
>Date:  Mon, 19 May 1997 14:41:14 +0200
>  True to my name handle, I'd like to alert y'all to the truly
>  Xanadudlian mission of the start-up Internet Archive and Alexa
>  companies, the former a non-profit effort to continuously
>      s t o r e  ALL OF (unrestricted-access) WWW pages FOREVER ;
>  the second a commercial outfit developing tools to browse and
>  reuse such cumulative/ multi-generation archive contents.
>  Acc. to their owner Brewster Kahle --formerly of the Thinking
>  Machines Corp., and a father of WAIS-- one of the target functions
>  of Alexa-derived software is to be a `"reliability service" that
>  will resurrect dead links.  Give the URL and an approximate date
>  to the Archive, and it will dig up the document.'.....  rings a
>  bell, doesn't it?
>  The Alexa archives are made of successive sweep-n-suck (BIIIG
>  sucks, too) sessions of the entire WWW dataspace resulting in
>  consecutive "frozen Webs" stored at one location -- currently
>  a warehouse in SF; ultimately in the digital storage facility of
>  the US National Archives in Washington, D.C.  Treating an entire
>  docuverse as a collection of "barts" (or "stamps", I keep mixing
>  them up) may sound like a bit of overkill, but whoever said that
>  the (yellow brick) road to Xanadu must be straight and narrow?
>Based on Paul Bissex' article at:
>>           [...] whereas keyword search engines [AltaVista etc]
>>           store an index to the Web, the Archive consists of a
>>           copy of the Web itself. Kahle estimates the current
>>           size of the Web at about two terabytes (that's two
>>           million megabytes). Having completed two full sweeps
>>           of the Web, the Archive now contains about four
>>           terabytes of data. A recent upgrade of the Archive's
>>           connection from two T1 lines to a full T3 brings
>>           a welcome 15-fold increase in bandwidth, meaning
>>           that future Web "snapshots" will be conducted much
>>           faster than the first two. With some researchers
>>           estimating the average life of a Web page at 75 days,
>>           speed matters.
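
The bandwidth figures quoted above can be checked with a little
arithmetic. This is a back-of-envelope sketch only: it uses the standard
nominal line rates (T1 = 1.544 Mbit/s, T3 = 44.736 Mbit/s), takes a
terabyte as 10^12 bytes to match "two million megabytes", and assumes the
line is fully used with no protocol overhead, which is optimistic.

```python
# Nominal line rates in megabits per second.
T1_MBPS = 1.544
T3_MBPS = 44.736

# Kahle's estimate of the Web's size: two terabytes (decimal).
WEB_BYTES = 2 * 10**12


def days_to_transfer(num_bytes, mbps):
    """Days needed to move num_bytes over a link running flat out at mbps."""
    seconds = num_bytes * 8 / (mbps * 1_000_000)
    return seconds / 86_400


# The old connection was two T1 lines; the new one is a single T3.
speedup = T3_MBPS / (2 * T1_MBPS)
days_two_t1 = days_to_transfer(WEB_BYTES, 2 * T1_MBPS)
days_t3 = days_to_transfer(WEB_BYTES, T3_MBPS)

print(f"speedup: {speedup:.1f}x")
print(f"two T1 lines: {days_two_t1:.1f} days per sweep")
print(f"one T3 line:  {days_t3:.1f} days per sweep")
```

Under these assumptions the upgrade comes to roughly 14.5-fold, in line
with the article's "15-fold", and a full two-terabyte sweep drops from
about 60 days to about 4. Against a 75-day average page life, that is the
difference between a snapshot that is mostly stale by the time it
finishes and one that is not.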

Edward Cherlin     Help outlaw Spam            Everything should be made
Vice President                                 as simple as possible,
NewbieNet, Inc.    1000 members and counting   __but no simpler__.
                   17 May 97                   Attributed to Albert Einstein