
Re: comments? mirrors.txt

From: Justin Chapweske <justin@chapweske.com>
Date: Mon, 31 Mar 2003 12:52:05 -0600
Message-ID: <3E888E55.3010306@chapweske.com>
To: Andre John Mas <ajmas@newtradetech.com>
Cc: www-talk@w3.org

The only thing that I don't like about this is that normal HTTP 
mirroring is very insecure: a client has no way to verify that a mirror 
is serving the same bytes as the master.

Our work with the "Content-Addressable Web" uses secure checksums and 
some HTTP extensions to provide an alternate way of solving the mirror 
problem.  Our paper on "HTTP Extensions for a Content-Addressable Web" 
can be found at 
(http://open-content.net/specs/draft-jchapweske-caw-03.html).
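To make the checksum idea concrete: a client that has obtained a urn:sha1 value from a source it trusts can fetch the bytes from any mirror and check them locally. This is a minimal sketch, assuming the urn:sha1 form that appears in the header output further down (a SHA-1 digest in base32); the function names are hypothetical and the code is not taken from the CAW draft.

```python
import base64
import hashlib


def sha1_urn(data: bytes) -> str:
    """Build a urn:sha1 value: the SHA-1 digest, base32-encoded (RFC 4648)."""
    digest = hashlib.sha1(data).digest()  # 20 bytes -> exactly 32 base32 chars
    return "urn:sha1:" + base64.b32encode(digest).decode("ascii")


def verify_mirror_body(body: bytes, trusted_urn: str) -> bool:
    """Accept bytes from an untrusted mirror only if they hash to the trusted URN.

    Compared case-insensitively: the 'urn:' prefix and namespace ID are
    case-insensitive, and base32 text is sometimes written in lowercase.
    """
    return sha1_urn(body).lower() == trusted_urn.strip().lower()


if __name__ == "__main__":
    body = b"bytes fetched from some mirror"
    urn = sha1_urn(body)  # in practice this comes from the master, not the mirror
    print(verify_mirror_body(body, urn))                # True
    print(verify_mirror_body(body + b"tampered", urn))  # False
```

The point of the design is that the mirror itself never has to be trusted; only the URN does.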

You may also be interested in its companion specification, "The Tree 
Hash EXchange format (THEX)" at 
(http://open-content.net/specs/draft-jchapweske-thex-02.html).
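THEX describes a Merkle (tree) hash, which lets a client verify individual segments of a file rather than only the whole thing. As I read the draft: each 1024-byte segment is hashed with a 0x00 prefix, pairs of child hashes are combined with a 0x01 prefix, and an unpaired node is promoted to the next level unchanged. The draft pairs this with the Tiger hash, which Python's hashlib does not reliably provide, so this sketch substitutes SHA-256 by default; its output will therefore not match the urn:tree:tiger values shown below.

```python
import hashlib
from typing import Callable, List

SEGMENT_SIZE = 1024        # leaf segment size in bytes
LEAF_PREFIX = b"\x00"      # prepended to each data segment before hashing
INTERNAL_PREFIX = b"\x01"  # prepended to two concatenated child hashes


def sha256(data: bytes) -> bytes:
    # Stand-in for Tiger, which hashlib does not reliably provide.
    return hashlib.sha256(data).digest()


def tree_hash(data: bytes, hash_fn: Callable[[bytes], bytes] = sha256) -> bytes:
    """Merkle root over SEGMENT_SIZE-byte leaves, in the style of THEX."""
    level: List[bytes] = [
        hash_fn(LEAF_PREFIX + data[i:i + SEGMENT_SIZE])
        for i in range(0, len(data), SEGMENT_SIZE)
    ]
    if not level:
        level = [hash_fn(LEAF_PREFIX)]  # empty input: a single empty leaf
    while len(level) > 1:
        parents = [
            hash_fn(INTERNAL_PREFIX + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2:
            parents.append(level[-1])  # unpaired node is promoted unchanged
        level = parents
    return level[0]


if __name__ == "__main__":
    print(tree_hash(b"x" * 3000).hex())
```

Passing a different hash_fn swaps the underlying hash without touching the tree logic.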

We also have a very basic XML-RPC protocol for lease-based mirror 
advertisement at (http://open-content.net/specs/).
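The actual XML-RPC method names of that protocol are not reproduced here. The class below is a hypothetical in-memory sketch of the lease idea only: a mirror is advertised for a limited time and drops out of the list unless it renews before the lease expires.

```python
import time
from typing import Callable, Dict, List


class MirrorLeaseTable:
    """Hypothetical registry: a mirror stays advertised only while its lease is live."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expiry: Dict[str, float] = {}  # mirror URL -> absolute expiry time

    def advertise(self, mirror_url: str, lease_seconds: float) -> float:
        """Grant or renew a lease; returns the new expiry time."""
        expires = self._clock() + lease_seconds
        self._expiry[mirror_url] = expires
        return expires

    def live_mirrors(self) -> List[str]:
        """Drop expired leases, then return the remaining mirror URLs, sorted."""
        now = self._clock()
        self._expiry = {url: t for url, t in self._expiry.items() if t > now}
        return sorted(self._expiry)
```

The injectable clock is there so expiry can be tested without sleeping; a mirror that goes offline simply stops renewing and ages out.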

Also, there is a functioning Content-Addressable Web header proxy that 
you are free to play around with.  It's currently used for the Open 
Content Network and can be used as follows:

bash$ HEAD http://gw1.open-content.net:8080/gateway/head?uri=http://etree01.archive.org/etree/moe1997-03-28dnk.shnf/moe1997-03-28d1/moe1997-03-28d1t06.shn

200 OK
Date: Sat, 18 Jan 2003 00:15:05 GMT
Accept-Ranges: bytes
Server: TornadoGateway/1.0 (http://onionnetworks.com/; i386; Linux)
Content-Length: 78691253
Content-MD5: 84lI1a9IFPJq7jb3YG3m9Q==
Content-Type: audio/shn
ETag: "3360009-4b0bbb5-3d8b447e"
Last-Modified: Fri, 20 Sep 2002 15:53:34 GMT
Client-Date: Mon, 31 Mar 2003 18:50:22 GMT
Client-Peer: 209.237.232.89:8080
X-Content-URN: urn:md5:6OEURVNPJAKPE2XOG33WA3PG6U
X-Content-URN: urn:sha1:VTHQINIP3JUPJIMMC5RLVZSEFKMQ5KLX
X-Content-URN: urn:tree:tiger:S6SMQPZXUD7G54ZPIJMXJPN7JAABQXM2ZCKIUEQ
X-First-Bytes: 616a6b6702fbb17009f9255952a4d1a8dc48766a1157a0d5a8b66b6dd241108040201018040a0144d64020110d8c0a0104804420164b0dd2c3a08766a11ec0070000000b8000622efb1fb66659b36d85b45d6d77d3f081756652a563b41c94dbc24ce97b31fd3e4094415a862558d6756102e987170e9c591f2a5428dcc84bc43b21554c1444fe9a306fa2e9450125e78931c15f346cc6597762d6557c68623bc99254bdeaaf470888a9e104d631cca938cf0132314e7547b94069c86106060ea012e8d9c4c3211e99b4d3618070e33359a76670f85cc449e08468ec15ecf4e64e03d3dfb976c324444a9cf31ec599682060769e4e23bf9fce1ad3ffef94be4b
X-Observed-IP: 24.118.168.169
X-Thex-URI: http://gw1.open-content.net:8080/gateway/thex?uri=http://etree01.archive.org/etree/moe1997-03-28dnk.shnf/moe1997-03-28d1/moe1997-03-28d1t06.shn;S6SMQPZXUD7G54ZPIJMXJPN7JAABQXM2ZCKIUEQ
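Note that Content-MD5 and the urn:md5 value in the response above carry the same digest in two encodings: base64 for the header (RFC 1864) and unpadded base32 for the URN. A quick check using the values copied from that response:

```python
import base64

content_md5 = "84lI1a9IFPJq7jb3YG3m9Q=="          # Content-MD5 header (base64)
md5_urn = "urn:md5:6OEURVNPJAKPE2XOG33WA3PG6U"  # X-Content-URN (unpadded base32)


def b32decode_unpadded(text: str) -> bytes:
    """base64.b32decode needs '=' padding up to a multiple of 8 characters."""
    return base64.b32decode(text + "=" * (-len(text) % 8))


from_header = base64.b64decode(content_md5)
from_urn = b32decode_unpadded(md5_urn.split(":", 2)[2])
print(from_header.hex())       # f38948d5af4814f26aee36f7606de6f5
print(from_header == from_urn)  # True
```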



Andre John Mas wrote:
> 
> Hi,
> 
> Mirroring a web site or FTP site is a great way of reducing load
> and improving access times. The problem is that there is no way
> of telling a web browser to go to a mirror automatically. For this
> reason I have been thinking that a 'mirrors.txt' file, placed at
> the root of a web site that is either the master or a mirror, might
> be useful, in the same way that a robots.txt file is made available.
> 
> Below is an example of what such a file might contain:
> 
> ----start of example
> #this is a comment
> 
> title:   Project Gutenberg
> description: Project Gutenberg is the Internet's oldest producer of FREE
>   electronic books (eBooks or eTexts).
> master:  http://gutenberg.net/
> search:  master
> 
> mirror.name: University of North Carolina - HTTP
> mirror.city: Chapel Hill
> mirror.state: North Carolina
> mirror.country: USA
> mirror.gridref:
> mirror.url: http://www.ibiblio.org/gutenberg/
> mirror.update.freq: daily
> mirror.comment: Main Project Gutenberg Collection Site
> 
> mirror.name: University of North Carolina - FTP
> mirror.city: Chapel Hill
> mirror.state: North Carolina
> mirror.country: USA
> mirror.gridref: 0/+1000,-1000
> mirror.url: ftp://ibiblio.org/pub/docs/books/gutenberg/
> mirror.update.freq: daily
> mirror.comment: Main Project Gutenberg FTP Site -- If it doesn't allow
>   access, please try the corresponding HTTP site above
> 
> ----end of example
> 
> Most of the fields should be self-explanatory. For the less obvious
> ones:
>  - search: the value would be either mirror or master. This matters
>    if only the master offers a search facility.
>  - mirror.gridref: the grid coordinates of the mirror. The slash
>    is there for future use, such as a planet ID prefix. The grid
>    ref would always be the last element. I know this is overkill,
>    and probably no one will take this seriously, but I would like
>    to make this future-proof if there is no extra cost.
>  - mirror.update.freq: how often the mirror is updated (should this
>    be a numerical value, a textual value, or both?)
> 
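The mirror.gridref syntax could be handled as below, assuming (from the description above) that the coordinates follow the last slash as 'x,y' and anything before it is a reserved prefix. The proposal gives no units or distance metric, so plain Euclidean distance on a flat grid is an assumption here, and prefixes (such as a future planet ID) are ignored when comparing. Function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def parse_gridref(value: str) -> Optional[Tuple[Tuple[str, ...], float, float]]:
    """Split e.g. '0/+1000,-1000' into prefixes ('0',) and coordinates (1000.0, -1000.0)."""
    value = value.strip()
    if not value:
        return None  # mirror.gridref left empty
    *prefixes, coords = value.split("/")  # coordinates are always the last element
    x_text, y_text = coords.split(",")
    return tuple(prefixes), float(x_text), float(y_text)


def closest(client: Tuple[float, float], gridrefs: List[str]) -> Optional[int]:
    """Index of the gridref nearest to client (Euclidean), skipping empty ones."""
    best, best_dist = None, math.inf
    for i, ref in enumerate(gridrefs):
        parsed = parse_gridref(ref)
        if parsed is None:
            continue
        _, x, y = parsed  # prefixes ignored in this sketch
        dist = math.hypot(x - client[0], y - client[1])
        if dist < best_dist:
            best, best_dist = i, dist
    return best
```

Mirrors without a gridref are skipped rather than treated as distance zero, so they never win the "Closest Mirror" choice by default.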
> Some sites mirror several others, so such a site would probably need
> more than one mirror file. One option is to give the additional files
> a numeric suffix (mirrors.txt, mirrors2.txt, etc.); another is to have
> mirrors.txt refer to the other mirror files.
> 
> Search engines such as Google could also use this information to
> group mirrors under one link, making for smarter navigation. Something
> such as:
> 
>   PROJECT GUTENBERG -
>   Project Gutenberg is the Internet's oldest producer of FREE
>   electronic books (eBooks or eTexts).
>   gutenberg.org/ - 18k - Master - Closest Mirror - Other Mirrors
> 
> This is a first jab at something that could well be useful, so I would
> certainly appreciate your comments. Is this something that could be
> adopted as a web standard?
> 
> regards
> 
> Andre
> 
> P.S. I am not associated with Project Gutenberg; I am just using it as
> a useful example of a real site that could benefit from such a solution.
> 
> 


-- 
Justin Chapweske, Onion Networks
http://onionnetworks.com/
Received on Monday, 31 March 2003 13:56:40 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Wednesday, 27 October 2010 18:14:27 GMT