Re: Multi-server HTTP

On Fri, Aug 28, 2009 at 11:27 AM, Henrik Nordstrom
<henrik@henriknordstrom.net> wrote:
> On Fri, 2009-08-28 at 12:38 +0100, Ford, Alan wrote:
>
> So I would recommend the following slightly different approach to your
> problem.
>
> * Define a new Mirror profile object, similar to MetaLink but defining
> the mirror URL policy for groups of URLs on the server, without going
> into checksums etc (HTTP will give those).
>
> * Instance-Digest header returning the object checksum
>
> * HTTP addendum that servers participating in this mirror scheme should
> all share the same ETag policy, i.e. base it on the file contents and
> not server-unique filesystem metadata.

Henrik, I have added your suggestions about ETags to my draft (
http://tools.ietf.org/html/draft-bryan-metalinkhttp ) almost verbatim.
I didn't try to reword it, and if this is a problem, let me know.
I am looking for interested collaborators and co-authors, and you've
provided great insight. Would you like to join us?

Here is the current description:

   Metalink servers are HTTP servers that MUST have lists of mirrors and
   use the Link header [draft-nottingham-http-link-header] to indicate
   them.  They also MUST provide checksums of files via Instance Digests
   in HTTP [RFC3230].  Mirror and checksum information provided by the
   originating Metalink server MUST be considered authoritative.
   Metalink servers and their associated mirror servers SHOULD all share
   the same ETag policy, i.e. base it on the file contents (checksum)
   and not server-unique filesystem metadata.  The emitted ETag may be
   implemented the same as the Instance Digest for simplicity.

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers MAY
   be Metalink servers.  Mirror servers MUST support serving partial
   content.  Mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].

   Metalink clients use the mirrors provided by a Metalink server with
   Link header [draft-nottingham-http-link-header].  Metalink clients
   MUST support HTTP and MAY support FTP, BitTorrent, or other download
   methods.  Metalink clients MUST switch downloads from one mirror to
   another if the one mirror becomes unreachable.  Metalink clients are
   RECOMMENDED to support multi-source, or parallel, downloads, where
   chunks of a file are downloaded from multiple mirrors simultaneously
   (and optionally, from Peer-to-Peer sources).  Metalink clients MUST
   support Instance Digests in HTTP [RFC3230] by requesting and
   verifying checksums.  Metalink clients MAY make use of digital
   signatures if they are offered.

The draft also has some text about using Content-MD5 for partial
(chunk) checksums.
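
A minimal sketch of how a server or mirror could derive these header
values from file contents alone, so that every mirror emits the same
values for the same file. The function names are hypothetical, and
reusing the SHA-1 digest as the ETag is only one possible reading of
"the same as the Instance Digest"; the MD5 and SHA tokens with base64
encoding follow RFC 3230.

```python
import base64
import hashlib


def _b64(raw: bytes) -> str:
    """Base64-encode a raw digest as an ASCII string."""
    return base64.b64encode(raw).decode("ascii")


def instance_digest(data: bytes) -> str:
    """Digest header value using the RFC 3230 tokens MD5 and SHA (SHA-1)."""
    md5 = _b64(hashlib.md5(data).digest())
    sha = _b64(hashlib.sha1(data).digest())
    return f"MD5={md5},SHA={sha}"


def content_etag(data: bytes) -> str:
    """Assumed ETag policy: the quoted base64 SHA-1 of the file contents.

    Because it depends only on the bytes, every mirror holding an
    identical copy produces the same ETag.
    """
    return '"' + _b64(hashlib.sha1(data).digest()) + '"'


def content_md5(chunk: bytes) -> str:
    """Content-MD5 value for the body of a (partial) response."""
    return _b64(hashlib.md5(chunk).digest())
```

Nothing here reads inode numbers or modification times, which is the
point of the shared ETag policy.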

> 4. If the object checksum does not match the instance-digest then fetch
> the recovery profile link, where partial checksums etc can be found
> allowing detection of which server returned bad information.

What do you suggest as a recovery profile link? A text file with
partial checksums? Metalink XML could also be used, but do you think
the XML dependency adds too much?
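
To make the text-file option concrete, here is a rough sketch. The
line format ("<offset> <length> <md5-hex>") and both function names
are invented for illustration; neither draft defines them.

```python
import hashlib


def build_profile(data: bytes, chunk_size: int) -> list[str]:
    """One line per chunk: '<offset> <length> <md5-hex>' (hypothetical format)."""
    lines = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        lines.append(f"{offset} {len(chunk)} {hashlib.md5(chunk).hexdigest()}")
    return lines


def find_bad_chunks(data: bytes, profile: list[str]) -> list[tuple[int, int]]:
    """Return (offset, length) for every chunk whose checksum does not match."""
    bad = []
    for line in profile:
        offset_s, length_s, expected = line.split()
        offset, length = int(offset_s), int(length_s)
        actual = hashlib.md5(data[offset:offset + length]).hexdigest()
        if actual != expected:
            bad.append((offset, length))
    return bad
```

A client that remembers which mirror supplied each byte range can map
the returned offsets back to the mirror that sent bad data and
re-fetch only those ranges from another one.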


As mentioned before, you can try out Metalink in HTTP headers with the
software from here:

http://metalinks.svn.sourceforge.net/viewvc/metalinks/checker/
http://metalinks.svn.sourceforge.net/viewvc/metalinks/webconvert/
(Python script to convert .metalink to Apache directives)

I have read draft-ford-http-multi-server, and my main comment is that
the required coordination of all mirror servers may be difficult or
impossible unless you control every server on the mirror network.
I don't see this as possible in the open source mirror networks that I
follow, but perhaps it is for commercial CDNs? In any case, this
coordination is not required in my draft.

Finally, here are some issues with my own draft:

    * Mirror negotiation. Only send a few mirrors, or only send them
if Want-Digest is used? Some organizations have many mirrors.
    * Some publishers desire stronger hashes than MD5 and SHA-1.
    * Content-MD5 for chunk checksums could lead to many chunk
checksum requests of arbitrary sizes. Use consistent chunk sizes?
    * Do we want a way to show that whole directories are mirrored,
instead of individual files?
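
On the chunk-size question, one option is for clients to request only
ranges aligned to a fixed chunk size, so that a server sees the same
few ranges repeatedly and could cache their Content-MD5 values. A
sketch, assuming a hypothetical plan_ranges helper and round-robin
assignment (neither is in the draft):

```python
def plan_ranges(file_size: int, chunk_size: int,
                mirrors: list[str]) -> list[tuple[str, str]]:
    """Assign chunk-aligned byte ranges to mirrors in round-robin order.

    Returns (mirror URL, Range header value) pairs.
    """
    if chunk_size <= 0 or not mirrors:
        raise ValueError("need a positive chunk size and at least one mirror")
    plan = []
    for index, start in enumerate(range(0, file_size, chunk_size)):
        # The last byte position in a Range header is inclusive.
        end = min(start + chunk_size, file_size) - 1
        mirror = mirrors[index % len(mirrors)]
        plan.append((mirror, f"bytes={start}-{end}"))
    return plan
```

Every range except possibly the last starts on a multiple of the chunk
size and has the same length, regardless of which client asks.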

Comments/suggestions on these and other issues you may discover are welcome!
-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads

Received on Saturday, 19 September 2009 17:13:21 UTC