Re: HTTP Extensions for Simultaneous Download from Multiple Mirrors

Most of this is from a reply to the other thread...

On Mon, Jul 27, 2009 at 11:03 AM, Henrik Nordstrom
<henrik@henriknordstrom.net> wrote:
> This draft made a bit of a surprise appearance in the transport area
> meeting today:
>
> http://tools.ietf.org/html/draft-ford-http-multi-server
>
> My initial reaction is lots of obvious overlap with other work and
> misunderstandings of basic HTTP functions like ETag.
>
> Basic motivation behind the work may be reasonable however.
>
> I will try to catch the author for a more in-depth discussion shortly.
>
> Other opinions?

I have read draft-ford-http-multi-server and my main comment is that
the required coordination of all mirror servers may be difficult or
impossible unless you are in control of all servers on the mirror
network.

I don't see this as possible in the open source mirror networks that I
follow, but it might be possible for commercial CDNs?

Mirrors frequently mirror multiple projects. Comments from open source
mirror networks indicate that even requiring ETag synchronization among
mirrors would likely be too much. In any case, this coordination is
not required in my draft
(http://tools.ietf.org/html/draft-bryan-metalinkhttp), which attempts
to provide similar solutions using existing standards.

Here is the current description:

   Metalink servers are HTTP servers that MUST have lists of mirrors and
   use the Link header [draft-nottingham-http-link-header] to indicate
   them.  They also MUST provide checksums of files via Instance Digests
   in HTTP [RFC3230].  Mirror and checksum information provided by the
   originating Metalink server MUST be considered authoritative.
   Metalink servers and their associated mirror servers SHOULD all share
   the same ETag policy, i.e. base it on the file contents (checksum)
   and not server-unique filesystem metadata.  For simplicity, the
   emitted ETag may be the same as the Instance Digest.
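To make the server requirements above a little more concrete, here is a minimal Python sketch of the headers a Metalink server might emit for one file. It assumes SHA-1 as the RFC 3230 digest algorithm (token "SHA"), reuses the digest value as a content-based ETag, and marks each mirror with a Link header using rel="duplicate". The relation name, the helper names and the example URLs are assumptions for illustration, not text from the draft.

```python
import base64
import hashlib
from typing import List, Tuple


def instance_digest(data: bytes) -> str:
    """RFC 3230 instance digest using the "SHA" (SHA-1) algorithm token."""
    return "SHA=" + base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def metalink_headers(data: bytes, mirrors: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: response headers for a Metalink server."""
    digest = instance_digest(data)
    headers = [("Digest", digest)]
    # Content-based ETag: every mirror holding the same bytes computes the
    # same value, unlike ETags built from inode numbers or modification times.
    headers.append(("ETag", '"%s"' % digest.split("=", 1)[1]))
    # One Link header per mirror; rel="duplicate" is an assumed relation name.
    for url in mirrors:
        headers.append(("Link", '<%s>; rel="duplicate"' % url))
    return headers


if __name__ == "__main__":
    for name, value in metalink_headers(b"example file contents",
                                        ["http://mirror1.example.com/f.iso",
                                         "ftp://mirror2.example.com/f.iso"]):
        print("%s: %s" % (name, value))
```

Because the ETag is derived only from the bytes, two mirrors that were never coordinated still agree on it, which is the point of the "same ETag policy" wording.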

   Mirror servers are typically FTP or HTTP servers that "mirror"
   another server.  That is, they provide identical copies of (at least
   some) files that are also on the mirrored server.  Mirror servers MAY
   be Metalink servers.  Mirror servers MUST support serving partial
   content.  Mirror servers SHOULD support Instance Digests in HTTP
   [RFC3230].
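Since mirror servers MUST serve partial content, here is a minimal sketch of how a server could answer a single byte-range request with 206 Partial Content or 416 Range Not Satisfiable. It is only an illustration: it handles one range per request, ignores multipart/byteranges responses, and the function name serve_range is hypothetical.

```python
import re
from typing import Optional

# Matches one byte range such as "bytes=0-499", "bytes=500-" or "bytes=-200".
RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def serve_range(data: bytes, range_header: Optional[str]):
    """Return (status, headers, body) for an optional single-range request."""
    size = len(data)
    full = (200, {"Accept-Ranges": "bytes"}, data)
    if range_header is None:
        return full
    m = RANGE_RE.match(range_header.strip())
    if not m or m.group(1) == m.group(2) == "":
        return full  # unparseable Range header: ignore it, send the whole file
    first, last = m.group(1), m.group(2)
    if first == "":
        # Suffix range: the final N bytes of the file.
        start, end = max(size - int(last), 0), size - 1
    else:
        start = int(first)
        if last and int(last) < start:
            return full  # invalid range (last < first): ignore it
        end = min(int(last), size - 1) if last else size - 1
    if start >= size or start > end:
        return 416, {"Content-Range": "bytes */%d" % size}, b""
    headers = {"Accept-Ranges": "bytes",
               "Content-Range": "bytes %d-%d/%d" % (start, end, size)}
    return 206, headers, data[start:end + 1]
```

For a 10-byte file, "bytes=2-4" yields status 206 with Content-Range "bytes 2-4/10", while "bytes=20-" yields 416.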

   Metalink clients use the mirrors provided by a Metalink server with
   the Link header [draft-nottingham-http-link-header].  Metalink clients
   MUST support HTTP and MAY support FTP, BitTorrent, or other download
   methods.  Metalink clients MUST switch downloads from one mirror to
   another if the current mirror becomes unreachable.  Metalink clients are
   RECOMMENDED to support multi-source, or parallel, downloads, where
   chunks of a file are downloaded from multiple mirrors simultaneously
   (and optionally, from Peer-to-Peer sources).  Metalink clients MUST
   support Instance Digests in HTTP [RFC3230] by requesting and
   verifying checksums.  Metalink clients MAY make use of digital
   signatures if they are offered.
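The client behaviour described above (multi-source chunks, switching mirrors when one is unreachable, and verifying the Instance Digest) could look roughly like the following Python sketch. It assumes the file size and the "SHA=" digest are already known, for example from a HEAD request to the Metalink server, and it takes a caller-supplied fetch_range(mirror, start, end) function so the transport is left open. The names fetch_range, plan_chunks, download and the default chunk size are hypothetical, not part of the draft.

```python
import base64
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def plan_chunks(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into inclusive (start, end) byte ranges."""
    return [(s, min(s + chunk_size, size) - 1) for s in range(0, size, chunk_size)]


def download(mirrors, size, expected_digest, fetch_range, chunk_size=256 * 1024):
    """Fetch chunks in parallel from several mirrors, then verify the digest."""
    chunks = plan_chunks(size, chunk_size)

    def get(job):
        index, (start, end) = job
        # Each chunk starts on a different mirror (spreading the load) and
        # fails over to the remaining mirrors if one is unreachable.
        for k in range(len(mirrors)):
            mirror = mirrors[(index + k) % len(mirrors)]
            try:
                data = fetch_range(mirror, start, end)
            except OSError:
                continue  # mirror unreachable: try the next one
            if len(data) == end - start + 1:
                return data
        raise OSError("no mirror could serve bytes %d-%d" % (start, end))

    # pool.map yields results in chunk order, so the parts can be joined directly.
    with ThreadPoolExecutor(max_workers=len(mirrors)) as pool:
        body = b"".join(pool.map(get, enumerate(chunks)))

    # Verify the reassembled file against the RFC 3230 "SHA" Instance Digest.
    actual = "SHA=" + base64.b64encode(hashlib.sha1(body).digest()).decode("ascii")
    if actual != expected_digest:
        raise ValueError("Instance Digest mismatch: got %s" % actual)
    return body
```

Rotating the starting mirror per chunk is just the simplest way to spread load; a real client would probably favour faster mirrors and re-queue failed chunks rather than give up.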

The draft also has some text about using Content-MD5 for partial
(chunk) checksums.
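For reference, a Content-MD5 value is the base64 encoding of the 16-byte MD5 digest of a body. The sketch below assumes the value sent with a 206 response covers only the bytes carried in that response (which is what partial checksums need), so each chunk can be checked as it arrives. Both function names are hypothetical.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Content-MD5 value: base64 of the 16-byte MD5 digest of the body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


def verify_chunk(body: bytes, header_value: str) -> bool:
    """Check one partial (206) body against the Content-MD5 sent with it."""
    return content_md5(body) == header_value.strip()
```

A mismatch only condemns that one chunk, so a client can re-fetch it from another mirror instead of restarting the whole download.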

Here are some issues with my own draft:

    * Mirror negotiation: should the server send only a few mirrors, or
      send them only if Want-Digest is used? Some organizations have many
      mirrors.
    * Some publishers want stronger hashes than MD5 and SHA-1.
    * Using Content-MD5 for chunk checksums could lead to many requests
      for checksums of arbitrarily sized chunks. Use consistent chunk
      sizes?
    * Do we want a way to show that whole directories are mirrored,
      instead of individual files?
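On the chunk-size question: if chunk checksums were always defined over a fixed grid of N bytes, a client could round its requests out to chunk boundaries, and servers and caches would only ever see a small, predictable set of ranges whose checksums can be computed once. A minimal sketch of that rounding, assuming a hypothetical fixed CHUNK_SIZE that the draft does not currently define:

```python
from typing import List, Tuple

CHUNK_SIZE = 256 * 1024  # hypothetical fixed chunk size, not defined by the draft


def aligned_chunks(start: int, end: int, size: int,
                   chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int, int]]:
    """Expand the inclusive byte range [start, end] to whole fixed-size chunks.

    Returns (chunk_index, first_byte, last_byte) tuples; each chunk can be
    requested and checksummed on its own, so the same ranges recur across
    clients instead of arbitrary ones.
    """
    if not 0 <= start <= end < size:
        raise ValueError("range outside file")
    first = start // chunk_size
    last = end // chunk_size
    return [(i, i * chunk_size, min((i + 1) * chunk_size, size) - 1)
            for i in range(first, last + 1)]
```

The trade-off is a little extra transfer at the edges of a request in exchange for checksums that are reusable and cacheable.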

As mentioned before, you can try out Metalink in HTTP headers with the
software here:

http://metalinks.svn.sourceforge.net/viewvc/metalinks/checker/
http://metalinks.svn.sourceforge.net/viewvc/metalinks/webconvert/
(Python script to convert .metalink to Apache directives)

Comments/suggestions on these and other issues you may discover are welcome!

-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads

Received on Monday, 21 September 2009 05:36:09 UTC