- From: Anthony Bryan <anthonybryan@gmail.com>
- Date: Sat, 19 Sep 2009 13:12:40 -0400
- To: Henrik Nordstrom <henrik@henriknordstrom.net>
- Cc: "Ford, Alan" <alan.ford@roke.co.uk>, Robert Siemer <Robert.Siemer-http@backsla.sh>, Mark Nottingham <mnot@mnot.net>, ietf-http-wg@w3.org, Mark Handley <m.handley@cs.ucl.ac.uk>
On Fri, Aug 28, 2009 at 11:27 AM, Henrik Nordstrom <henrik@henriknordstrom.net> wrote:
> On Fri, 2009-08-28 at 12:38 +0100, Ford, Alan wrote:
>
> So I would recommend the following slightly different approach to your
> problem.
>
> * Define a new Mirror profile object, similar to MetaLink but defining
>   the mirror URL policy for groups of URLs on the server, without going
>   into checksums etc (HTTP will give those).
>
> * Instance-Digest header returning the object checksum
>
> * HTTP addendum that servers participating in this mirror scheme should
>   all share the same ETag policy, i.e. base it on the file contents and
>   not server-unique filesystem metadata.

Henrik,

I have added your suggestions about ETags to my draft
( http://tools.ietf.org/html/draft-bryan-metalinkhttp ) almost verbatim.
I didn't try to reword it, and if this is a problem, let me know. I am
looking for interested collaborators and co-authors, and you've provided
great insight. Would you like to join us?

Here is the current description:

Metalink servers are HTTP servers that MUST have lists of mirrors and
use the Link header [draft-nottingham-http-link-header] to indicate
them. They also MUST provide checksums of files via Instance Digests in
HTTP [RFC3230]. Mirror and checksum information provided by the
originating Metalink server MUST be considered authoritative. Metalink
servers and their associated mirror servers SHOULD all share the same
ETag policy, i.e. base it on the file contents (checksum) and not
server-unique filesystem metadata. The emitted ETag may be implemented
the same as the Instance Digest for simplicity.

Mirror servers are typically FTP or HTTP servers that "mirror" another
server. That is, they provide identical copies of (at least some) files
that are also on the mirrored server. Mirror servers MAY be Metalink
servers. Mirror servers MUST support serving partial content. Mirror
servers SHOULD support Instance Digests in HTTP [RFC3230].
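
To make the ETag and Instance Digest text above a little more concrete, here is a minimal Python sketch, not taken from the draft. It assumes SHA-1 with the "SHA" algorithm token and base64 encoding from RFC 3230, and it reuses the digest value as the ETag, as the description allows. The function names (sha1_b64, digest_header, content_etag, digest_matches) are made up for illustration.

```python
import base64
import hashlib


def sha1_b64(body: bytes) -> str:
    """Base64-encoded SHA-1 of the file contents."""
    return base64.b64encode(hashlib.sha1(body).digest()).decode("ascii")


def digest_header(body: bytes) -> str:
    """Value for an RFC 3230 Digest header, using the "SHA" token (assumed choice)."""
    return "SHA=" + sha1_b64(body)


def content_etag(body: bytes) -> str:
    """Strong ETag that reuses the digest, so it depends only on the contents."""
    return '"' + sha1_b64(body) + '"'


def digest_matches(body: bytes, header_value: str) -> bool:
    """Client-side check of a downloaded body against a Digest header value."""
    for item in header_value.split(","):
        # Split on the first "=" only; base64 padding may contain more.
        algo, _, value = item.strip().partition("=")
        if algo.upper() == "SHA":
            return value == sha1_b64(body)
    return False  # no SHA digest was offered


if __name__ == "__main__":
    data = b"example file contents"
    print("Digest:", digest_header(data))
    print("ETag:", content_etag(data))
```

Because both values are computed from the bytes of the file alone, any mirror holding an identical copy would emit the same Digest and ETag regardless of inode numbers or modification times, which is the point of the shared ETag policy.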

Metalink clients use the mirrors provided by a Metalink server with the
Link header [draft-nottingham-http-link-header]. Metalink clients MUST
support HTTP and MAY support FTP, BitTorrent, or other download methods.
Metalink clients MUST switch downloads from one mirror to another if the
one mirror becomes unreachable. Metalink clients are RECOMMENDED to
support multi-source, or parallel, downloads, where chunks of a file are
downloaded from multiple mirrors simultaneously (and optionally, from
Peer-to-Peer sources). Metalink clients MUST support Instance Digests in
HTTP [RFC3230] by requesting and verifying checksums. Metalink clients
MAY make use of digital signatures if they are offered.

There is also some text about Content-MD5 for partial checksums.

> 4. If the object checksum does not match the instance-digest then fetch
> the recovery profile link, where partial checksums etc can be found
> allowing detection of which server returned bad information.

What do you suggest as a recovery profile link? A text file with partial
checksums? Metalink XML could also be used, but do you think the XML
dependency adds too much?

As mentioned before, you can try out Metalink in HTTP headers with the
software from here:

http://metalinks.svn.sourceforge.net/viewvc/metalinks/checker/
http://metalinks.svn.sourceforge.net/viewvc/metalinks/webconvert/
(Python script to convert .metalink to Apache directives)

I have read draft-ford-http-multi-server and my main comment is that the
required coordination of all mirror servers may be difficult or
impossible unless you are in control of all servers on the mirror
network. I don't see this as possible in the open source mirror networks
that I follow, but it might be for commercial CDNs? In any case, this
coordination is not required in my draft.

Finally, here are some issues with my own draft:

* Mirror negotiation. Only send a few mirrors, or only send them if
  Want-Digest is used? Some organizations have many mirrors.
* Some publishers desire stronger hashes than MD5 and SHA-1.
* Content-MD5 for chunk checksums could lead to many random size chunk
  checksum requests. Use consistent chunk sizes?
* Do we want a way to show that whole directories are mirrored, instead
  of individual files?

Comments/suggestions on these and other issues you may discover are
welcome!

-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads
Received on Saturday, 19 September 2009 17:13:21 UTC