- From: Henrik Nordstrom <henrik@henriknordstrom.net>
- Date: Fri, 28 Aug 2009 17:27:45 +0200
- To: "Ford, Alan" <alan.ford@roke.co.uk>
- Cc: Robert Siemer <Robert.Siemer-http@backsla.sh>, Mark Nottingham <mnot@mnot.net>, ietf-http-wg@w3.org, Mark Handley <m.handley@cs.ucl.ac.uk>
On Fri, 2009-08-28 at 12:38 +0100, Ford, Alan wrote:

> Multiserver-Version (so that the server knows it's talking to a
> multiserver-capable client and thus the ETag is defined this way).

Not needed. It is sufficient that the server announces its support. In fact, it is strongly recommended that it always announces it, or you'll run into some hairy issues with caching.

> Which brings me onto another thing about Mirrors: header. One of our
> longer-term goals with this would be to somehow provide wildcarded lists
> of mirrors, so that a client could immediately run off and fetch bits of
> a website from many mirrors, potentially speeding up loading time
> considerably, and providing an alternative method of load balancing.

That should imho be in a profile which you reference from a header, i.e. by using the Link header to refer to a mirror profile.

> However, I'm struggling to see a neat way of doing this reliably, since
> we couldn't get checksums for every file on the first handshake (or if
> all content was static we might be able to, but it's a big overhead).

Right.. so the client needs to pick one known server (perhaps "at random") as the master server for any given request. That server supplies the needed object metadata, and the choice is based on whatever prior knowledge the client has about the mirror setup.

> Does anybody have any ideas as to a neat way of doing this? Best I can
> think of so far is some sort of version number/(pseudo)hash of the
> entire directory structure!

Such a hash isn't useful unless you retrieve the complete structure, which most often is not what you want to do.

Imho what you can provide in the mirror profile is just the URL patterns where content may be found. Hashes etc. have to be resolved per object when fetched.

Additionally, the list of mirrors can be fairly large, making it unsuitable to be sent in HTTP headers. Consider for example a site with hundreds of mirrors, which is not unrealistic (even the little Squid project has in the range of 70 registered and verified mirrors).
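To make the "just the URL patterns" idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: no profile format has been defined, so the `MIRROR_PROFILE` layout, the host names, and the `candidate_urls` helper are all hypothetical, and only simple prefix patterns ending in `*` are handled.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical profile: URL path pattern on the master server mapped to
# the base URLs under which mirrors publish the same tree.
MIRROR_PROFILE = {
    "/pub/*": [
        "http://mirror1.example.net/pub/",
        "http://mirror2.example.org/files/",
    ],
}

def candidate_urls(url, profile):
    """Return the mirror URLs where the object behind `url` may be found."""
    path = urlsplit(url).path
    candidates = []
    for pattern, bases in profile.items():
        if fnmatchcase(path, pattern):
            # Assumes a prefix pattern such as "/pub/*": strip the wildcard
            # and re-root the remainder of the path under each mirror base.
            prefix = pattern.rstrip("*")
            rest = path[len(prefix):]
            candidates.extend(base + rest for base in bases)
    return candidates
```

Note that the profile carries no checksums at all; the client still has to learn the ETag and digest for each object from whichever server it asks first.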
So I would recommend the following slightly different approach to your problem.

* Define a new Mirror profile object, similar to MetaLink but defining the mirror URL policy for groups of URLs on the server, without going into checksums etc. (HTTP will give those).

* An Instance-Digest header returning the object checksum.

* An HTTP addendum stating that servers participating in this mirror scheme should all share the same ETag policy, i.e. base it on the file contents and not on server-unique filesystem metadata.

1. First request for a mirrored URL. A plain GET request, perhaps with a Range limit (not required). The client discovers the mirror profile link in the header, and maybe a MetaLink relation as well (the two happily coexist). From this response the client learns the following metadata about the requested object, while also starting to receive the object:

   * ETag
   * Instance-Digest
   * Mirror profile link
   * Object size
   * Recovery profile link

2. If the object is large and is delivered slower than expected, the client fetches the mirror profile and then starts a number of parallel ranged downloads (one per selected mirror server other than the first), using If-Match conditions based on the ETag to quickly detect out-of-date mirrors. If no Range limit was given in the original request, work from the tail of the object (the first request is still running and will eventually catch up); otherwise continue after the range requested in the first request.

2b. If a server rejects the If-Match condition then something is fishy. If the metadata came from the master server, or the master server has already acknowledged its validity by accepting an If-Match condition, then ignore the other servers that reject If-Match. If the master server has not yet been queried, pick the master server as fallback for the first failed range. If the master server rejects the If-Match, restart the download from the beginning, using the master server for the initial range.

3. If the first request was not Range limited, abort it by closing the connection when it catches up with the other parallel downloads of the same object.

4. On the next requested URL the mirror profile of the server is already known, and the client can pick the server that seems fastest for the initial request, where it will learn the required object-specific metadata (ETag, Size, Instance-Digest, Recovery profile link).

5. If the object checksum does not match the Instance-Digest, fetch the recovery profile link, where partial checksums etc. can be found, allowing detection of which server returned bad information.

In this approach all servers providing the mirror service SHOULD use the same ETag and preferably also provide an Instance-Digest checksum. It is possible, however, to specify this property per server in the mirror profile; for servers not sharing the same ETag, the modification is that If-Match won't be used for those servers. This slightly increases the risk of a failed transfer, requiring recovery after the download is supposed to be complete. And at least one of the selected servers needs to provide Instance-Digest for corrupted transfers to be detectable.

I.e. it is in most cases sufficient that the master server provides the mirror profile and Instance-Digest information, but operation will be more robust and efficient if the mirror servers implement a common ETag and preferably Instance-Digest as well.

In fact the emitted ETag may be implemented as the same value as the instance digest for simplicity, but there is no need to specify how the ETag is generated, just that it needs to be shared among the mirror servers.

Regards
Henrik
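
The parallel ranged downloads and the final checksum check described above could be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not a defined protocol: the function names and the fixed chunk size are hypothetical, and the digest value is assumed to look like `SHA-256=<base64>` in the style of the RFC 3230 Digest header.

```python
import base64
import hashlib

def plan_tail_ranges(size, mirrors, chunk):
    """Assign one byte range per extra mirror, working backwards from the
    tail of the object while the first (unranged) request keeps running."""
    plan = []
    end = size - 1
    for mirror in mirrors:
        if end < 0:
            break  # the whole object is already covered
        start = max(0, end - chunk + 1)
        plan.append((mirror, start, end))
        end = start - 1
    return plan

def range_headers(etag, start, end):
    """Request headers for one ranged fetch. If-Match makes an out-of-date
    mirror fail fast (412) instead of returning mismatched bytes."""
    return {"Range": f"bytes={start}-{end}", "If-Match": etag}

def digest_matches(body, digest_header):
    """Compare the reassembled object against the advertised digest.
    Assumes the hypothetical form 'SHA-256=<base64 of the raw hash>'."""
    algorithm, _, value = digest_header.partition("=")
    if algorithm.strip().lower() != "sha-256":
        raise ValueError("this sketch only handles SHA-256")
    expected = base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
    return value.strip() == expected
```

A client would call `plan_tail_ranges` once it has the object size and the mirror list, send each range with `range_headers`, and run `digest_matches` on the reassembled object before deciding whether the recovery profile is needed.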
Received on Friday, 28 August 2009 15:28:38 UTC