Re: Multi-server HTTP

On Tue, Aug 25, 2009 at 5:24 AM, Ford, Alan <alan.ford@roke.co.uk> wrote:
> Hi Mark, all,
>
> I have (admittedly only briefly) looked at metalink. It seems to cover
> some of what we need (list of mirrors, pieces, checksumming) but seems
> mostly to be concerned with finding a single appropriate source rather
> than downloading from multiple HTTP servers. This seems to mostly be a
> client rather than a spec choice, however. Nevertheless, one of the

This wasn't really a spec choice, more an inadequacy in explaining
what metalink offers in the abstract and introduction of our ID. :)
Looking at our ID, it doesn't really spell out for those unfamiliar
with metalink what we've solved over the past 4 years. Our ID is
focused more on the format than on what the client does with it.

All but a few of the 30-some metalink clients support downloading
from multiple HTTP servers, although clients aren't required to
support multi-source downloads. Most metalink clients are download
managers / accelerators, but I think using mirrors for fallback /
failover is just as important.
See http://en.wikipedia.org/wiki/Metalink or (our embarrassing)
http://www.metalinker.org/implementation.html
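
To make the fallback idea concrete, here is a rough sketch of the
loop a simple client might run (illustrative Python only, not taken
from any real client; the function and variable names are made up):

  import hashlib
  import urllib2

  def download_with_fallback(mirror_urls, expected_sha1):
      # Try each mirror from the metalink in turn; a dead or
      # corrupted mirror just means moving on to the next one.
      for url in mirror_urls:
          try:
              data = urllib2.urlopen(url, timeout=30).read()
          except Exception:
              continue  # connection failed -- try the next mirror
          # Verify against the checksum carried in the metalink.
          if hashlib.sha1(data).hexdigest() == expected_sha1:
              return data
          # Checksum mismatch: treat it as a failure and keep going.
      raise IOError("all mirrors failed or returned bad data")

A real client would also do ranged requests, resume partial
transfers, and so on, but that is the essence of failover.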

Your excellent introduction put ours to shame, so I've tried to update it:

All the information about a download, including mirrors, checksums,
digital signatures, and more, can be stored in a machine-readable
Metalink file. This Metalink file transfers the knowledge of the
download server (and mirror database) to the client. Clients can fall
back to alternate mirrors if the current one has an issue. With this
knowledge, the client can work its way to a successful download even
under adverse circumstances. All this is transparent to the user, and
the download is much more reliable and efficient. In contrast, a
traditional HTTP redirect to a mirror conveys only minimal
information - one link to one server - and the HTTP protocol has no
provision for handling failures. Other features that some clients
provide include multi-source downloads, where chunks of a file are
downloaded from multiple mirrors (and optionally, Peer-to-Peer)
simultaneously, which frequently results in a faster download.
Metalinks also provide structured information about downloads that
can be indexed by search engines.

http://tools.ietf.org/html/draft-bryan-metalink#section-1
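
In case a concrete example helps, this is roughly what a small
version 3.0 Metalink file looks like (the file name, size, checksum,
and URLs below are all made up for illustration):

  <?xml version="1.0" encoding="UTF-8"?>
  <metalink version="3.0" xmlns="http://www.metalinker.org/">
    <files>
      <file name="example.iso">
        <size>365953024</size>
        <verification>
          <hash type="sha1">0123456789abcdef0123456789abcdef01234567</hash>
        </verification>
        <resources>
          <url type="http">http://example.com/example.iso</url>
          <url type="http">http://mirror.example.org/pub/example.iso</url>
          <url type="ftp">ftp://ftp.example.net/pub/example.iso</url>
        </resources>
      </file>
    </files>
  </metalink>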

I should note, though, that metalink requires no changes to a server;
a user can create a metalink file themselves.

> disadvantages of metalink, from our point of view, is that it is an
> overhead. This is negligible for large files, but one of our (longer
> term) use cases is for mirrors of a whole site allowing e.g. a set of
> images to be downloaded from different servers. As such, there is a
> moderate delay before a download would start since first the metalink
> must be downloaded, then decisions made, then new downloads started.

We could add this if people want it; so far no one has requested it.

> In our case, the download starts immediately, just as in standard HTTP,
> and the client can take over the requesting of various parts when it is
> ready, so there is no delay introduced by metadata handshaking.

Downloads with metalink start immediately as well.

> Our solution is indeed designed to operate on the same URLs as
> standard HTTP. It seems that it is feasible for metalink to also be
> done transparently (by the client declaring "Accept:
> application/metalink+xml" as I understand it).

Yes, we've been experimentally using transparent content negotiation,
which we have since learned is bad. :)

We'll be using Mark's Link header in the future.
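
Roughly, the idea is that a response could advertise the Metalink
document alongside the normal payload, something like this (a sketch
only; the exact relation type and media type handling are still to be
worked out):

  HTTP/1.1 200 OK
  Content-Type: application/x-iso9660-image
  Link: <http://example.com/example.iso.metalink>;
        rel="describedby"; type="application/metalink+xml"

A client that doesn't understand the Link header just ignores it and
downloads normally; one that does can fetch the metalink and use the
mirrors and checksums from there.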

-- 
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
  )) Easier, More Reliable, Self Healing Downloads


>> -----Original Message-----
>> From: Mark Nottingham [mailto:mnot@mnot.net]
>> Sent: 25 August 2009 07:14
>> To: Ford, Alan
>> Cc: ietf-http-wg@w3.org; Mark Handley
>> Subject: Re: Multi-server HTTP
>>
>> Alan and Mark,
>>
>> There unfortunately hasn't been much discussion of this yet, at least
>> on the list. Has there been progress elsewhere?
>>
>> For my part, this looks like interesting work. If I understand it
>> correctly, it's entirely application-layer (or at least able to be
>> implemented within the application layer), so if you want to, I think
>> it's entirely appropriate to discuss it on this list.
>>
>> Also, have you made contact with the folks doing Metalink
>> <http://www.metalinker.org/>? They have deployed implementations,
>> and it's my understanding that they're looking at revising the spec
>> now, so it may be an excellent time to collaborate.
>>
>> Personally, I'd like to see the end result able to use the same URL
>> for multi-server downloads and "traditional" single-server downloads;
>> i.e., it should be transparent to clients.
>>
>> Cheers,
>>
>>
>> On 31/07/2009, at 9:59 PM, Ford, Alan wrote:
>>
>> > Hi all,
>> >
>> > At the IETF this week, Mark Handley and I submitted a
>> > floating-an-idea draft on multi-server HTTP and presented it in
>> > tsvarea.
>> >
>> > http://www.ietf.org/id/draft-ford-http-multi-server-00.txt
>> >
>> > Slides are at:
>> > http://www.ietf.org/proceedings/75/slides/tsvarea-0.pdf
>> >
>> > I realise Transport Area didn't capture a large number of HTTP
>> > people - the main reason for presenting it there was that our key
>> > motivation was to improve Internet resource usage, and we have
>> > been doing other such work (notably multipath TCP) in that area.
>> > We were also very short on preparation time before the IETF - so
>> > apologies for missing many of you guys.
>> >
>> > However, we would very much like input and guidance from the HTTP
>> > community. I am grateful to Henrik Nordstrom for suggesting we
>> > should bring it to the HTTPbis WG, even though as an extension it
>> > is not within the charter.
>> >
>> > This is a brief summary of the proposal:
>> >
>> >  * We are aiming to achieve better usage of Internet resources by
>> > applying BitTorrent-like chunked downloading of large files from
>> > different servers.
>> >  * Upon connection to a Multi-Server HTTP server, when a client
>> > says it is Multi-server capable, the server will provide in its
>> > response a list of mirrors for that resource, a checksum for the
>> > file, and a chunk of the file with a Content-Range header.
>> >  * The client will then send more GET requests, this time with
>> > Range: headers, to the original server and to zero or more of the
>> > mirror servers, along with a verification header to ensure the
>> > checksum matches and so the resource is the same. The client will
>> > handle the scheduling of Range requests in order to make the most
>> > effective use of the least loaded servers.
>> >
>> > We realise that the draft itself is not making the best use of
>> > existing proposals. During the presentation, Instance-Digests
>> > (RFC3230) were mentioned, which look ideal instead of X-Checksum,
>> > although we will still need an If-Digest-Match header. Content-MD5
>> > was also suggested, but that appears to be a checksum of just the
>> > data that is sent, not the whole resource.
>> >
>> > I discounted ETags along with If-Match in the proposal since
>> > RFC2616 says "Entity tags are used for comparing two or more
>> > entities from the same requested resource", but if I have
>> > understood the terminology correctly, in our proposal we are
>> > fetching chunks from different resources (even though the content
>> > should be the same). Indeed the RFC also says, "The use of the
>> > same entity tag value in conjunction with entities obtained by
>> > requests on different URIs does not imply the equivalence of those
>> > entities." Please correct me if I'm wrong!
>> >
>> > There is also a question of whether we could make further
>> > extensions, specifically:
>> >
>> >  * Wildcarded mirror lists (e.g. a server that mirrors all
>> > /a/*.jpg).
>> >  * Checksums could be provided for file chunks, allowing broken
>> > chunks to be re-fetched.
>> >  * Servers could store multiple versions of the file indexed by
>> > checksum.
>> >  * Initial servers could send no, or very little, data themselves,
>> > and purely act as load balancers; or redirect immediately when
>> > overloaded.
>> >
>> > These may change the mechanism quite considerably, however (e.g.
>> > with wildcards, no longer would you be getting all checksums from
>> > the same server; and for verification, checksum chunks need to be
>> > pre-determined and calculated).
>> >
>> > We believe that the extension as it stands can bring significant
>> > benefit to HTTP, making much more efficient use of Internet
>> > resources. Experiments have been conducted that suggest it has no
>> > negative impact in any scenario in which it was tested.
>> >
>> > Looking forward to your comments and advice!
>> >
>> > Regards,
>> > Alan
