Multi-server HTTP from Ford, Alan on 2009-07-31 (ietf-http-wg@w3.org from July to September 2009)

From: Ford, Alan <alan.ford@roke.co.uk>
Date: Fri, 31 Jul 2009 12:59:33 +0100
To: <ietf-http-wg@w3.org>
Cc: "Mark Handley" <m.handley@cs.ucl.ac.uk>
Message-ID: <2181C5F19DD0254692452BFF3EAF1D680830AAB5@rsys005a.comm.ad.roke.co.uk>
Hi all,

At the IETF this week, Mark Handley and I submitted a floating-an-idea
draft on multi-server HTTP and presented it in tsvarea.

http://www.ietf.org/id/draft-ford-http-multi-server-00.txt

Slides are at: http://www.ietf.org/proceedings/75/slides/tsvarea-0.pdf

I realise the Transport Area session didn't reach a large number of HTTP
people. We presented it there because our key motivation is to improve
Internet resource usage, and we have been doing other such work (notably
multipath TCP) in that area. We were also very short on preparation time
before the IETF, so apologies for missing many of you.

However, we would very much like input and guidance from the HTTP
community. I am grateful to Henrik Nordstrom for suggesting we should
bring it to the HTTPbis WG, even though as an extension it is not within
the charter.

This is a brief summary of the proposal:

  * We are aiming to achieve better usage of Internet resources by
applying BitTorrent-like chunked downloading of large files from
different servers.
  * When a client connects to a Multi-Server HTTP server and indicates
that it is multi-server capable, the server's response will provide a
list of mirrors for that resource, a checksum for the file, and a chunk
of the file with a Content-Range header.
  * The client will then send more GET requests, this time with Range:
headers, to the original server and to zero or more of the mirror
servers, along with a verification header to ensure the checksum matches
and hence that the resource is the same. The client will handle the
scheduling of Range requests in order to make the most effective use of
the least loaded servers.
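
As a rough illustration of the client side of the steps above, here is a
minimal Python sketch. It is not taken from the draft. The function name
plan_range_requests, its parameters, and the server names are all
hypothetical, and simple round-robin assignment stands in for the
load-aware scheduling described above. The verification header is left
out because its name and format are not settled in this message.

```python
from itertools import cycle


def plan_range_requests(total_size, first_chunk_end, chunk_size, servers):
    """Plan Range headers for the bytes not covered by the first response.

    total_size      -- size of the whole resource in bytes
    first_chunk_end -- last byte offset already received (from Content-Range)
    chunk_size      -- number of bytes to ask for per request
    servers         -- original server plus any mirrors (hypothetical names)

    Round-robin is an assumption made for brevity; a real client would
    favour the least loaded servers.
    """
    plan = []
    rotation = cycle(servers)
    start = first_chunk_end + 1
    while start < total_size:
        # Byte ranges are inclusive at both ends.
        end = min(start + chunk_size, total_size) - 1
        plan.append((next(rotation), f"bytes={start}-{end}"))
        start = end + 1
    return plan


if __name__ == "__main__":
    for server, range_header in plan_range_requests(
        1000, 99, 400, ["origin.example", "mirror1.example"]
    ):
        print(server, "Range:", range_header)
```

Each (server, Range value) pair would become one GET request; how the
results are reassembled and verified is outside this sketch.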

We realise that the draft itself is not making the best use of existing
proposals. During the presentation, Instance Digests (RFC 3230) were
mentioned, which look ideal as a replacement for X-Checksum, although we
will still need an If-Digest-Match header. Content-MD5 was also
suggested, but that appears to be a checksum of just the data that is
sent, not of the whole resource.
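
To make the distinction concrete, here is a small Python sketch using
only the standard library. The sample data and variable names are
invented for illustration. It assumes Content-MD5 is the base64-encoded
MD5 of the body actually sent, and that an RFC 3230 Digest value covers
the whole instance regardless of which range is returned.

```python
import base64
import hashlib


def md5_b64(data: bytes) -> str:
    """Return the base64-encoded MD5 digest of data."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


# Stand-in for a large file and the first chunk a server might send.
resource = b"0123456789" * 100
chunk = resource[0:100]

# Content-MD5 would cover only the bytes in this partial response.
content_md5 = md5_b64(chunk)

# An instance digest covers the full resource, so it is the same no
# matter which range a given response carries.
instance_digest = "MD5=" + md5_b64(resource)

if __name__ == "__main__":
    print("Content-MD5:", content_md5)
    print("Digest:", instance_digest)
```

Only the second value lets a client confirm that chunks fetched from
different servers belong to the same file.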

I discounted ETags along with If-Match in the proposal since RFC2616
says "Entity tags are used for comparing two or more entities from the
same requested resource" but if I have understood the terminology
correctly, in our proposal we are fetching chunks from different
resources (even though the content should be the same). Indeed the RFC
also says, "The use of the same entity tag value in conjunction with
entities obtained by requests on different URIs does not imply the
equivalence of those entities." Please correct me if I'm wrong!

There is also a question of whether we could make further extensions,
specifically:

  * Wildcarded mirror lists (e.g. a server that mirrors all /a/*.jpg).
  * Checksums could be provided for file chunks allowing broken chunks
to be re-fetched.
  * Servers could store multiple versions of the file indexed by
checksum.
  * The initial server could send no, or very little, data itself, and
act purely as a load balancer; or it could redirect immediately when it
is overloaded.

These may change the mechanism quite considerably, however (e.g. with
wildcards, you would no longer be getting all checksums from the same
server; and for per-chunk verification, the chunk boundaries would need
to be pre-determined and their checksums calculated in advance).
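
As a sketch of what the per-chunk checksum extension might involve (an
assumption about one possible design, not something the draft
specifies), the Python below fixes the chunk size in advance, computes
one digest per chunk, and reports which chunks of a received copy would
need to be re-fetched. The function names and the choice of SHA-256 are
hypothetical.

```python
import hashlib


def chunk_digests(data: bytes, chunk_size: int) -> list:
    """Return one SHA-256 hex digest per fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def find_broken_chunks(received: bytes, expected: list, chunk_size: int) -> list:
    """Return the indices of chunks whose digest differs from expected."""
    actual = chunk_digests(received, chunk_size)
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]


if __name__ == "__main__":
    original = bytes(range(256)) * 4            # 1024 bytes, four 256-byte chunks
    expected = chunk_digests(original, 256)     # published by the server in advance

    damaged = bytearray(original)
    damaged[300] ^= 0xFF                        # corrupt one byte in the second chunk

    print(find_broken_chunks(bytes(damaged), expected, 256))
```

This only works if every server agrees on the same chunk boundaries,
which is the pre-determination issue noted above.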

We believe that the extension as it stands can bring significant benefit
to HTTP, making much more efficient use of Internet resources.
Experiments have been conducted that suggest it has no negative impact
in any scenario in which it was tested.

Looking forward to your comments and advice!

Regards,
Alan

------------------------------------------------------------------------
Alan Ford

Tel:	+44 (0)1794 833465
Fax:	+44 (0)1794 833433
alan.ford@roke.co.uk


Received on Friday, 31 July 2009 12:00:23 UTC