Hi

My only comment, which springs to mind whenever I see wording about
downloading files in parts, concerns proxies that scan for malware.

In order to allow scanning, the proxy needs the entire entity, so it
must modify range requests.

Typically this is done by stripping the Range header in the upstream request, and possibly stripping Accept-Ranges from server responses.
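
As a rough illustration (a minimal sketch, not how any particular
product does it, and the helper names are made up):

    # Sketch of how an AV gateway might neutralise range requests.
    # Hypothetical helpers; a real proxy does this inside its HTTP
    # handling rather than as standalone functions.

    def filter_client_request(headers):
        """Drop Range so the origin server returns the full entity."""
        upstream = dict(headers)
        upstream.pop("Range", None)
        upstream.pop("If-Range", None)   # ignored without Range anyway
        return upstream

    def filter_server_response(headers):
        """Drop Accept-Ranges so clients stop asking for parts."""
        downstream = dict(headers)
        downstream.pop("Accept-Ranges", None)
        return downstream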

So any proposed system that relies on Range requests is going to run
into problems with proxies that perform AV functions.

Even if the proxy downloads and scans the whole entity and then sends
the client only the parts it requested, that may satisfy the client in
terms of the response, but not in terms of responsiveness. Retrieving
parts from multiple servers makes it far worse: the proxy ends up
downloading the entire file from each server that the client thinks it
is fetching only a part from. For example, a 700 MB file fetched in
parts from five mirrors would drag 3.5 GB through the gateway.

This is therefore only going to become a bigger problem, since network
administrators aren't going to abandon antivirus any time soon.

I don't know what the best solution to this is. Unless the proxy knew
that the parts belonged to the same file, and could piece them
together itself to scan, it can't avoid that wasted resource use.
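
If a proxy did attempt that, it would presumably need to key partial
responses on a strong validator. A hypothetical sketch, assuming the
mirrors return consistent ETags and lengths (names made up for
illustration):

    # Track the ranges seen for what appears to be the same entity,
    # keyed on (ETag, total length), so the proxy could scan one
    # assembled copy instead of fetching the file once per mirror.

    class PartialEntity:
        def __init__(self, total_length):
            self.total_length = total_length
            self.parts = []          # (first_byte, last_byte, data)

        def add(self, first, last, data):
            self.parts.append((first, last, data))

        def is_complete(self):
            """True once the ranges cover 0..total_length-1."""
            covered = 0
            for first, last, _ in sorted(self.parts):
                if first > covered:
                    return False     # a gap remains
                covered = max(covered, last + 1)
            return covered >= self.total_length

    entities = {}   # (etag, total_length) -> PartialEntity

Even then, ETags are opaque per-server validators, so nothing
guarantees that two mirrors hand out comparable ones, which touches on
Henrik's ETag comment below.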

And there's no way around getting the whole file first if you want to
do heuristic malware scanning at the gateway. That creates other
problems, which I attempted to address in an Internet-Draft a while
back (draft-decroy-http-progress).

Regards

Adrien


Anthony Bryan wrote:
On Mon, Jul 27, 2009 at 11:03 AM, Henrik
Nordstrom <henrik@henriknordstrom.net> wrote:
This draft made a bit of surprise appearance in the transport area
meeting today:

http://tools.ietf.org/html/draft-ford-http-multi-server

My initial reaction is lots of obvious overlap with other work and
misunderstandings of basic HTTP functions like ETag.

Basic motivation behind the work may be reasonable however.

I will try to catch the author for a more in-depth discussion shortly.

Other opinions?

Very interesting, thanks for writing about this, Henrik. I hadn't seen
or heard of it.

For those unfamiliar with Metalink, we offer solutions to the same
problems (and more) in an XML format, as opposed to HTTP extensions.

So I'm interested in what people think about it (criticism, ideas,
etc.), because it may allow us to improve what we are doing. We're
also seeking review for our Internet-Draft at
http://tools.ietf.org/html/draft-bryan-metalink

If anyone is interested in trying Metalink out, a good amount of
software is available, in the form of download managers (the most
popular ones), Firefox extensions, command line clients, and
browsers.
While many Metalink clients, especially download managers, download
simultaneously from multiple mirrors, it's really about providing
alternate locations so a download can complete (if a server goes
down), and about repairing downloads. Information about mirrors, such
as location and priority, can also be included.
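
As a rough illustration of how easy the format is to consume, here's
a sketch that pulls the mirror list out of a Metalink document with
the Python standard library (element and attribute names follow the
current draft; treat this as illustrative, not normative):

    # Extract mirror URLs from a Metalink document, best priority
    # first. Namespace and names per draft-bryan-metalink; see the
    # draft for the authoritative schema.

    import xml.etree.ElementTree as ET

    NS = {"m": "urn:ietf:params:xml:ns:metalink"}

    def mirrors(metalink_xml):
        """Return (priority, location, url) tuples, sorted."""
        root = ET.fromstring(metalink_xml)
        found = []
        for url in root.findall("m:file/m:url", NS):
            found.append((int(url.get("priority", "999999")),
                          url.get("location", ""),
                          url.text))
        return sorted(found)

    example = """\
    <metalink xmlns="urn:ietf:params:xml:ns:metalink">
      <file name="example.iso">
        <url location="us" priority="1">http://mirror1.example.com/example.iso</url>
        <url location="de" priority="2">http://mirror2.example.com/example.iso</url>
      </file>
    </metalink>"""

    print(mirrors(example))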

Projects like cURL, OpenOffice.org, and most Linux distributions use
Metalinks for downloads, especially for large files.

More info here:

http://www.metalinker.org/
http://en.wikipedia.org/wiki/Metalink

Here's the intro from draft-ford-http-multi-server-00:

"1. Introduction and Motivation


   Mirrored HTTP servers are regularly used for software downloads,
   whereby copies of data to be downloaded are duplicated on many
   servers distributed around the Internet.  Users are encouraged to
   manually choose a nearby mirror from which to download.  This is
   intended to increase both throughput and resilience, and reduce load
   on individual servers.  Manual mirror choice rarely works well; users
   do not wish to make a choice, but if they are not forced to, then the
   default server takes a disproportionate share of the load.  Even when
   they are forced to choose, they rarely have enough information to
   choose the server that will provide the best performance.

   Some popular sites automate this process using DNS load balancing,
   both to approximately balance load between servers, and to direct
   clients to nearby servers with the hope that this improves
   throughput.  Indeed, DNS load balancing can balance long-term server
   load fairly effectively, but it is less effective at delivering the
   best throughput to users when the bottleneck is not the server but
   the network.

   This document specifies an alternative mechanism by which the benefit
   of mirrors can be automatically and more efficiently realised.  These
   benefits are achieved using a number of extensions to HTTP which
   allow the discovery of mirrors, the verification of the integrity of
   files on each mirror, and the simultaneous downloading of chunks from
   multiple mirrors.  The use of this mechanism allows greater
   efficiency in resource utilisation in the Internet as a whole,
   balances server utilization, even on short timescales, and enhances
   user experience through faster downloads."

-- 
Adrien de Croy - WinGate Proxy Server - http://www.wingate.com