- From: Jack Bates <jzej8k@nottheoilrig.com>
- Date: Sat, 19 May 2012 00:52:38 -0700
- To: ietf-http-wg@w3.org
- CC: Anthony Bryan <anthonybryan@gmail.com>, Leif Hedstrom <zwoop@apache.org>
Hello,

I am curious to know the current thinking on HTTP forward proxies and content distribution networks, or download mirrors. What techniques are used to help forward proxies and content distribution networks play well together? What facilities are available in the HTTP protocol for this? What resources are available from the broader community of standards and best practices?

The approach that I am currently pursuing is to use RFC 6249, Metalink/HTTP: Mirrors and Hashes. For those content distribution networks that support it, our forward proxy listens for responses that are an HTTP redirect and have "Link: <...>; rel=duplicate" headers. If the URL in the "Location: ..." header is not already cached, then we scan the "Link: <...>; rel=duplicate" headers for a URL that is already cached and, if found, we rewrite the "Location: ..." header with this URL.

I would be very grateful for any feedback on this approach. What are the problems with this strategy? What are the alternatives? How does it relate to the letter or spirit of web architecture?

We are also thinking of using RFC 3230, Instance Digests in HTTP. Our proxy would listen for HTTP redirect responses that had "Digest: ..." headers. If the URL in the "Location: ..." header were not already cached, then we would check if other content with the same digest were already cached. If so, then we would rewrite the "Location: ..." header with the corresponding URL.

The issue of forward proxies and content distribution networks is important to us because we run a caching proxy here at a rural village in Rwanda. Many web sites that distribute files present users with a simple download button that redirects to a download mirror, but they do not predictably redirect to the same mirror, or to a mirror that we already cached, so users can't predict whether a download will take seconds or hours, which is frustrating.

Here is a proof of concept plugin [1] for the Apache Traffic Server open source caching proxy.
It works just enough that, given a response with a "Location: ..." header that is not already cached and a "Link: <...>; rel=duplicate" header that is already cached, it will replace the URL in the "Location: ..." header with the cached URL.

I am working on this as part of the Google Summer of Code.

[1] https://github.com/jablko/dedup
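
To make the "Location: ..." rewrite described above concrete, here is a minimal sketch in Python. It is not the code of the plugin at [1], only an illustration of the decision logic. The names rewrite_location, duplicate_urls and is_cached are hypothetical, and the Link parsing is deliberately simplified: it assumes URLs contain no "," or ">" and it ignores the other RFC 6249 parameters.

```python
import re

# One link-value, e.g. <http://mirror.example/f.iso>; rel=duplicate
# Simplified assumption: no ',' or '>' inside the URL or its parameters.
_LINK_VALUE = re.compile(r'<([^>]*)>([^,]*)')
_REL_DUPLICATE = re.compile(r'\brel\s*=\s*"?duplicate\b', re.IGNORECASE)


def duplicate_urls(link_headers):
    """Yield the URLs of all link-values that carry rel=duplicate."""
    for value in link_headers:
        for url, params in _LINK_VALUE.findall(value):
            if _REL_DUPLICATE.search(params):
                yield url


def rewrite_location(status, headers, is_cached):
    """Return the Location value the proxy would send to the client.

    headers   -- list of (name, value) tuples from the origin response
    is_cached -- hypothetical callable: URL -> bool (proxy cache lookup)
    """
    location = next((v for n, v in headers if n.lower() == "location"), None)
    # Only redirects with a Location header are candidates for rewriting.
    if location is None or not 300 <= status < 400:
        return location
    # Nothing to do if the mirror we were sent to is already cached.
    if is_cached(location):
        return location
    links = [v for n, v in headers if n.lower() == "link"]
    for url in duplicate_urls(links):
        if is_cached(url):
            return url  # first cached duplicate wins
    return location     # no cached duplicate: leave the redirect alone


if __name__ == "__main__":
    response_headers = [
        ("Location", "http://mirror-a.example/f.iso"),
        ("Link", "<http://mirror-b.example/f.iso>; rel=duplicate, "
                 "<http://mirror-c.example/f.iso>; rel=duplicate; pri=2"),
    ]
    cache = {"http://mirror-c.example/f.iso"}
    print(rewrite_location(302, response_headers, cache.__contains__))
```

The RFC 3230 variant would follow the same shape, with the cache lookup keyed on the "Digest: ..." value instead of on candidate URLs.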
Received on Saturday, 19 May 2012 07:49:12 UTC