RE: Multi-server HTTP

On Fri, 28 Aug 2009, Ford, Alan wrote:

> the client could infer capability by getting a Mirrors: header back from a 
> HEAD request first, and then deciding what to do (assuming the connection 
> can be kept alive).

That would work even if the connection isn't kept alive, wouldn't it?

> Which brings me onto another thing about Mirrors: header. One of our 
> longer-term goals with this would be to somehow provide wildcarded lists of 
> mirrors, so that a client could immediately run off and fetch bits of a 
> website from many mirrors, potentially speeding up loading time 
> considerably, and providing an alternative method of load balancing.
>
> However, I'm struggling to see a neat way of doing this reliably, since we 
> couldn't get checksums for every file on the first handshake (or if all 
> content was static we might be able to, but it's a big overhead). Does 
> anybody have any ideas as to a neat way of doing this? Best I can think of 
> so far is some sort of version number/(pseudo)hash of the entire directory 
> structure!

This idea is attractive methinks, but coming up with a fine protocol for it is 
really tricky.

A hash of the entire directory would be problematic, I think, since it would 
imply that both directory structures need to remain identical - the mirror 
would have to hold not only the right files but also no extra files.

I'm thinking like: you have two sites A and B, each showing one picture, 
A.jpg and B.jpg respectively. Both sites refer to a mirror that holds BOTH 
those images in the same directory. It could work fine, but the mirror's dir 
doesn't look the same as the dir of either A or B. That concept would break 
too easily I think.
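
To make the A/B case concrete, here is a minimal sketch in Python. The 
directory_hash() function is purely an assumption for illustration (nobody in 
this thread has defined how such a hash would be computed); it hashes the 
sorted file names together with a digest of each file's content.

```python
import hashlib


def directory_hash(files):
    """Hypothetical directory hash: sorted names plus per-file content digests."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8"))
        h.update(b"\0")  # separator between name and content digest
        h.update(hashlib.sha256(files[name]).digest())
    return h.hexdigest()


# Made-up directory contents: file name -> file bytes.
site_a = {"A.jpg": b"picture A"}
site_b = {"B.jpg": b"picture B"}
mirror = {"A.jpg": b"picture A", "B.jpg": b"picture B"}

# The mirror holds correct copies of both images, yet its directory
# hash matches neither site's hash.
print(directory_hash(mirror) == directory_hash(site_a))  # False
print(directory_hash(mirror) == directory_hash(site_b))  # False
```

Every file a client would want is on the mirror and byte-identical, but a 
whole-directory comparison still reports a mismatch.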

We want to avoid doing requests to non-existing resources on the mirror that'd 
respond with a 404 back (which then would have to be retried against the 
master site or another mirror) - we need a decent way for a client to know 
which URIs it can try to get from a mirror instead of the master...
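
A rough sketch of what that retrying costs, in Python. Everything here is 
assumed for illustration: the host names are made up, and fake_fetch() stands 
in for real HTTP requests so the example runs without a network.

```python
def fetch_with_fallback(path, mirrors, master, fetch):
    """Try each mirror in turn, then the master.

    Returns (base, body, attempts), where attempts counts the requests made.
    """
    attempts = 0
    for base in list(mirrors) + [master]:
        attempts += 1
        status, body = fetch(base + path)
        if status == 200:
            return base, body, attempts
    raise LookupError("no server had " + path)


# Fake servers: only the master and the second mirror hold the image.
FAKE = {
    "http://master.example/img/A.jpg": b"picture A",
    "http://mirror2.example/img/A.jpg": b"picture A",
}


def fake_fetch(url):
    """Stand-in for an HTTP GET: returns (status, body)."""
    body = FAKE.get(url)
    return (200, body) if body is not None else (404, b"")


base, body, attempts = fetch_with_fallback(
    "/img/A.jpg",
    ["http://mirror1.example", "http://mirror2.example"],
    "http://master.example",
    fake_fetch,
)
print(base, attempts)  # http://mirror2.example 2
```

Each 404 is a wasted round trip, which is exactly what a client guessing at 
mirror contents would keep paying for.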

I think all this makes me favour not a wildcard concept, but more a 
list-concept where a site can list not only that "this object also exists HERE 
and HERE" but then also "THESE OTHER OBJECTS also exist HERE and HERE" and 
"THESE OTHER" would then be a list of (relative?) URIs somehow. But this 
becomes awkward if the list of items is long.
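
One way the list-concept could look on the client side, as a sketch only. The 
listing format (a mapping from mirror base URL to relative URIs) and the host 
names are my assumptions here, not a proposed wire format; relative URIs are 
resolved with the standard urllib.parse.urljoin.

```python
from urllib.parse import urljoin


def mirror_candidates(page_url, listing, wanted):
    """Return the absolute mirror URLs that the listing says hold `wanted`."""
    target = urljoin(page_url, wanted)
    result = []
    for mirror_base, relative_uris in listing.items():
        for rel in relative_uris:
            # Compare on the master's namespace, then map onto the mirror.
            if urljoin(page_url, rel) == target:
                result.append(urljoin(mirror_base, rel))
    return result


# Hypothetical listing sent by the master along with index.html.
page = "http://master.example/site/index.html"
listing = {
    "http://mirror1.example/site/": ["img/A.jpg", "style.css"],
    "http://mirror2.example/site/": ["style.css"],
}

print(mirror_candidates(page, listing, "img/A.jpg"))
# ['http://mirror1.example/site/img/A.jpg']
print(mirror_candidates(page, listing, "img/B.jpg"))
# []
```

The client only asks a mirror for URIs the master has vouched for, so there 
are no speculative 404s, at the price of a listing that grows with the number 
of objects.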

Then we come to the concept of changing items. How long can a client assume 
that the mirrors have the corresponding object? Would they need some kind of 
cache control headers to specify that? In the mirror-for-a-single-object case 
I think we can assume that the mirror will have the object for at least a very 
short while after the response said so, but after that it gets this problem 
too.
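
If some lifetime were attached to each mirror reference, the client-side check 
could be as small as the sketch below. The max_age field is hypothetical, 
borrowed by analogy from Cache-Control's max-age; nothing in this thread 
defines such a field for mirrors.

```python
from dataclasses import dataclass


@dataclass
class MirrorEntry:
    url: str
    received_at: float  # seconds; when the master's response arrived
    max_age: float      # seconds the master vouches for the mirror copy (assumed field)

    def is_fresh(self, now):
        """True while the client may still assume the mirror has the object."""
        return now - self.received_at < self.max_age


entry = MirrorEntry(
    "http://mirror1.example/img/A.jpg", received_at=1000.0, max_age=60.0
)
print(entry.is_fresh(1030.0))  # True
print(entry.is_fresh(1100.0))  # False
```

Once the entry is no longer fresh the client would go back to the master, 
either for the object itself or for an updated mirror list.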

-- 

  / daniel.haxx.se

Received on Friday, 28 August 2009 13:46:54 UTC