RE: multi-host virtual sites for HTTP 1.2 from Paul Leach on 1996-07-10 (ietf-http-wg@w3.org from July to September 1996)

From: Paul Leach <paulle@microsoft.com>
Date: Wed, 10 Jul 1996 16:34:59 -0700
To: "'Roy T. Fielding'" <fielding@liege.ICS.UCI.EDU>
Cc: "'http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com'" <http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com>
Message-Id: <c=US%a=_%p=msft%l=RED-77-MSG-960710233459Z-22887@tide21.microsoft.com>

>----------
>From: 	Roy T. Fielding[SMTP:fielding@liege.ICS.UCI.EDU]
>Sent: 	Tuesday, July 09, 1996 9:06 PM
>To: 	Paul Leach
>Cc: 	http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
>Subject: 	Re: multi-host virtual sites for HTTP 1.2 
>
>> What I'd like to see in HTTP/1.2 is the ability to easily and
>> efficiently make multiple hosts act like a single virtual Web site, so
>> that one can build huge web sites.
>
>Ummm, assuming it is decent server software, the only real limitation
>on site size right now is the bandwidth of the incoming network.

True enough today, but I think it has to be considered short sighted
given the exponential growth rate of the net and the even larger growth
of "Intranets" (organization nets converting to use Internet protocols).

We have a T3 connection to the net -- 45 megabits/sec. It isn't
saturated, but it won't be too long as the net grows exponentially. And
then we'll get an OC3... 
We can put multiple A records in for www.microsoft.com, and that will
work as long as all the pages under www.microsoft.com fit conveniently
on one server. At some point it becomes 
than one server can handle; then we put in the gateway like you propose
below.

But this is just our public info. What about all the internal pages?
They're about 2 orders of magnitude larger in number. I can't put them
on one server's disks, even if I replicate the server to handle the
load, and I don't want to funnel all my internal accesses through
gateways, any more than I would want to funnel all my network file
system traffic though gateways. And as I distribute and re-distribute
them across multiple servers' disks, I don't want to have to change
their names -- some of them are well known, and the rest still require
fixing all the links that point at them.

It is nearly axiomatic in distributed systems design that in order to
really scale, you have to be able to replicate and partition and cache.
The proposal was a small addition to HTTP to allow one to partition
transparently to the names chosen (within a site) without the cost of
302s for each and every one.

>Splitting requests onto different server machines can be easily
>accomplished using a very fast gateway box which sits on the trunk
>and routes requests to the internal servers for processing.  The only
>thing the gateway would need to do is look for the beginning of each
>message and multiplex from there (which is more difficult than
>IP routing, but not much more).

It needs to do more than just look at the front of each message -- it
also needs to forward the request and relay the response. At full load,
>it would need to forward 90 megabits/sec to handle our _current_ site. Any
single point will eventually become a bottleneck as the load increases.
At some point, I have to replicate the gateway to handle the load.
Software that implements a protocol to avoid all that is lots cheaper.
Plus it lets me move pages within my site with changing all the links to
>the pages.
>
>I think that would be a lot easier than changing HTTP to accomodate
>complex URL rewriting rules.

They're not that complex -- just substitute one prefix for another. But
OK, it would be better if no rewriting rules were needed. You've
prompted me to think of a slightly different scheme where they aren't
needed: the referral header just has a prefix and a new hostname to use
for all Request-URIs that start with that prefix; the Request-URI and
the Host sent to that new hostname are left unchanged from the original
request. The servers are configured to know what subtree(s) of the name
space they own, and look up in their local file system relative to that.

>  There are good reasons to support a
>URI rewriting mechanism on the client side (e.g., to make the URI
>syntax truly extensible, support automated mirroring, support URNs,
>etc.), but I haven't considered single-site performance to be one
>of them.

All of the arguments in your message apply equally well to the DNS --
why does it have referrals?
It could have just used CNAME -- that's the analogy to the way HTTP uses
of links to glue together the distributed servers.

Ditto for AFS and X.500.

Paul

Received on Wednesday, 10 July 1996 16:41:27 UTC