Re: Workload for surrogates

From: Mark Nottingham (mnot@akamai.com)
Date: Mon, Aug 21 2000


    Date: Mon, 21 Aug 2000 14:16:10 -0700
    From: Mark Nottingham <mnot@akamai.com>
    To: hardie@equinix.com
    Cc: Fred Douglis <douglis@research.att.com>, surrogates@equinix.com, www-wca@w3.org
    Message-ID: <20000821141607.B5976@akamai.com>
    Subject: Re: Workload for surrogates
    
    
    Sorry, I meant characterizing the workload, not the CDNs themselves.
    
    I'd say that the typical hit rate for a CDN that cherry-picks objects (only
    routes cacheable objects) is very high, as they're all cacheable. By very
    high, I mean >90%.
    
    However, this will vary significantly, depending on how cacheable those
    objects are, what the distribution of surrogates in the CDN is like, the
    locality of the clients, etc.
    
    Also, CDNs may or may not route all objects for a site through them; this
    will change the hit rate quite a bit. I don't think it's valid to assume
    that they will only handle cacheable objects.
    
    OTOH, a good starting point would be to figure out what the hit rate for a
    normal (non-CDN) surrogate typically is. This would make a useful baseline
    for CDNs, and is necessary regardless.
    
    It *should* be possible to get some surrogate ("reverse proxy", grr) traces
    to do this with, or at least take a peek at the hit rate that many of these
    devices will calculate. Alternatively, it may be possible to derive a
    theoretical hit rate (as well as the other things you're interested in) from
    normal Web server logs, if the cacheability of the objects served is still
    available.
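
As a rough illustration of deriving a theoretical hit rate from server logs,
here is a minimal sketch. It assumes the log has already been reduced to
(URL, cacheable) pairs in request order, and it models an idealized surrogate
with unlimited storage and no expiry or revalidation, so the result is an
upper bound rather than a prediction. The function name and the sample data
are hypothetical.

```python
from typing import Iterable, Tuple


def theoretical_hit_rate(requests: Iterable[Tuple[str, bool]]) -> float:
    """Upper-bound hit rate for an idealized surrogate.

    Assumptions (not from any real product): infinite cache, objects never
    expire, and uncacheable requests always count as misses.
    """
    seen = set()   # cacheable URLs already fetched from the origin
    hits = 0
    total = 0
    for url, cacheable in requests:
        total += 1
        if not cacheable:
            continue          # always forwarded to the origin: a miss
        if url in seen:
            hits += 1         # repeat request for a cached object
        else:
            seen.add(url)     # first request fills the cache: a miss
    return hits / total if total else 0.0


# Hypothetical sample: a surrogate in front of a whole site.
whole_site = [
    ("/a.gif", True),
    ("/a.gif", True),
    ("/cgi-bin/search", False),
    ("/a.gif", True),
    ("/b.gif", True),
]

# The same traffic if only cacheable objects are routed through the CDN.
cacheable_only = [r for r in whole_site if r[1]]

print(theoretical_hit_rate(whole_site))
print(theoretical_hit_rate(cacheable_only))
```

Running the same function over the full log and over the cacheable-only
subset gives the two figures being compared in this thread: a surrogate in
front of an entire site versus a CDN that cherry-picks objects.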
    
    
    
    On Mon, Aug 21, 2000 at 01:47:05PM -0700, hardie@equinix.com wrote:
    > Mark writes:
    > >  Characterizing CDNs is shooting at a moving target; each one is
    > > going to take a different tack at how it handles objects, and what
    > > gets routed through the CDN. So-called "edge processing" throws
    > > another wrench into the works, as each one will have a different
    > > approach.
    > 
    > I agree, but my question isn't so much about characterizing CDNs,
    > but characterizing the workload of CDNs.  For example, most CDNs
    > are limited to cachable objects (certainly all surrogate system
    > CDNs are limited to cachable objects).  Given that, what are
    > hit rates against the CDNs' cache like?  Is a 40% hit rate typical,
    > as it would be for a Squid cache in the wild, or is 90% more
    > typical?  What are the distributions of things like number of
    > hits per object, time in cache, etc?
    > 
    > I don't want anyone to give away their secret sauce on this stuff,
    > such as describing the exact algorithms for cache filling, but
    > some general sense of what the workload edges are like would
    > be a big help.
    > 				regards,
    > 					Ted Hardie
    > 
    > 
    > 
    > 
    > > 
    > > Of course, all surrogates are not used in CDNs; "reverse proxies" are
    > > somewhat widely deployed on popular sites (that's a feeling; I haven't seen
    > > much data to support it).
    > > 
    > > I'd think two workloads would be in order; one to represent a surrogate in
    > > front of an entire, "typical" site, and one to represent a CDN that handles
    > > all cacheable objects. 
    > > 
    > > These would only give rough figures, of course, but that's about the best
    > > that can be done IMHO, and they would still be useful to compare.
    > > 
    > > 
    > > On Thu, Aug 17, 2000 at 09:33:41AM -0400, Fred Douglis wrote:
    > > > [Cross-posted to wca list from surrogates list.  Original message attached.]
    > > > 
    > > > Ted,
    > > > 
    > > > Regarding CDNs, since in general there is a tendency toward emphasizing 
    > > > cachable content up front, I would expect the workload to be somewhat 
    > > > different from a typical origin server, at least one that has any significant 
    > > > dynamic data.  There'll be a greater fraction of hits to more static content 
    > > > such as gifs.  
    > > > 
    > > > I know there's been some work on benchmarking CDNs but I don't know if there's 
    > > > been work on characterizing behavior.  The W3C workload characterization group 
    > > > might be looking into this -- have you asked around there?  I'm cc-ing them 
    > > > here.  Perhaps someone will have additional info.
    > > > 
    > > > Regards,
    > > > 
    > > > Fred
    > > 
    > > Content-Description: 4
    > > > Date: Wed, 16 Aug 2000 14:00:08 -0700 (PDT)
    > > > From: hardie@equinix.com
    > > > Reply-To: hardie@equinix.com
    > > > To: surrogates@equinix.com
    > > > Cc: hardie@equinix.com (ted hardie)
    > > > Subject: Workload for surrogates
    > > > Delivery-Date: Wed Aug 16 17:15 EDT 200
    > > > X-Mailer: ELM [version 2.5 PL3]
    > > > 
    > > > As part of the testing of the Bellwether surrogates implementation,
    > > > Duane and I have been talking about what the workload for a demand
    > > > driven surrogate would look like.  We've been testing both single
    > > > surrogate and load balanced surrogates using a polygraph workload that
    > > > presumes a fairly high hit rate.  This derives from our model of a
    > > > surrogate that gets invoked by periods of high demand for a limited
    > > > data set (the flash crowd/CNN event effect typical for a Starr report
    > > > release).
    > > > 
    > > > Thinking about this, though, I've been wondering what other types of
    > > > workloads might be common.  CDNs seem likely to have a workload
    > > > close to an origin server, where some demand-driven surrogates might
    > > > have a workload like a proxy cache.  
    > > > 
    > > > Any insight out there on what you expect or what you have seen in
    > > > initial deployments?
    > > > 			regards,
    > > > 				Ted Hardie
    > > > 				Equinix, Inc.
    > > > 
    > > 
    > > 
    > > -- 
    > > Mark Nottingham, Research Scientist
    > > Akamai Technologies (San Mateo, CA)
    > > 
    
    -- 
    Mark Nottingham, Research Scientist
    Akamai Technologies (San Mateo, CA)