Re: Data Driven Discussions about httpRange-14, etc (was: Re: Change Proposal for HttpRange-14)

Hi Tom,

On 30 Mar 2012, at 17:22, Tom Heath wrote:
> On 27 March 2012 18:54, Jeni Tennison <jeni@jenitennison.com> wrote:
>>> 2) hard data about the 303 redirect penalty, from a consumer and
>>> publisher side. Lots of claims get made about this but I've never seen
>>> hard evidence of the cost of this; it may be trivial, we don't know in
>>> any reliable way. I've been considering writing a paper on this for
>>> the ISWC2012 Experiments and Evaluation track, but am short on spare
>>> time. If anyone wants to join me please shout.
>> 
>> I could offer you a data point from legislation.gov.uk if you like.
> 
> Woohoo! You've made my decade :D
> 
>> When someone requests the ToC for an item of
>> legislation, they will usually hit our CDN and the result will come back extremely quickly. I just tried:
>> 
>> curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents
>> 
>> and it showed the result coming back in 59ms.
>> 
>> When someone uses the identifier URI for the abstract concept of an item of legislation, there's no caching so the request goes right back to the server. I just tried:
>> 
>> curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67
>> 
>> and it showed the result coming back in 838ms. Of course, the redirection goes to the ToC above, so in total it
>> takes around 900ms to get back the data.
> 
> Brilliant. This is just the kind of analysis I'm talking about. Now we
> need to do similar across a bunch of services, connection speeds,
> locations, etc., and then compare it to typical response times across
> a representative sample of web sites. We use New Relic for this kind
> of thing, and the results are rather illuminating. <1ms response times
> make you rather special IIRC. That's not to excuse sluggish sites,
> but just to put this in context.
> 
>> So every time that we refer to an item of legislation through its generic identifier rather than a direct link to its ToC, we are making the site seem about 15 times slower.
> 
> So now we're getting down to the crux of the question: does this
> outcome really matter?! 15x almost nothing is still almost nothing!
> 15x slower may offend our geek sensibilities, but probably doesn't
> matter in practice when the absolute numbers are so small.
> 
> To give another example, I just did some very ad-hoc tests on some
> URIs at a department of a well-known UK university, and the results
> were rather revealing! The total response time (get URI of NIR,
> receive 303 response, get URI of IR, receive 200 OK and resource
> representation back) took ~10s, of which ***over 90%*** was taken up
> by waiting for the page/IR about that NIR to be generated! (and that's
> with curl, not a browser, which may then pull in a bunch of external
> dependencies). In this kind of situation I think there are other,
> bigger issues to worry about than the <1s taken for a 303-based
> roundtrip!!

Just to put this into context for you, so that you understand why it's a big deal: we have a contract [1] (well, actually three contracts) that specifies that the average time to retrieve a typical table of contents or section must be less than one second. In the England/Wales contract [2] it's Clauses 12-13 of Section 6.8 of Schedule 1, on page 125, if you want to take a look. The contract includes financial penalties when these targets aren't met.

It's not easy to reach these targets with the kind of complex content we're dealing with. The only way we have a hope of meeting them is by caching the hell out of the site and delivering it through a CDN.
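
To make the caching point concrete, here's a rough sketch of the pattern (the headers are illustrative, not our actual configuration). The document URI can carry a long TTL, so the CDN can answer for it; the identifier URI answers with a 303, which HTTP/1.1 (RFC 2616, section 10.3.4) says must not be cached, so every request for it travels back to the origin:

  # Document URI: cacheable, so usually served from the CDN edge
  curl -sI http://www.legislation.gov.uk/ukpga/1985/67/contents
  #   HTTP/1.1 200 OK
  #   Cache-Control: public, max-age=86400   (illustrative value)

  # Identifier URI: the 303 response can't be cached, so it always hits origin
  curl -sI http://www.legislation.gov.uk/id/ukpga/1985/67
  #   HTTP/1.1 303 See Other
  #   Location: http://www.legislation.gov.uk/ukpga/1985/67/contents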

Now we could quibble over exactly how you measure the time taken to retrieve a section or table of contents, but it's really clear that what the customer (TNA) wants is a performant website that doesn't suffer from the noticeable delay you get when a page takes more than a second to come through [3]. If we had 303 hops they would definitely be complaining (remember that the 900ms doesn't include downloading CSS and JavaScript, which add their own delays), and it could cost TSO money.
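
(If anyone wants to reproduce the comparison, or gather the kind of cross-site data Tom is after, curl's --write-out option gives a per-phase breakdown without any extra tooling. A rough sketch, using the two URIs above:

  # Direct fetch of the ToC (normally served from the CDN):
  curl -s -o /dev/null \
       -w 'redirects: %{num_redirects}  redirect: %{time_redirect}s  total: %{time_total}s\n' \
       http://www.legislation.gov.uk/ukpga/1985/67/contents

  # The same document via its identifier URI, following the 303 (-L):
  curl -s -o /dev/null -L \
       -w 'redirects: %{num_redirects}  redirect: %{time_redirect}s  total: %{time_total}s\n' \
       http://www.legislation.gov.uk/id/ukpga/1985/67

The difference between the two totals is roughly the cost of the extra hop.)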

I'm absolutely prepared to believe that there are sites out there that don't have these constraints: I don't really care if it takes more than a second for pages on my own website to come back, for example. But for large-scale websites like legislation.gov.uk, delivered under contracts that carry penalty clauses for poor performance, yes, it really, really does matter that it's 60ms rather than 900ms.

Cheers,

Jeni

[1] http://www.contractsfinder.businesslink.gov.uk/Common/View%20Notice.aspx?site=1000&lang=en&noticeid=272362&fs=true
[2] http://www.contractsfinder.businesslink.gov.uk/~/docs/DocumentDownloadHandler.ashx?noticeDocumentId=18140&fileId=b826ad80-f316-493a-a86d-23546ceb95e2
[3] http://www.useit.com/papers/responsetime.html
-- 
Jeni Tennison
http://www.jenitennison.com

Received on Friday, 30 March 2012 19:31:21 UTC