Re: mitigating cost of 303 from Steve Harris on 2012-01-04 (public-rdf-wg@w3.org from January 2012)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 4 Jan 2012 16:24:42 +0000
To: Sandro Hawke <sandro@w3.org>
Cc: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-Id: <0C193BDC-520F-47B6-92FD-9570E7F96914@garlik.com>
On 4 Jan 2012, at 15:07, Sandro Hawke wrote:

> On Wed, 2012-01-04 at 12:04 +0000, Steve Harris wrote:
>> On 2012-01-04, at 01:59, Sandro Hawke wrote:
>> 
>>> On Thu, 2011-12-22 at 13:37 +0000, Steve Harris wrote:
>>>> FWIW I agree with him that a 303 is a very high cost to pay.
>>> 
>>> In confusion or in extra round-trips?
>> 
>> Round trips. Doubles the number of HTTP requests, in the worst case.
>> 
>> If that forces you to move from a load balancer + 2 slaves to a high-end load balancer + 4 slaves, for example (pretty likely), then that's a significant outlay and an additional maintenance headache.
>> 
>>> I have an engineering solution to the latter, which is that hosts be
>>> allowed to expose (via a .well-known URI) some of the rewrite rules they
>>> use.   Then, if I (as a client) find myself getting lots of redirects
>>> from a host, I could look for this redirect-info file, and if it
>>> appears, I can do the redirects in the client, without talking to the
>>> server.   
>>> 
>>> This wouldn't be only for RDF, but I'd expect only people doing 303 to
>>> care enough to set this up on their hosts or have their clients look for
>>> it.
>>> 
>>> The hardest engineering part, I think, is figuring out how to encode the
>>> rewrite rules.  Each server has its own fancy way of doing it.  Like
>>> which version of regexps, and how to extract from the pattern space;
>>> lots of solutions, but we'd need to pick one.   And, tool wise, one
>>> would eventually like the web servers to automatically serve this file
>>> based on the rewrite rules they are actually using.   :-)
>> 
>> Another place that data could be put is the XML sitemap.
> 
> Yes, that would work.  I lean slightly towards it being somewhere else
> since (1) I expect it to be written by different people/code, (2) I
> expect it to be consumed by different people/code, (3) I think XML is
> trending down, and (4) maybe the expiration policy will be different.

Yes, true.

>> It would work if you're being crawled systematically by a small number of systems, but it doesn't help with scattered requests coming from all over the place; it just means the clients are making even more requests.
> 
> Yeah.    Probably not "even more requests" if they don't ask for this
> until after they've encountered several redirects, but it doesn't help
> much in that case.    
> 
> My intuition is the many-small clients won't be a big part of server
> load.  I'm more thinking about the client that wants to ask you about a
> few billion identifiers.  There, you both have a strong incentive to do
> something smarter.  We don't even need the regexp rules for that -- we
> could just have a leading-substring-substitution.
> 
> $ GET http://id.example.com/.well-known/simple-redirects
> / http://data.example.com/page-about/
> 
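A minimal sketch of the leading-substring substitution, assuming the file format shown in the example above: one rule per line, a path prefix, whitespace, then the replacement URL prefix. The function names parse_simple_redirects and rewrite are hypothetical, and trying the longest prefix first is an assumption; the thread does not say how overlapping rules would be ordered.

```python
def parse_simple_redirects(text):
    """Parse '<path-prefix> <replacement-prefix>' lines into a rule list.

    The line format is assumed from the single example in the thread.
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        prefix, replacement = line.split(None, 1)
        rules.append((prefix, replacement))
    # Assumption: when several prefixes match, the longest one wins.
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)
    return rules


def rewrite(path, rules):
    """Return the redirect target for path, or None if no rule applies."""
    for prefix, replacement in rules:
        if path.startswith(prefix):
            # Swap the leading substring, keep the rest of the path.
            return replacement + path[len(prefix):]
    return None


if __name__ == "__main__":
    rules = parse_simple_redirects("/ http://data.example.com/page-about/\n")
    print(rewrite("/alice", rules))
```

With the example rule, a client holding this file could map http://id.example.com/alice to http://data.example.com/page-about/alice locally, without first asking id.example.com for the 303.
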
>> I would have thought it would be better if the response from the potentially 303'd request was a "yes, but what you wanted was this URI, and here's the data for it". I don't know if there's an HTTP code that can express that currently; it's kind of in 203+Location: space, but not quite.
> 
> Agreed.  Something like that would be great.  Ian's proposal [1] is
> pretty good, but I'm not sure it quite works.   (I don't like the fact
> that browsers are still going to be showing the original URL, ... but I
> think I can live with that.)   I guess some text in HTTPbis is needed to
> make it really work.

Right, even with a lot of squinting I can't justify doing something like that with the current text.

> That doesn't completely obviate the utility
> of .well-known/simple-redirects, though.  If I'm a client who wants to
> know about a billion ids, and what I want to know about them can be sent
> in one single 10GB gzip'd file (or even a 10K file using rules!!), I'd
> love to know that they all redirect to the same place, so I can do 2
> HTTP transactions, one being very large, instead of needing to do 1B
> (smaller) HTTP transactions.   
> 
> Ah, engineering daydreaming.   Much more fun than figuring out a
> solution for Graph Reference.   :-)

+1!

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 4 January 2012 16:27:46 UTC