Re: mitigating cost of 303

On Wed, 2012-01-04 at 12:04 +0000, Steve Harris wrote:
> On 2012-01-04, at 01:59, Sandro Hawke wrote:
> 
> > On Thu, 2011-12-22 at 13:37 +0000, Steve Harris wrote:
> >> FWIW I agree with him that a 303 is a very high cost to pay.
> > 
> > In confusion or in extra round-trips?
> 
> Round trips. Doubles the number of HTTP requests, in the worst case.
> 
> If that forces you to move from a load balancer + 2 slaves to a high-end load balancer + 4 slaves, for example (pretty likely), then that's a significant outlay, and an additional maintenance headache.
> 
> > I have an engineering solution to the latter, which is that hosts be
> > allowed to expose (via a .well-known URI) some of the rewrite rules they
> > use.   Then, if I (as a client) find myself getting lots of redirects
> > from a host, I could look for this redirect-info file, and if it
> > appears, I can do the redirects in the client, without talking to the
> > server.   
> > 
> > This wouldn't be only for RDF, but I'd expect only people doing 303 to
> > care enough to set this up on their hosts or have their clients look for
> > it.
> > 
> > The hardest engineering part, I think, is figuring out how to encode the
> > rewrite rules.  Each server has its own fancy way of doing it.  Like
> > which version of regexps, and how to extract from the pattern space;
> > lots of solutions, but we'd need to pick one.   And, tool wise, one
> > would eventually like the web servers to automatically serve this file
> > based on the rewrite rules they are actually using.   :-)
> 
> Another place that data could be put is the XML sitemap.

Yes, that would work.  I lean slightly towards it being somewhere else
since (1) I expect it to be written by different people/code, (2) I
expect it to be consumed by different people/code, (3) I think XML is
trending down, and (4) maybe the expiration policy will be different.

> It would work if you're being crawled systematically by a small number of systems, but it doesn't help with scattered requests coming from all over the place; it just means the clients are making even more requests.

Yeah.  Probably not "even more requests" if clients only fetch the
rules file after they've hit several redirects, but you're right that
it doesn't help much in the scattered-requests case.

My intuition is that many small clients won't be a big part of server
load.  I'm more thinking about the client that wants to ask you about a
few billion identifiers.  There, both sides have a strong incentive to
do something smarter.  We don't even need the regexp rules for that --
we could just have a leading-substring substitution:

$ GET http://id.example.com/.well-known/simple-redirects
/ http://data.example.com/page-about/
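
Roughly what the client side might look like (a Python sketch; the
.well-known name, the one-rule-per-line "prefix replacement" format,
and the example hosts are all just my strawman here):

import urllib.request

def fetch_rules(host):
    # Fetch the (hypothetical) simple-redirects file for this host.
    url = "http://%s/.well-known/simple-redirects" % host
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8")
    rules = []
    for line in text.splitlines():
        if line.strip():
            prefix, replacement = line.split(None, 1)
            rules.append((prefix, replacement.strip()))
    return rules

def rewrite(rules, path):
    # Apply the first matching leading-substring rule client-side,
    # skipping the 303 round trip; None means ask the server as usual.
    for prefix, replacement in rules:
        if path.startswith(prefix):
            return replacement + path[len(prefix):]
    return None

# With the file above:
#   rewrite([("/", "http://data.example.com/page-about/")], "/alice")
#   => "http://data.example.com/page-about/alice"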

> I would have thought it would be better if the response from the potentially 303'd request was a "yes, but what you wanted was this URI, and here's the data for it". I don't know if there's an HTTP code that can express that currently; it's kind of in 203+Location: space, but not quite.

Agreed.  Something like that would be great.  Ian's proposal [1] is
pretty good, but I'm not sure it quite works.   (I don't like the fact
that browsers are still going to be showing the original URL, ... but I
think I can live with that.)   I guess some text in HTTPbis is needed to
make it really work.
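
To make the shape concrete, here's roughly the exchange I have in mind
(the status line and header combination are pure invention, nothing
HTTP or HTTPbis actually defines):

GET /alice HTTP/1.1
Host: id.example.com
Accept: text/turtle

HTTP/1.1 2xx Description-Follows        (made-up status)
Location: http://data.example.com/page-about/alice
Content-Type: text/turtle

... RDF describing <http://id.example.com/alice> ...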

That wouldn't completely obviate .well-known/simple-redirects, though.
If I'm a client who wants to know about a billion ids, and what I want
to know about them can be sent in a single 10GB gzip'd file (or even a
10K file using rules!!), I'd love to know in advance that they all
redirect to the same place, so I can do 2 HTTP transactions, one of
them very large, instead of needing to do 1B (smaller) HTTP
transactions.
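
Back of the envelope: even at a sustained 1,000 requests/second, 1B
round trips is about 10^6 seconds, roughly eleven days, before counting
any redirect overhead, while the bulk transfer is limited only by
bandwidth.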

Ah, engineering daydreaming.   Much more fun than figuring out a
solution for Graph Reference.   :-)

    -- Sandro

[1]
http://blog.iandavis.com/2010/11/07/a-guide-to-publishing-linked-data-without-redirects/

> - Steve
> 
