- From: Steve Harris <steve.harris@garlik.com>
- Date: Wed, 4 Jan 2012 16:24:42 +0000
- To: Sandro Hawke <sandro@w3.org>
- Cc: "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
On 4 Jan 2012, at 15:07, Sandro Hawke wrote:

> On Wed, 2012-01-04 at 12:04 +0000, Steve Harris wrote:
>> On 2012-01-04, at 01:59, Sandro Hawke wrote:
>>
>>> On Thu, 2011-12-22 at 13:37 +0000, Steve Harris wrote:
>>>> FWIW I agree with him that a 303 is a very high cost to pay.
>>>
>>> In confusion or in extra round-trips?
>>
>> Round trips. Doubles the number of HTTP requests, in the worst case.
>>
>> If that forces you to move from a load balancer + 2 slaves, to a high-end load balancer + 4 slaves for example (pretty likely) then that's a significant outlay, and additional maintenance headache.
>>
>>> I have an engineering solution to the latter, which is that hosts be
>>> allowed to expose (via a .well-known URI) some of the rewrite rules they
>>> use. Then, if I (as a client) find myself getting lots of redirects
>>> from a host, I could look for this redirect-info file, and if it
>>> appears, I can do the redirects in the client, without talking to the
>>> server.
>>>
>>> This wouldn't be only for RDF, but I'd expect only people doing 303 to
>>> care enough to set this up on their hosts or have their clients look for
>>> it.
>>>
>>> The hardest engineering part, I think, is figuring out how to encode the
>>> rewrite rules. Each server has its own fancy way of doing it. Like
>>> which version of regexps, and how to extract from the pattern space;
>>> lots of solutions, but we'd need to pick one. And, tool wise, one
>>> would eventually like the web servers to automatically serve this file
>>> based on the rewrite rules they are actually using. :-)
>>
>> Another place that data could be put is the XML sitemap.
>
> Yes, that would work. I lean slightly towards it being somewhere else
> since (1) I expect it to be written by different people/code, (2) I
> expect it to be consumed by different people/code, (3) I think XML is
> trending down, and (4) maybe the expiration policy will be different.

Yes, true.

>> It would work if you're being crawled systematically by a small number of systems, but it doesn't help with scattered requests coming from all over the place, just means the clients are making even more requests.
>
> Yeah. Probably not "even more requests" if they don't ask for this
> until after they've encountered several redirects, but it doesn't help
> much in that case.
>
> My intuition is the many-small clients won't be a big part of server
> load. I'm more thinking about the client that wants to ask you about a
> few billion identifiers. There, you both have a strong incentive to do
> something smarter. We don't even need the regexp rules for that -- we
> could just have a leading-substring-substitution.
>
>     $ GET http://id.example.com/.well-known/simple-redirects
>     / http://data.example.com/page-about/
>
>> I would have thought it would be better if the response from the potentially 303'd request was a "yes, but what you wanted was this URI, and here's the data for it". I don't know if there's a HTTP code that can express that currently, it's kind of in 203+Location: space, but not quite.
>
> Agreed. Something like that would be great. Ian's proposal [1] is
> pretty good, but I'm not sure it quite works. (I don't like the fact
> that browsers are still going to be showing the original URL, ... but I
> think I can live with that.) I guess some text in HTTPbis is needed to
> make it really work.

Right, even with a lot of squinting I can't justify doing something like that with the current text.

> That doesn't completely obviate the utility
> of .well-known/simple-redirects, though. If I'm a client who wants to
> know about a billion ids, and what I want to know about them can be sent
> in one single 10GB gzip'd file (or even a 10K file using rules!!), I'd
> love to know that they all redirect to the same place, so I can do 2
> HTTP transactions, one being very large, instead of needing to do 1B
> (smaller) HTTP transactions.
>
> Ah, engineering daydreaming. Much more fun than figuring out a
> solution for Graph Reference. :-)

+1!

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203 http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 4 January 2012 16:27:46 UTC
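
Illustrative sketch (not part of the original message): the leading-substring substitution discussed in the thread could be applied on the client side roughly as follows. The file format is assumed from the single example line in the message (a path prefix, whitespace, then a target prefix). The function names `parse_simple_redirects` and `rewrite`, the skipping of malformed lines, and the longest-prefix-wins ordering are all assumptions made for illustration; `.well-known/simple-redirects` was only a proposal in this thread, not a registered or standardized URI.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlsplit

Rule = Tuple[str, str]  # (path prefix on the id host, target URL prefix)


def parse_simple_redirects(text: str) -> List[Rule]:
    """Parse lines of the assumed form '<path-prefix> <target-prefix>'."""
    rules: List[Rule] = []
    for line in text.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # assumption: blank or malformed lines are ignored
        rules.append((fields[0], fields[1].strip()))
    # Assumption: when several prefixes match, the longest one wins.
    rules.sort(key=lambda rule: len(rule[0]), reverse=True)
    return rules


def rewrite(url: str, rules: List[Rule]) -> Optional[str]:
    """Return the locally computed redirect target, or None if no rule applies."""
    path = urlsplit(url).path or "/"
    for prefix, target in rules:
        if path.startswith(prefix):
            # Replace the leading substring and keep the rest of the path.
            return target + path[len(prefix):]
    return None


if __name__ == "__main__":
    # Body of the hypothetical redirect-info file from the example in the thread.
    body = "/ http://data.example.com/page-about/\n"
    rules = parse_simple_redirects(body)
    print(rewrite("http://id.example.com/alice", rules))
```

A client wanting to resolve many identifiers on one host would fetch the rules once and then compute each target locally, falling back to an ordinary HTTP request (and following any 303) whenever `rewrite` returns None.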