Re: caching HTTP 303 responses from Giovanni Tummarello on 2007-07-10 (semantic-web@w3.org from July 2007)

From: Giovanni Tummarello <g.tummarello@gmail.com>
Date: Tue, 10 Jul 2007 01:20:55 +0100
To: semantic-web@w3.org
CC: Jacek Kopecky <jacek.kopecky@deri.org>
Message-ID: <4692D0E7.6050105@gmail.com>
Hi Jacek,

Unfortunately, an "application cache" is not always possible.
The key to cluster scalability is splitting jobs across the cluster
nodes so that each file is processed more or less on its own.
Web architecture then says that if you want to go fast, you can cache,
so one puts in a large proxy that all the nodes can, in theory, feed from.
This is what we thought we'd do, only to find that each process was
running a few dozen times slower than it could (to say nothing of
the remote hits, which are the real problem) because Squid rightly
refuses to cache 303 responses.
We could write a "semantic web patch" for Squid that explicitly violates
a MUST NOT, but... :-)
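
For the single-process case, the per-request bookkeeping Jacek describes
(quoted below) might look roughly like this minimal Python sketch. The
names resolve_batch and head_location are made up for illustration, and
the map lives only for one batch, so it is closer to a visited-set than
to an HTTP cache. It does not help once the work is split across cluster
nodes, since each node would keep its own map.

```python
import http.client
from typing import Callable, Dict, Iterable, Optional
from urllib.parse import urlsplit


def head_location(uri: str) -> Optional[str]:
    """Send one HEAD request and return the Location header, if any.

    Hypothetical helper: it performs a real network request and does
    not follow the redirect.
    """
    parts = urlsplit(uri)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.getheader("Location")
    finally:
        conn.close()


def resolve_batch(
    uris: Iterable[str],
    fetch_redirect: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Look up each distinct URI once within a single batch.

    The result is scoped to this call and discarded afterwards, so
    nothing is reused across separate user requests.
    """
    seen: Dict[str, Optional[str]] = {}
    for uri in uris:
        if uri not in seen:  # already visited in this batch: skip
            seen[uri] = fetch_redirect(uri)
    return seen
```

Calling resolve_batch(property_uris, head_location) would then issue one
HEAD request per distinct property URI instead of one per occurrence.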
Giovanni


Jacek Kopecky wrote:
> Hi Eyal, 
>
> I expect that in one batch of dereferencing, you should be able to
> optimize. Caching is for subsequent requests, and I see those as
> requests from the user, and if I understand your situation correctly,
> your app does a big batch of dereferencing in response to a single user
> request/query.
>
> You could easily do a simple queuing of URI requests in your app, and
> all the requests for foaf:name would be treated as a single request and
> done then. I expect that the situation should not change if some
> foaf:name requests come after you've actually dereferenced it once: you
> should still be able to use the same copy of the data you got, because
> it was retrieved in the scope of the same user request.
>
> So I think what you're doing is not caching, in the same sense in which
> HTTP caching is defined. You're closer to traversing a graph, and
> foaf:name is an already visited node, to be ignored afterwards.
>
> Hope it makes sense,
> Jacek
>
> On Mon, 2007-07-09 at 20:26 +0100, Eyal Oren wrote:
>   
>> Hi,
>>
>> I've a question regarding serving RDF content using HTTP 303 redirects. For 
>> example, foaf:name [1] redirects to http://xmlns.com/foaf/spec using HTTP 
>> 303.  The, I believe relevant, RFC 2616 says that HTTP 303 responses MUST 
>> NOT be cached, although the result may be cached [2].
>>
>> Does this mean, that I have to check every single time what foaf:name 
>> redirects to? Or am I allowed to remember that foaf:name redirects to its 
>> spec? Squid for example will not cache this redirect exactly because it is 
>> a 303.  
>>
>> Unless I'm misunderstanding something, it seems that when I'm processing 
>> lots of documents by de-referencing their URIs, I must dereference 
>> foaf:name every single time, only to be redirected to the same location, 
>> after which I can use my local copy.  Requesting this HTTP header using e.g.
>> curl takes around 0.33s, which I'd think is rather a lot when processing 
>> thousands of foaf files containing tens of foaf properties each.
>>
>> My question: why are HTTP 303 codes being suggested [3],[4] instead of 
>> a cacheable response such as 301? Or was caching not an issue in drafting 
>> these suggestions?
>>
>>  -eyal
>>
>> [1] http://xmlns.com/foaf/0.1/name
>> [2] http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
>> [3] http://www.w3.org/TR/swbp-vocab-pub/#redirect
>> [4] http://www.dfki.uni-kl.de/~sauermann/2006/11/cooluris/
>>
Received on Tuesday, 10 July 2007 00:21:13 UTC