RE: caching HTTP 303 responses from Williams, Stuart (HP Labs, Bristol) on 2007-07-12 (semantic-web@w3.org from July 2007)

From: Williams, Stuart (HP Labs, Bristol) <skw@hp.com>
Date: Thu, 12 Jul 2007 11:59:50 +0100
To: "Jeremy Carroll" <jjc@hpl.hp.com>, "Jacek Kopecky" <jacek.kopecky@deri.org>
Cc: "Giovanni Tummarello" <g.tummarello@gmail.com>, <semantic-web@w3.org>
Message-ID: <C4B3FB61F7970A4391A5C10BAA1C3F0DBB34F6@sdcexc04.emea.cpqcorp.net>

Ok... so I'll offer a thought...

At least in some of the problem cases, the retrieval URIs are the URIs of
properties and classes in OWL ontologies and RDFS vocabulary
descriptions. Presumably the motivation to perform such retrievals is a
lack of knowledge about the thing referred to by the URI. A successful
retrieval, whether it arises from a protocol redirect or a client-side
redirection through the stripping of a fragID, renders the requesting
agent *informed* about the referent. The question is surely the
persistence of that information rather than the persistence of
redirection.

So... does your agent already know the answer to a question that it's
about to ask?

1) Do I need to ask this question, or do I know enough about this thing
already? Chances are the answer is already squirrelled away in the
agent's knowledge base.

2) Some answers tell you about more things than you asked about, e.g.
Dublin Core - because a bunch of URIs for dc properties all redirect to
the same description of all of them (modulo a spurious fragID last time
I looked, that gets stripped anyway - and for which there is no referent
in the resulting description). So you may already be informed about
things that you haven't asked about.

The imperative for the agent to ask a question seems to be lack of
knowledge of the answer. If it already has the answer... you can avoid
asking the question.
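
To make that concrete, here is a rough sketch of the "check what you know
before you ask" idea. It is only an illustration under assumptions: the
Agent class, the fetch callable and the example.org URIs are all made up
for this note, the fetch stands in for the real HTTP retrieval (303 and
all), and nothing here models how long the stored information should
persist.

```python
from urllib.parse import urldefrag


class Agent:
    """Toy agent that only dereferences URIs it knows nothing about."""

    def __init__(self, fetch):
        # fetch(doc_uri) -> dict mapping each described URI to a description.
        # Hypothetical stand-in for the HTTP retrieval, redirects included.
        self.fetch = fetch
        self.knowledge = {}   # URI -> description already held by the agent
        self.retrievals = 0   # how many times we actually asked

    def describe(self, uri):
        # 1) Already informed about the referent? Then do not ask again.
        if uri in self.knowledge:
            return self.knowledge[uri]
        # Client-side "redirect": strip any fragment before retrieval.
        doc_uri, _fragment = urldefrag(uri)
        self.retrievals += 1
        # 2) One answer may describe many URIs; remember all of them.
        self.knowledge.update(self.fetch(doc_uri))
        return self.knowledge.get(uri)


def fake_fetch(doc_uri):
    # Hypothetical vocabulary: one document describes every term.
    return {
        "http://example.org/terms/title": "A name given to the resource.",
        "http://example.org/terms/creator": "An entity that made the resource.",
    }


agent = Agent(fake_fetch)
agent.describe("http://example.org/terms/title")
agent.describe("http://example.org/terms/creator")  # answered from memory
print(agent.retrievals)
```

The second call never reaches fake_fetch, because the first retrieval
already described both terms. A URI that the returned description does
not mention would be fetched again on every call in this sketch; a real
agent would presumably want to remember that as well.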

My 2 cents.

Stuart Williams
--
Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks
RG12 1HN
Registered No: 690597 England

> -----Original Message-----
> From: semantic-web-request@w3.org 
> [mailto:semantic-web-request@w3.org] On Behalf Of Jeremy Carroll
> Sent: 10 July 2007 13:45
> To: Jacek Kopecky
> Cc: Giovanni Tummarello; semantic-web@w3.org
> Subject: Re: caching HTTP 303 responses
> 
> 
> 
> Is there some motivation for the MUST NOT cache constraint?
> 
> A thought is that there are quite complex HTTP cache control 
> mechanisms which may not work correctly. But I suppose 302s 
> are cached, and can be updated, and the behaviour is 
> acceptable.... so that the same mechanisms should work with 
> 303 (except for the prohibition).
> 
> ....
> 
> thinking out loud, without reading the specs,
> 
> Jeremy
> 
> 
> 
> Jacek Kopecky wrote:
> > Hi Giovanni,
> > 
> > barring the change away from 303 for non-information resources, or a
> > change to the cacheability of 303, one could indeed make a patch for
> > squid.
> > 
> > The way I'd go about it, not to break too much, would be to add a 
> > request ID header which would differ for different user requests, and
> > the squid would cache everything within the same request ID, and it
> > would follow the specs for different requests.
> > 
> > The request ID would be treated as enabler for these "atomically 
> > cacheable" things (everything), atomically as in "in the same user 
> > request processing". And this could mean statefulness in squid (prolly
> > a very bad thing) if there was a requirement to interleave the
> > processing of multiple user requests.
> > 
> > But thinking about this, fixing 303 cacheability or maybe adding a 
> > cacheable 308 Description Elsewhere sounds easier now. 8-)
> > 
> > Jacek
> > 
> > On Tue, 2007-07-10 at 01:20 +0100, Giovanni Tummarello wrote:
> >> Hi Jacek,
> >>
> >> unfortunately the "application cache" is not always possible.
> >> The key to cluster scalability is splitting jobs across the cluster
> >> nodes so each file is more or less processed per so.
> >> Web architecture then says that if you want to go fast.. you can cache..
> >> so one puts a large proxy where all the nodes in theory can feed.
> >> This is what we thought we'd do.. just to find out that each process
> >> was running a few dozen times slower than what it could (to say
> >> nothing on the remote hits which is the real problem) due to squid
> >> rightfully refusing to cache 303.
> >> We could write a "semantic web patch" for squid to explicitly violate
> >> a MUST NOT.. but.. :-) .
> >> Giovanni
> >>
> > 
> > 
> > 
> 

Received on Thursday, 12 July 2007 11:01:37 UTC