RE: caching HTTP 303 responses from Jacek Kopecky on 2007-07-12 (semantic-web@w3.org from July 2007)

From: Jacek Kopecky <jacek.kopecky@deri.org>
Date: Thu, 12 Jul 2007 13:13:46 +0200
To: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>
Cc: Jeremy Carroll <jjc@hpl.hp.com>, Giovanni Tummarello <g.tummarello@gmail.com>, semantic-web@w3.org
Message-Id: <1184238826.4254.123.camel@localhost>
Stuart, 

I was thinking along the same lines, but decided to use the caching
terminology instead because it's hard to define when an agent knows
*enough* about something.

For instance, take the DC case: the properties have different URIs, and initially the
agent doesn't know anything about them. It dereferences dc:description,
let's say, and finds some information about that and other DC
properties. It can probably assume now that it need never again (for
small values of never) dereference that. But when it encounters
dc:title, how can the client know that the stuff it got from the
dc:description redirect is all the pertinent information that it can get
from dc:title, which it doesn't yet know to redirect anywhere?
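A minimal sketch of the difficulty. It assumes a client that learns URI-to-description mappings only from 303 responses it has actually received. The class name and the example.org Location value are hypothetical.

```python
from typing import Dict, Optional, Set


class RedirectMemo:
    """Hypothetical client-side memo of 303 redirects actually followed."""

    def __init__(self) -> None:
        self.redirects: Dict[str, str] = {}  # term URI -> description URL
        self.fetched: Set[str] = set()       # description URLs already retrieved

    def record_303(self, term_uri: str, location: str) -> None:
        """Remember that term_uri answered 303 with this Location."""
        self.redirects[term_uri] = location
        self.fetched.add(location)

    def known_location(self, term_uri: str) -> Optional[str]:
        """Return the remembered description URL, or None if unknown.

        None means the client has to dereference term_uri to find out,
        even when the document it would be sent to is already in hand.
        """
        return self.redirects.get(term_uri)


DC = "http://purl.org/dc/elements/1.1/"
DOC = "http://example.org/dc-terms.rdf"  # hypothetical Location value

memo = RedirectMemo()
memo.record_303(DC + "description", DOC)
print(memo.known_location(DC + "description"))  # http://example.org/dc-terms.rdf
print(memo.known_location(DC + "title"))        # None
print(DOC in memo.fetched)                      # True
```

The document is already fetched, but nothing in hand says that dc:title would lead to it.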

Best regards,
Jacek

On Thu, 2007-07-12 at 11:59 +0100, Williams, Stuart (HP Labs, Bristol)
wrote:
> 
> Ok... so I'll offer a thought...
> 
> In at least some of the problem cases, the retrieval URIs are the URIs of
> properties and classes in OWL ontologies and RDFS vocabulary
> descriptions. Presumably the motivation to perform such retrievals is a
> lack of knowledge about the thing referred to by the URI. A successful
> retrieval, whether it arises from a protocol redirect or a client side
> redirection through the stripping of a fragID, renders the requesting
> agent *informed* about the referent. The question is surely the
> persistence of that information rather than the persistence of
> redirection. 
> 
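The "client side redirection through the stripping of a fragID" can be sketched with Python's standard `urllib.parse.urldefrag`. The function name and example.org URIs are hypothetical. Slash URIs without a fragment pass through unchanged, and those are the ones that need a server-side 303 instead.

```python
from urllib.parse import urldefrag


def retrieval_uri(uri: str) -> str:
    """Return the URI an HTTP client actually requests for `uri`.

    The fragment identifier is never sent to the server, so every
    hash URI in one vocabulary maps to the same retrieval URI.
    """
    return urldefrag(uri).url


print(retrieval_uri("http://example.org/vocab#Person"))  # http://example.org/vocab
print(retrieval_uri("http://example.org/vocab#name"))    # http://example.org/vocab
print(retrieval_uri("http://example.org/id/thing"))      # http://example.org/id/thing
```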
> So... does your agent already know the answer to a question that it's
> about to ask?
> 
> 1) Do I need to ask this question or do I know enough about this thing
> already - chances are the answer is already squirrelled away in the
> agent's knowledge base.
> 
> 2) Some answers tell you about more things than you asked about, e.g.
> Dublin Core - because a bunch of URIs for dc properties all redirect to
> the same description of all of them (modulo a spurious fragId last time
> I looked, that gets stripped anyway - and for which there is no referent
> in the resulting description). So you may already be informed about
> things that you haven't asked about.
> 
> The imperative for the agent to ask a question seems to be lack of
> knowledge of the answer. If it already has the answer... you can avoid
> asking the question.
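The two questions above can be sketched as an agent that checks its knowledge base before dereferencing and keeps every triple an answer brings back. This is a minimal illustration. The `Agent` class and `fake_fetch` are hypothetical, with `fake_fetch` standing in for an HTTP GET that follows the redirect. The triples are a plain set rather than a real RDF store.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class Agent:
    """Hypothetical agent that consults its knowledge base before dereferencing."""

    def __init__(self, fetch: Callable[[str], Iterable[Triple]]) -> None:
        self.fetch = fetch            # dereference a URI, return parsed triples
        self.kb: Set[Triple] = set()
        self.fetch_count = 0

    def knows_about(self, uri: str) -> bool:
        # Question 1: is something already squirrelled away about uri?
        return any(s == uri for s, _, _ in self.kb)

    def describe(self, uri: str) -> Set[Triple]:
        if not self.knows_about(uri):
            self.fetch_count += 1
            # Question 2: one answer may describe many other URIs as well;
            # all of its triples go into the knowledge base.
            self.kb.update(self.fetch(uri))
        return {t for t in self.kb if t[0] == uri}


DC = "http://purl.org/dc/elements/1.1/"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def fake_fetch(uri: str) -> Iterable[Triple]:
    # Every dc term resolves to the same (hypothetical) description document.
    return [(DC + "title", LABEL, "Title"), (DC + "description", LABEL, "Description")]


agent = Agent(fake_fetch)
agent.describe(DC + "description")
agent.describe(DC + "title")   # answered from the KB, no second fetch
print(agent.fetch_count)       # 1
```

Note that `knows_about` treats any triple at all as knowing enough. Jacek's reply questions exactly that assumption.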
> 
> My 2 cents.
> 
> Stuart Williams
> --
> Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks
> RG12 1HN
> Registered No: 690597 England
> 
> > -----Original Message-----
> > From: semantic-web-request@w3.org 
> > [mailto:semantic-web-request@w3.org] On Behalf Of Jeremy Carroll
> > Sent: 10 July 2007 13:45
> > To: Jacek Kopecky
> > Cc: Giovanni Tummarello; semantic-web@w3.org
> > Subject: Re: caching HTTP 303 responses
> > 
> > 
> > 
> > Is there some motivation for the MUST NOT cache constraint?
> > 
> > A thought is that there are quite complex HTTP cache control 
> > mechanisms which may not work correctly. But I suppose 302s 
> > are cached, and can be updated, and the behaviour is 
> > acceptable... so the same mechanisms should work with 
> > 303 (except for the prohibition).
> > 
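The comparison with 302 can be made concrete with a simplified reading of the RFC 2616 rules for redirects (sections 10.3.1 to 10.3.8). The function names are hypothetical. The sketch ignores `private`, `Vary`, heuristic freshness and everything else a real cache such as squid also checks.

```python
from typing import Mapping, Set


def _cache_control(headers: Mapping[str, str]) -> Set[str]:
    """Return the Cache-Control directive names, lower-cased."""
    for name, value in headers.items():
        if name.lower() == "cache-control":
            return {d.split("=", 1)[0].strip().lower() for d in value.split(",") if d.strip()}
    return set()


def redirect_cacheable_rfc2616(status: int, headers: Mapping[str, str]) -> bool:
    """Simplified reading of the RFC 2616 caching rules for 3xx redirects."""
    directives = _cache_control(headers)
    if directives & {"no-store", "no-cache"}:
        return False
    has_expires = any(name.lower() == "expires" for name in headers)
    explicit = has_expires or bool(directives & {"max-age", "s-maxage", "public"})
    if status in (300, 301):
        return True        # "cacheable unless indicated otherwise"
    if status in (302, 307):
        return explicit    # only with explicit Cache-Control or Expires
    if status == 303:
        return False       # "MUST NOT be cached", whatever the headers say
    raise ValueError(f"status {status} not covered by this sketch")


print(redirect_cacheable_rfc2616(302, {"Cache-Control": "max-age=3600"}))  # True
print(redirect_cacheable_rfc2616(303, {"Cache-Control": "max-age=3600"}))  # False
```

The last two lines show the point. The headers are identical, and only the blanket prohibition on 303 changes the outcome.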
> > ....
> > 
> > thinking out loud, without reading the specs,
> > 
> > Jeremy
> > 
> > 
> > 
> > Jacek Kopecky wrote:
> > > Hi Giovanni,
> > > 
> > > barring the change away from 303 for non-information resources, or a
> > > change to the cacheability of 303, one could indeed make a patch for
> > > squid.
> > > 
> > > The way I'd go about it, not to break too much, would be to add a
> > > request ID header which would differ for different user requests,
> > > and the squid would cache everything within the same request ID, and it
> > > would follow the specs for different requests.
> > > 
> > > The request ID would be treated as an enabler for these "atomically
> > > cacheable" things (everything), atomically as in "in the same user
> > > request processing". And this could mean statefulness in squid (prolly
> > > a very bad thing) if there was a requirement to interleave the
> > > processing of multiple user requests.
> > > 
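The request-ID idea above can be sketched as a proxy-side cache keyed on (request ID, URI). This is a minimal illustration, not a squid patch. The class name and the `X-Request-ID` header are hypothetical, and header matching is exact for brevity. Dropping a request's entries when it finishes keeps the state limited to requests still in flight.

```python
from typing import Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str]]  # (status, headers); body omitted for brevity


class PerRequestCache:
    """Hypothetical proxy cache for the 'atomically cacheable' idea.

    Any response, 303 included, is reused only by sub-requests carrying the
    same request ID. Requests without an ID get nothing from this cache, so
    the proxy's normal, spec-following cache applies to them (not shown).
    """

    HEADER = "X-Request-ID"  # hypothetical header name, not a standard one

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Response] = {}

    def lookup(self, uri: str, req_headers: Dict[str, str]) -> Optional[Response]:
        rid = req_headers.get(self.HEADER)
        return None if rid is None else self._store.get((rid, uri))

    def store(self, uri: str, req_headers: Dict[str, str], resp: Response) -> None:
        rid = req_headers.get(self.HEADER)
        if rid is not None:
            self._store[(rid, uri)] = resp

    def end_request(self, rid: str) -> None:
        # Forget everything cached for a user request once it is done.
        for key in [k for k in self._store if k[0] == rid]:
            del self._store[key]


cache = PerRequestCache()
see_other = (303, {"Location": "http://example.org/doc/thing"})
cache.store("http://example.org/id/thing", {"X-Request-ID": "job-1"}, see_other)
print(cache.lookup("http://example.org/id/thing", {"X-Request-ID": "job-1"}))  # the stored 303
print(cache.lookup("http://example.org/id/thing", {"X-Request-ID": "job-2"}))  # None
```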
> > > But thinking about this, fixing 303 cacheability or maybe adding a 
> > > cacheable 308 Description Elsewhere sounds easier now. 8-)
> > > 
> > > Jacek
> > > 
> > > On Tue, 2007-07-10 at 01:20 +0100, Giovanni Tummarello wrote:
> > >> Hi Jacek,
> > >>
> > >> Unfortunately the "application cache" is not always possible.
> > >> The key to cluster scalability is splitting jobs across the cluster
> > >> nodes so each file is more or less processed on its own.
> > >> Web architecture then says that if you want to go fast... you can
> > >> cache... so one puts a large proxy in front that all the nodes can in
> > >> theory feed from.
> > >> This is what we thought we'd do... just to find out that each process
> > >> was running a few dozen times slower than it could (to say nothing
> > >> of the remote hits, which are the real problem) due to squid
> > >> rightfully refusing to cache 303.
> > >> We could write a "semantic web patch" for squid to explicitly violate
> > >> a MUST NOT... but... :-)
> > >> Giovanni
> > >>
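The cost to remote hits can be shown by counting requests that reach the origin server. The model has several cluster nodes dereferencing the same 303-redirecting URI through one shared proxy. This is a toy model under stated assumptions: one hypothetical redirecting URI and its description document at example.org. It counts requests only and says nothing about latency.

```python
def origin_hits(nodes: int, cache_303: bool) -> int:
    """Count requests reaching the remote server when `nodes` cluster nodes
    each dereference the same 303-redirecting URI through one shared proxy."""
    cache = set()
    hits = 0
    for _ in range(nodes):
        # Each node first gets the 303, then follows Location to the document.
        for url, status in (("http://example.org/id/thing", 303),
                            ("http://example.org/doc/thing", 200)):
            if url in cache:
                continue            # served by the proxy, no remote hit
            hits += 1
            if status == 200 or cache_303:
                cache.add(url)      # a spec-compliant proxy never keeps the 303
    return hits


print(origin_hits(10, cache_303=False))  # 11
print(origin_hits(10, cache_303=True))   # 2
```

With n nodes, the spec-compliant proxy sends n + 1 requests upstream instead of 2, because only the final 200 response can be shared.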
> > > 
> > > 
> > > 
> > 
> 
>
Received on Thursday, 12 July 2007 11:13:56 UTC