Re: Proposed HTTP SEARCH method update from henry.story@bblfish.net on 2015-04-26 (ietf-http-wg@w3.org from April to June 2015)

From: <henry.story@bblfish.net>
Date: Sun, 26 Apr 2015 11:09:33 +0200
To: Amos Jeffries <squid3@treenet.co.nz>
Cc: ietf-http-wg@w3.org
Message-Id: <5ABC7C48-55A9-4304-B691-8BE98F393D72@bblfish.net>
> On 26 Apr 2015, at 09:48, Amos Jeffries <squid3@treenet.co.nz> wrote:
> 
> On 26/04/2015 5:40 p.m., henry.story@bblfish.net wrote:
>> 
>>> On 25 Apr 2015, at 20:56, Mark Nottingham <mnot@mnot.net> wrote:
>>> 
>>> 
>>>> On 26 Apr 2015, at 6:49 am, henry.story@bblfish.net wrote:
>>>> 
>>>> If search is cacheable when the Content-Location of the response
>>>> matches the effective request uri, how does that show that the
>>>> SEARCH response is not cacheable?
>>> 
>>> It is cacheable — in the sense that it can be stored. However, that
>>> stored response can only be used to satisfy future GET requests to
>>> the same URI — which is probably not what you want.
>>> 
>>> HTTP caching operates upon representations of resource state, which
>>> means accessing the contents of the cache is always GET (or HEAD).
>> 
>> Great!  We are getting closer to the core of the problem, which I had
>> tried to address earlier :-)
>> 
>> I agree that HTTP caching should operate upon representations of
>> resource state. My point is that SEARCH always returns a partial
>> representation of the resource state, so that it should be cacheable
>> too. This means improving the caching stack so that it knows how to
>> update partial representations.
>> 
>> To illustrate the different cases, let us take the example from
>> http://datatracker.ietf.org/doc/draft-snell-search-method/ . The
>> resource <http://example.org/contacts> is a small table of contacts. 
>> Let us imagine that
>> 
>> A. GET followed by SEARCH
>> -------------------------
>> 
>> 1. James makes a GET request on <http://example.org/contacts> via a
>> cache C and C caches the returned representation with etag1 ( and
>> the same Location header )
>> 2. Ashok then makes a conditional SEARCH request on
>> <http://example.org/contacts> via the same cache C with etag1 too
> 
> What you describe here is a GET being cached and then used to answer a
> SEARCH.
> 
> This has no bearing on whether SEARCH is cacheable. The response to the
> SEARCH remains itself regardless of whether it was generated from the
> origin resource or a *complete* copy of the origin resource.

This example was just to show that the SEARCH should be completely
dependent on the representation returned by the GET.

>> 
>> B. SEARCH followed by SEARCH
>> ----------------------------
>> 
>> 1. a JS Agent in a browser does not know how large </contacts> is,
> 
> Therefore the cache cannot answer SEARCH C requesting 5 records, if the
> SEARCH A only caused 3 records to be cached.
> 
> Meaning the SEARCH has dynamic variance in responses. The method alone
> cannot state that the response is cacheable since it depends on other
> case-specific criteria.

yes.

> 
> 
> 
>> and only needs to render a couple of fields from that table, so it
>> sends a SEARCH for those fields to the server example.org via its
>> local in-browser cache BC.
>> 2. The same JS Agent later needs the same two fields again for some
>> different purpose, and sends the same query again using a
>> conditional GET with the same etag.
> 
> Wrong. It would need to send SEARCH again with the same variant
> negotiation details (ie query and language). *IF* the cache has for any
> reason discarded its earlier copy the server will be getting the
> request. You really don't want it to be GET on a large database.

Duh! I meant "using a conditional SEARCH"! Thanks for spotting that.

> 
> SEARCH as a method supplies similar cache needs and semantics to
> GET+Range request. Except that there are no byte offsets defined in the
> SEARCH request to make things easy for caches to identify object byte
> overlaps in stored responses.

yes

> 
> As such, Range can be used to best describe the caching problem with
> SEARCH. ... Caching Range responses is possible but a 206 response with
> 10 bytes of length cannot be used to supply a single Range of 20 bytes.
> 
> With SEARCH this problem is slightly worse because we don't know if or
> where the missing records might be in the resource. We are forced
> to send the whole SEARCH to the server to find out - which means the new
> reply effectively replaces any cached one and there is no gain.
> At least with Range the cache could optimize the backend query to
> ask for bytes 11-20.
> 
> 
> IFF we make the assumption that SEARCH is cacheable *somehow* - the
> cache will need to compare the method, URI and all other negotiation
> criteria defined by SEARCH as being relevant before it will serve up the
> cached response. Any non-match is a different SEARCH query requiring the
> backend server to supply a new response.

yes. If all the relevant fields match exactly then it should be possible 
to just send back the same byte for byte response. ( it remains to be determined
what the relevant fields are )

Note that if the SEARCH query is placed in the body of the request as is suggested
by draft-snell-search-method ( which makes a lot of sense ), then the 
content of the request body needs to be taken into account too. 
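
A rough sketch of what such an exact match could look like, in Python. The
function name and the choice of fields (method, URI, body media type, body
bytes, and a caller-supplied set of negotiation headers) are assumptions made
for illustration only; which fields are actually relevant is still open, and
nothing here comes from draft-snell-search-method.

```python
import hashlib

def search_cache_key(method, uri, content_type, body, negotiation_headers):
    """Return a hex digest identifying one SEARCH request (hypothetical helper).

    body is the raw request body as bytes; negotiation_headers is a dict of
    whichever header fields are assumed to select the response.
    """
    parts = [
        method.upper().encode("utf-8"),
        uri.encode("utf-8"),
        content_type.strip().lower().encode("utf-8"),
        body,
    ]
    # Header names are case-insensitive, so normalise and sort them to get
    # the same key regardless of spelling or order.
    for name, value in sorted((k.lower(), v) for k, v in negotiation_headers.items()):
        parts.append(name.encode("utf-8"))
        parts.append(value.strip().encode("utf-8"))

    digest = hashlib.sha256()
    for part in parts:
        # Length-prefix every field so adjacent fields cannot run together.
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()
```

Two requests that yield the same key could be answered with the same stored
bytes; any difference in the body (that is, in the query) yields a different
key and goes to the origin server.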

> I like your proposal that the response to SEARCH be treated as a partial
> response. To the point that I also believe it should be indicated that
> the preferred response status code is 206 with Content-Range header
> indicating the positions or numbers (in bytes, or maybe a new range
> type) of the fetched records within the base resource.
> 
> The 206 status is already defined in RFC 7233 with suitable criteria to
> cover the SEARCH cases. The wording for it also allows a SEARCH "not
> cacheable by default" definition to be ignored - that could be improved
> by SEARCH adding a mention that 206 might be cacheable.

yes. That is what I was getting at. Thanks for helping me clarify that in
terms of established RFCs. 206 (Partial Content) is a good find :-)

I think in this case the range is defined by the SEARCH query sent in the body
and the MIME type of the body of course. ( an initial investigation path may
be to consider a hash of those as a potential candidate Content-Range ) 

So for the hypothetical SPARQL query one could imagine the first query requesting
the number of columns and rows, and then subsequent queries filling in different
chunks of columns and rows. A SPARQL aware cache could then look at the data it already
had for a particular resource to see if it could answer a particular query without needing
to make a new request to the server. Say the first 10 queries ( QUERY was the original 
name of the method proposed btw http://www.w3.org/Protocols/HTTP/Methods.html ) gave it 
all the information for all the columns of the first 100 rows, then a new query that 
only made requests within that chunk of the table would allow the cache to respond from 
the data available to it. At the limit the cache should even be able to work out if 
it had received the whole view, and from that be able to respond to a GET, even though it
had never made a GET request at all.
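
To make the idea concrete, here is a minimal Python sketch of a cache that
accumulates cells of a table from partial responses. Everything in it is
hypothetical: the class name, the row/column addressing, and the assumption
that the table does not change between queries (a real cache would also have
to track validators such as etags). It does not parse SPARQL; it only shows
the bookkeeping.

```python
class PartialTableCache:
    """Collects cells of one tabular resource from partial responses."""

    def __init__(self):
        self.cells = {}    # (row index, column name) -> value
        self.shape = None  # (number of rows, tuple of column names), once known

    def set_shape(self, n_rows, columns):
        # Learned from the first query, which asks for the table dimensions.
        self.shape = (n_rows, tuple(columns))

    def store(self, rows, columns, values):
        # values[i][j] is the cell at rows[i], columns[j].
        for row, row_values in zip(rows, values):
            for column, value in zip(columns, row_values):
                self.cells[(row, column)] = value

    def answer(self, rows, columns):
        # Return the requested chunk, or None if any cell is missing and the
        # request has to go to the origin server.
        try:
            return [[self.cells[(r, c)] for c in columns] for r in rows]
        except KeyError:
            return None

    def is_complete(self):
        # True once every cell of the known shape has been seen, at which
        # point the whole view could in principle be served.
        if self.shape is None:
            return False
        n_rows, columns = self.shape
        return all((r, c) in self.cells for r in range(n_rows) for c in columns)
```

A query falling inside the chunks already stored gets an answer from
`answer()`; `is_complete()` corresponds to the limit case where the cache has
seen the whole view without ever making a GET.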

This is probably the way partial responses using byte offsets work currently with
caches. If a number of requests give the partial offset for byte 0 to 100 thousand, then
a new request for bytes 500 to 10 thousand should be able to be served without the cache 
needing to make a new request to the origin server. 
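
The byte-offset case can be sketched in a few lines of Python. The function
names are made up for illustration, ranges are inclusive (first, last) pairs
as in HTTP byte ranges, and this is not how any particular cache implements
it.

```python
def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive (first, last) byte ranges."""
    merged = []
    for first, last in sorted(ranges):
        if merged and first <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged

def missing_ranges(stored, first, last):
    """Return the parts of [first, last] not covered by the stored ranges."""
    gaps = []
    cursor = first
    for s, e in merge_ranges(stored):
        if e < cursor:
            continue           # stored range lies entirely before the request
        if s > last:
            break              # stored range lies entirely after the request
        if s > cursor:
            gaps.append((cursor, s - 1))
        cursor = e + 1
        if cursor > last:
            break
    if cursor <= last:
        gaps.append((cursor, last))
    return gaps

def is_covered(stored, first, last):
    """True if the cache could serve [first, last] without asking the origin."""
    return not missing_ranges(stored, first, last)
```

With stored ranges (0, 49999) and (50000, 100000), a request for bytes 500 to
10000 is covered. With only (0, 9) stored, a request for bytes 0 to 19 leaves
the gap (10, 19), which is the part a cache would still need from the origin.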

SEARCH does require more semantics though than byte offsets, but it is nice to see
that it can fit in that mould. This extra complexity of SEARCH also explains why it
has taken so much longer to make its way to the IETF.


Henry

> 
> Amos

Social Web Architect
http://bblfish.net/
Received on Sunday, 26 April 2015 09:10:04 UTC