Re: Proposed HTTP SEARCH method update - QUERY is to GET what PATCH is to PUT from ashok malhotra on 2015-04-27 (ietf-http-wg@w3.org from April to June 2015)

From: ashok malhotra <ashok.malhotra@oracle.com>
Date: Mon, 27 Apr 2015 16:36:36 -0400
To: "henry.story@bblfish.net" <henry.story@bblfish.net>, "Roy T. Fielding" <fielding@gbiv.com>
CC: Julian Reschke <julian.reschke@gmx.de>, Mark Nottingham <mnot@mnot.net>, James M Snell <jasnell@gmail.com>, ietf-http-wg@w3.org
Message-ID: <553E9DD4.1070001@oracle.com>
I prefer QUERY but was outvoted.

All the best, Ashok

On 4/27/2015 4:34 PM, henry.story@bblfish.net wrote:
>
>> On 27 Apr 2015, at 20:52, Roy T. Fielding <fielding@gbiv.com> wrote:
>>
>> [resending from the right address]
>>
>>> On Apr 27, 2015, at 1:52 AM, henry.story@bblfish.net wrote:
>>>
>>>> On 27 Apr 2015, at 08:59, Julian Reschke <julian.reschke@gmx.de> wrote:
>>>>
>>>> On 2015-04-27 07:36, henry.story@bblfish.net wrote:
>>>>> ...
>>>>> It would help if you explained how you disagree with the arguments put forward.
>>>>> Let's try the Socratic method then:
>>>>>
>>>>> (1) Do you agree that SEARCH is (should be) a method that is applied to the resource on which the request is made?
>>>>
>>>> Depends on the definition of "applied". You might want to read <http://greenbytes.de/tech/webdav/rfc5323.html#rfc.section.2.2.1>.
>>>
>>> I suppose I don't find that notion of "search arbiter" very satisfying in WebDAV. It is defined just as "A resource that supports the SEARCH method". My feeling is that by specifying SEARCH more generally in its own RFC we could do a lot better than this, and have something that ties in more coherently with the other methods. This is why I'd like to strengthen the following parallel: SEARCH is to GET what PATCH is to PUT. If we can get this to work then we build on very well understood intuitions of GET and PUT, which are at the core of the Web, in a clearly RESTful manner.
>>
>> The original SEARCH method was removed from HTTP in 1994 because it discouraged
>> the use of identifiers for important resources. In short, it weakened the Web.
>>
>> What MS implemented as SEARCH [1] and WebDAV defined (in RFC5323) was a really bad idea
>> for a site-wide query interface, along the same bad taste lines as service endpoints
>> for web services.
>>
>> [1] https://msdn.microsoft.com/en-us/library/aa143053%28v=exchg.65%29.aspx
>>
>> In both of those definitions, SEARCH was not a method that scoped its results to the requested
>> URI.  That URI, in fact, had almost nothing to do with the results, making implementation
>> of the method a significant security risk.
>>
>> Neither of those was RESTful in any sense of that term.  Regardless, the method was defined
>> and registered as such for HTTP (which is not limited to RESTful interaction), and there is
>> no reuse of method names allowed in HTTP.  You need to use a new method name if you want to
>> define a method with different semantics.
>
> It is my feeling that the authors of this draft want to do the right thing here, but were wary
> of coming up with a new method name, and favored going with something existing like SEARCH rather
> than trying something new. I was hoping in the previous mail that SEARCH could be redefined to do
> the right thing. But if it cannot, then another method name is welcome.
>
> For this I propose to use the verb QUERY, which I think would be much clearer and less heavy sounding than SEARCH, which is too active and open.
>
>> In any case, it remains true that generalized retrieval in the form of a method other
>> than GET would be actively harmful to the Web.  Partial query within the scope of the
>> current representation of a single resource is a very distinct concept that, if defined,
>> ought to have a distinct method name.  However, I would still consider that a bad idea,
>> since it is a loss in every way when compared to a link template to sub-resources.
>
> I do have some further arguments that I feel weigh strongly in favor of QUERY.
> Those I can think of at present are:
>
> • Query templates (URLs with attribute values following a '?') have a number of drawbacks:
>     - size limit: URLs are limited in practice to 2k or so.
>     - URL explosion: one ends up with a lot more URLs - and hence resources - than needed,
>   with most resources being just partial representations of the original resource
>     - no query language: if one wanted to use advanced query languages in the template, this would make for even worse URLs which, if one wanted them to be somewhat transparent, would even require mime type information to be added to the URL! Some services do put XQuery or SPARQL in the URL. I just don't feel that is right.
>
> • QUERY is better for caching.
> Query templates do not allow the caching behavior that we are looking for.
>    - An intelligent cache would need to learn the semantics of the attribute values following the ? to work out that a change to the original resource also changes the resource at the query template URL. This cannot be done in a generic manner across web sites. Somehow caches would need to understand the relation between a number of query URLs and the original resource.
>    This is not the case for a QUERY verb following the restrictions you proposed. Here a cache seeing that a successful PUT or PATCH on the same URL has taken place can immediately invalidate all the QUERY results it has for that resource. Of course a DELETE would invalidate all the cached QUERY results too.
>    - The same is true for time to live information for query URLs, which are disconnected from the
>   original resource
>    - The cache cannot itself respond to queries, which it might be able to do if it had the full representation returned by a GET on that resource (something we would be able to explore if we
> define it correctly).
>
> • Access Control
>     With query URLs you have a huge number of URLs referring to resources with exactly the same access control rules as the non-query resource. This means much more complex access control rules, more work tracking rights management, privacy settings, etc...
>
>>
>> BTW, HTTP defines caches based on the generic method/response relationship, not based on the
>> GET/representation relationship.  It is possible to cache responses to arbitrary methods if
>> the response indicates it is cacheable.  What is difficult is how to determine the cache key
>> for a given request, which HTTP defines only for GET and for responses containing the
>> Content-Location header field (new methods would have to define their own cache keys).
>> Generic HTTP caches only implement caching for GET/HEAD because the likelihood of receiving a
>> later hit on a cached entry for anything other than GET is so ridiculously tiny that it isn't
>> worth caching outside of application-specific client caches (for which generic semantics
>> are usually irrelevant because the application can be tightly coupled to the cache).
>
> Speaking from my own experience, which I think generalises to other frameworks: there are semantic web applications where we want to write clients that can follow linked data around the web using just RESTful interaction patterns, and for these QUERY would be a very important method. It would allow the client to build up user interfaces quickly by fetching just the data in a resource that it needs for the current display, e.g. the minimal bounded graph around a node in a graph. Here time is critical: the user interface has to display quickly and be reactive. If we are limited to GET on a resource of unknown size that is being downloaded from somewhere on the web by following links, we won't be able to gain adoption. This is why for me it is very important that the "query be within the scope of the current representation of a single resource", even if this resource can of course make statements about other resources, as Atom feeds do all the time. This is because the client needs to know that the result of a query on a particular resource is just a part of what it would have gotten had it asked for a GET. It would be of no use to the client if a QUERY on one resource just gave it information about some arbitrary other resource.
>
>   To give a bit of context: we are trying to build generic applications following RESTful protocols. If we succeed then "application specific client caches" will become standardised, and so this space will be huge. Potentially every server on the web - and here we will want to count the hundreds of millions of Freedom Boxes where citizens can host their own social network - will have a cache that will work in the background fetching feeds, receiving notifications, etc...
>
> Anyway, it is in writing these user interfaces that the importance of QUERY became clear to me. I think
> others will find many of the arguments apply to their use cases too, and to other query languages, even if they don't buy into the passion which drives me to work on these open source projects.
>
> Thanks a lot for your insight,
>
> 	Henry Story
>
> Social Web Architect
> http://bblfish.net/
>
>>
>>
>> Cheers,
>>
>> Roy T. Fielding                     <http://roy.gbiv.com/>
>> Senior Principal Scientist, Adobe   <http://www.adobe.com/>
>
>
Received on Monday, 27 April 2015 20:37:11 UTC
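
The caching argument in the quoted message (a successful PUT, PATCH or DELETE on a URL lets a cache drop every QUERY result it holds for that URL) can be sketched roughly as below. This is only an illustration under stated assumptions: the QUERY method discussed in the thread is a proposal, and the class name `QueryCache`, its methods, and the choice of cache key (target URL plus a hash of the query's media type and body) are hypothetical, not taken from any draft or implementation.

```python
import hashlib
from collections import defaultdict

# Methods that, when successful, are assumed to invalidate cached QUERY results.
INVALIDATING_METHODS = {"PUT", "PATCH", "DELETE"}


class QueryCache:
    """Hypothetical cache for QUERY responses, grouped by target URL."""

    def __init__(self):
        # url -> {hash of (content type, query body) -> cached response}
        self._entries = defaultdict(dict)

    @staticmethod
    def _body_key(content_type, body):
        # Assumed cache key component: media type plus exact query bytes.
        return hashlib.sha256(content_type.encode() + b"\n" + body).hexdigest()

    def store(self, url, content_type, body, response):
        self._entries[url][self._body_key(content_type, body)] = response

    def lookup(self, url, content_type, body):
        # Returns None on a miss; .get() avoids creating empty entries.
        return self._entries.get(url, {}).get(self._body_key(content_type, body))

    def observe(self, method, url, status):
        # A successful (2xx) state-changing request on the same URL drops
        # every QUERY result cached for that URL, with no knowledge of the
        # query language needed.
        if method.upper() in INVALIDATING_METHODS and 200 <= status < 300:
            self._entries.pop(url, None)
```

Because all results hang off the one resource URL, invalidation needs no per-site knowledge of query parameters, which is the contrast with query template URLs drawn in the message.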