Re: Proposed HTTP SEARCH method update - QUERY is to GET what PATCH is to PUT from henry.story@bblfish.net on 2015-04-28 (ietf-http-wg@w3.org from April to June 2015)

From: <henry.story@bblfish.net>
Date: Tue, 28 Apr 2015 14:44:22 +0200
To: Julian Reschke <julian.reschke@gmx.de>
Cc: "Roy T. Fielding" <fielding@gbiv.com>, ashok malhotra <ashok.malhotra@oracle.com>, Mark Nottingham <mnot@mnot.net>, James M Snell <jasnell@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <0F1695CB-BB66-41B9-B9D1-C0897F7D8D49@bblfish.net>
[ Where I try to argue more formally to see if there is space for agreement
between Julian and Roy. I also give 2 arguments as to why SOAP SEARCH will not
be viable in a hyperdata web. But this reasoning leaves me wondering whether QUERY
might not fold neatly into GET. ]

> On 28 Apr 2015, at 07:29, Julian Reschke <julian.reschke@gmx.de> wrote:
> 
> On 2015-04-28 00:41, henry.story@bblfish.net wrote:
>> 
>>> On 27 Apr 2015, at 22:49, Julian Reschke <julian.reschke@gmx.de> wrote:
>>> 
>>> On 2015-04-27 22:34, henry.story@bblfish.net wrote:
>>>> ...
>>>> It is my feeling that the authors of this draft want to do the right thing here, but were worried
>>>> to come up with a new method name, and favored going with something existing like SEARCH rather
>>>> than try something new. I was hoping in the previous mail that SEARCH could be redefined to do
>>>> the right thing. But if it cannot then another method name is welcome.
>>>> ...
>>> 
>>> There are already three safe HTTP methods that take a request payload (PROPFIND, SEARCH and REPORT). Some software stacks already know about these (for instance, wrt whether a request can be safely repeated on network failure). There's simply no reason to create yet another one, when, from an HTTP point of view, it does exactly the same thing.
>> 
>> I think the disagreement is that SEARCH does what we need. To quote Roy Fielding's
>> answer:
>> 
>>  " In both of those definitions, SEARCH was not a method that scoped its results
>>    to the requested URI.  That URI, in fact, had almost nothing to do with the
>>    results, making implementation of the method a significant security risk. "
> 
> Well, Roy is overstating things. That particular usage pattern depends on the query grammar, and is totally optional in the only grammar defined in RFC 5323; see <http://greenbytes.de/tech/webdav/rfc5323.html#rfc.section.5.4>.

Interesting. Thanks for the pointer. 

My feeling is that the slipperiness of the notion of scope may be leading to confusion.
( But Roy may be basing his statement on other documents, so I can't comment for him on that. )

As I argued previously if I query a document that makes a statement about another document,
as we are constantly doing on this mailing list, by quoting each other, then one can at least
make queries as to what a document says another document says. That is I suppose uncontroversial.
So I could query my previous mail asking it whether it stated that Roy stated that SEARCH
was not scoped to the request URI. If the mail archive were QUERY/SEARCH enabled with an advanced
human-language query language, then it could answer "yes" to that question.
So I suppose Roy has no issue with that type of scoping. If I asked the same resource if Roy 
made a statement about caching in that same e-mail, it would have to say either no, since 
I did not quote him as saying so, or perhaps "not as far as I can tell from this resource".

Let us now consider the more difficult case that may be behind the clash between
you and Roy on WebDAV. First I note that WebDAV, Atom, and LDP ( Linked Data Platform ) [1]
use very similar concepts. They all have a notion of containers and their contents. LDP even defines
the ldp:contains relation relating containers to the contents they created. Because I know LDP best
I can demonstrate this with a simple example. To take figure 1 of the LDP Primer [2], I could have
a </container/> resource that returned the following on a GET requesting text/turtle [3]:

<> a ldp:Container;
   ldp:contains <jane>, 
                <image> .

Here we have a container containing a data resource <jane> and an image. A DELETE on </container/image>
MUST remove the ldp:contains relation from the container. One can therefore consider - as with WebDAV depth
1 scoping - that the relation between the container and its contents is close enough here that the
container could potentially *quote* the contents too ( as Atom does with its <content> element ).
It must quote the contents, rather than merge them into its own statements, or else someone could lie
about the container in the content and mess everything up by claiming the container contained things
it did not contain. So potentially the container could return, on a GET requesting text/n3 [4], both
the direct content and the quoted content:

<> a ldp:Container;
   ldp:contains <jane>, 
                <image> .

<jane> log:semantics {
   <jane> foaf:primaryTopic <jane#me>.
   <jane#me> a foaf:Person;
      foaf:name "Jane";
      foaf:depiction <image>;
      foaf:knows <http://www.w3.org/People/Berners-Lee/card#i> .
}


Here <jane> log:semantics {...} functions like the <content> element in
Atom XML.

(Of course quoting an image inside the container is just ugly, and if your
hair doesn't stand on end at the thought then you are probably not reading
this list :-)

Still it would make sense to be able to QUERY/SEARCH the container with a language
that understood quotations. SPARQL [5] for example allows this and one could ask
something like

QUERY /container/ HTTP/1.0
Content-Type: application/sparql-query; charset=UTF-8
...

CONSTRUCT { ?person foaf:name ?name }
WHERE {
  <> ldp:contains ?g .
  GRAPH ?g {
    ?g foaf:primaryTopic ?person .
    ?person foaf:name ?name .
  }
}

This would make more sense if this container contained a lot of content of course,
and this is what Ashok is referring to in his question about books.
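
To make the scoping concrete, here is a minimal sketch in Python of what such a query ranges over. It is not an LDP or SPARQL implementation: the dictionary layout and the function name are my own assumptions, and the pattern matching is done by hand. The point is only that the answer is computed from what the container itself holds, namely its ldp:contains list and the graphs it quotes.

```python
# Hypothetical in-memory model of the container's state: its membership list
# plus the graphs it quotes. None of these names come from LDP or SPARQL.
FOAF = "http://xmlns.com/foaf/0.1/"

container = {
    "contains": ["jane", "image"],
    "quoted": {
        "jane": [
            ("jane", FOAF + "primaryTopic", "jane#me"),
            ("jane#me", FOAF + "name", "Jane"),
        ],
    },
}


def names_in_container(container):
    """Return (person, name) pairs, mirroring the CONSTRUCT pattern by hand."""
    results = []
    for g in container["contains"]:
        # Only what the container quotes is visible; nothing is fetched.
        triples = container["quoted"].get(g, [])
        topics = {o for (s, p, o) in triples
                  if s == g and p == FOAF + "primaryTopic"}
        for (s, p, o) in triples:
            if s in topics and p == FOAF + "name":
                results.append((s, o))
    return results


print(names_in_container(container))
```

The image has no quoted graph in this model, so it simply contributes no results.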

Now here again I think we can still argue, if we are very careful, that the query
is only covering the state of the resource, even if the resource has a very strong
binding with its contents, which can and should be queried independently. (My guess
is that if the contents are that tightly linked to the container, then any change to
the contained resources should automatically also alter the container's ETag.)
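
One way to get that behaviour, sketched here under assumptions of my own (the function name, the hashing scheme and the truncation to 16 hex digits are all illustrative and not taken from any spec), is to derive the container's validator from its own state together with the validators of its members:

```python
import hashlib


def container_etag(own_state, member_etags):
    """Derive a container ETag from its own state and its members' ETags."""
    digest = hashlib.sha256()
    digest.update(own_state.encode("utf-8"))
    # Sort by member name so the result does not depend on dict ordering.
    for name in sorted(member_etags):
        digest.update(b"\x00" + name.encode("utf-8"))
        digest.update(b"\x00" + member_etags[name].encode("utf-8"))
    return '"' + digest.hexdigest()[:16] + '"'


state = "<> ldp:contains <jane>, <image> ."
before = container_etag(state, {"jane": '"v1"', "image": '"v1"'})
after = container_etag(state, {"jane": '"v2"', "image": '"v1"'})
```

With this scheme a change to <jane> alone gives the container a different ETag, even though the container's own triples are unchanged.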

So we have to be very careful here because if this is abused then we would end up
with a "generalized retrieval in the form of a method other than GET [which] would 
be actively harmful to the Web." (Roy Fielding) The fear here is that the only method
would then be SEARCH on the root container, and then we would have a new form of
SOAP going through SEARCH, which would mess up a lot of nice properties of the web.

So what is the counter-weight to this happening? I think it is simply the growing
deployment of Linked Data ( as shown by http://lod-cloud.net ). A Linked Data
application finds the definition of a term by dereferencing the URL of the object.
So for example if my JS client wants to know how to display 

  <http://www.w3.org/People/Berners-Lee/card#i>

then it will have to do a GET on <http://www.w3.org/People/Berners-Lee/card>
to get its meaning as shown in the WebID diagram [6]. It would not know initially
to do a SEARCH on <http://www.w3.org/People/> , or on <http://www.w3.org/>. The
authoritative resource is Tim Berners-Lee's WebID without the fragment identifier
( as explained by §3.5 of RFC 3986 on URIs [7] ).
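
The dereferencing step can be sketched in a few lines of Python using the standard library's urllib.parse.urldefrag. The function name document_url is my own; a real client would follow this with a GET on the returned URL.

```python
from urllib.parse import urldefrag


def document_url(term_uri):
    """Return the URL of the document that defines a term.

    The fragment is never sent to the server, so the client has to
    strip it before making the GET.
    """
    url, _fragment = urldefrag(term_uri)
    return url


print(document_url("http://www.w3.org/People/Berners-Lee/card#i"))
```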

Furthermore a search on <http://www.w3.org/People/> would be much more complex, as
it would need to be stated in terms of what <http://www.w3.org/People/> believes to
be the state of graphs it contains, or for <http://www.w3.org/>, were it to be a container,
what containers it contains believe their content to contain. The further we get away
from the resource the more levels of nesting of quotation we have, and the less likely 
we are to really know what is going on. We might as well ask the original resource. 

So these are a couple of reasons why in a hyperdata space GET will remain the primary
retrieval mechanism, and all-engulfing SEARCH SOAP bubbles will have very short
half-lives.

Well if GET is the method used to find the authoritative meaning of terms, how does 
SEARCH/QUERY come to be used then? In rww-play the server implements SEARCH 
( I can switch to QUERY without problem ) and the JS knows this, so it can query the 
resources using that method - and this makes sense as long as it is guaranteed that 
it is getting just a part of the information about the resource which it would also 
have with GET. For remote resources the JS goes through a protected caching proxy
( URLs on the server that can asynchronously fetch remote resources for the user )
and the JS also knows that SEARCH/QUERY is available for those. In the long
term, if QUERY were widely enough available, the JS could make a request directly on the
original resources using QUERY.

At this point though one nearly wishes for QUERY to default to GET, so that a resource
that does not understand the QUERY language sent would still return the full representation. 
This type of fallback is not possible from PATCH to PUT, since PUT already has content.
But since GET never has any associated content, it may oddly enough work there... It would
require the response to carry a header stating that it is a response
to the query in the GET body, because otherwise the client may not be able to distinguish
the result of the QUERY from the full representation returned by a server that did not understand the body.
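
A minimal sketch in Python of the client side of that check. It assumes a hypothetical response header, which I call Query-Applied here purely for illustration; nothing of the kind is defined anywhere that I know of.

```python
def interpret_response(headers, body):
    """Classify a response to a GET that carried a query in its body.

    `Query-Applied` is a hypothetical header name used only for illustration.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    applied = normalised.get("query-applied", "").strip().lower() == "true"
    kind = "query-result" if applied else "full-representation"
    return kind, body
```

A server that ignored the body would not send the header, so the client would treat what it received as the full representation and could run the query locally.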

Hmmmm. Any thoughts on this?

Henry


[1] the LDP WG wiki page: http://www.w3.org/2012/ldp/wiki/Main_Page
    The published W3C recommendation: 
       http://www.w3.org/TR/ldp/
[2] http://www.w3.org/TR/ldp-primer/
[3] I changed the second content to be an image in order to get a bit faster to the point.
[4] http://www.w3.org/TeamSubmission/n3/ 
[5] http://www.w3.org/TR/sparql11-query/
[6] http://www.w3.org/2005/Incubator/webid/spec/identity/#overview
[7] https://tools.ietf.org/html/rfc3986#section-3.5

Social Web Architect
http://bblfish.net/
Received on Tuesday, 28 April 2015 12:44:53 UTC