
Re: QUERY Verb Proposal

From: <henry.story@bblfish.net>
Date: Tue, 20 Jan 2015 15:00:06 +0100
Cc: Sandro Hawke <sandro@w3.org>, ashok malhotra <ashok.malhotra@oracle.com>, public-ldp-wg@w3.org
Message-Id: <C825662A-09FC-40BF-8DB3-F01151AD1606@bblfish.net>
To: Yves Lafon <ylafon@w3.org>

> On 20 Jan 2015, at 14:22, Yves Lafon <ylafon@w3.org> wrote:
> 
> On Tue, 20 Jan 2015, henry.story@bblfish.net wrote:
> 
>>> One of the reasons the HTTP WG is very unlikely to standardize this is that there's so little technical advantage to doing this with a new verb (at least as far as I can see).  The main reasons would be queries > 2k, but your saved queries solve that, and allowing intermediate nodes to understand and cache based on query semantics, ... and MAYBE the Get option would allow that.
>> 
>> Some of the disadvantages of your approach I can think of at present:
>> 
>> • Queries are limited to < 2k
> Source?
> 
> http://tools.ietf.org/html/rfc7230#section-3.1.1
> <<
>  HTTP does not place a predefined limit on the length of a
>  request-line, as described in Section 2.5.  A server that receives a
>  method longer than any that it implements SHOULD respond with a 501
>  (Not Implemented) status code.  A server that receives a request-target
>  longer than any URI it wishes to parse MUST respond with a 414 (URI Too
>  Long) status code (see Section 6.5.12 of [RFC7231]).
> 
>  Various ad hoc limitations on request-line length are found in
>  practice.  It is RECOMMENDED that all HTTP senders and recipients
>  support, at a minimum, request-line lengths of 8000 octets.

Things may have changed. When I worked at AltaVista in 2001 there was a
limit of 2k for URLs due to old proxies, etc. Still, there is a limit as
indicated there, and there has to be one, or else denial of service attacks
through the creation of infinitely long URIs would be all too easy. ( I once
broke a server just by sending it infinitely long headers from a shell
script :-)

Consider also that you may be using a large number of ontologies, whose
prefixes you then have to declare, encode, and paste onto your initial query
URL. It should be clear that once you start putting URLs inside of URLs you
are going to end up breaking the limit above.
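As a rough illustration (the endpoint and the set of ontology prefixes here are hypothetical), percent-encoding even a modest SPARQL query whose PREFIX declarations each embed a full URL produces a GET URL considerably longer than the query itself:

```python
from urllib.parse import urlencode

# Hypothetical ontology prefixes; each PREFIX line embeds a full URL
# in the query text.
prefixes = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc":   "http://purl.org/dc/terms/",
    "ldp":  "http://www.w3.org/ns/ldp#",
    "sioc": "http://rdfs.org/sioc/ns#",
}
query = "".join(f"PREFIX {p}: <{u}>\n" for p, u in prefixes.items())
query += "SELECT ?name WHERE { ?person foaf:name ?name . }"

# URLs inside URLs: every ':' and '/' in each prefix IRI gets
# percent-encoded, inflating the request-target.
url = "http://example.org/resource?" + urlencode({"query": query})
print(len(query), len(url))  # the encoded URL is markedly longer than the query
```

With a realistic number of prefixes the encoded URL grows toward, and past, the ad hoc request-line limits mentioned above.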

> 
>> • URLs are no longer opaque. You can see this by considering the following:
>> - if a cache wants to use the query URL to build up a partial representation of
>> the original document, it would need to parse the query URL. So we end up with mime
>> type information in the URL.
> URL templates anyone? <http://tools.ietf.org/html/rfc6570>

To mandate that breaks web architecture and is bad for security.
What if you want to mint URLs that are as opaque as possible,
so that people reading links cannot determine what they
refer to?
This type of thing works for form-based queries because the
form is generated by the same server that is then going to parse the
query. If you want an open web where any service can make a query,
then you need something more generic than that. And if you make it as
generic as the QUERY verb proposed here, then you end up putting
a language into the URL, with a mime type, as indeed was proposed by
Sandro.

URL encoding SPARQL queries is just ugly for any number of reasons.
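The opacity point can be made concrete: a cache that wants to exploit such URLs has to reverse-engineer them, pulling the query language (in effect a mime type) and the query text back out of the URL. A minimal sketch, with hypothetical parameter names:

```python
from urllib.parse import parse_qs, urlsplit

# A hypothetical GET-encoded query URL; "lang" smuggles a mime type
# into the URL, and "query" carries the percent-encoded SPARQL.
url = ("http://example.org/doc?lang=application%2Fsparql-query"
       "&query=SELECT%20%2A%20WHERE%20%7B%20%3Fs%20%3Fp%20%3Fo%20%7D")

# The cache must parse the URL -- it is no longer an opaque identifier.
parts = parse_qs(urlsplit(url).query)
print(parts["lang"][0])   # application/sparql-query
print(parts["query"][0])  # SELECT * WHERE { ?s ?p ?o }
```

The moment intermediaries depend on this structure, the URL has stopped being an opaque name and become a protocol of its own.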

> 
>> - If the cache sees the query URL but does not know that the original resource
>> is pointing to it, then it cannot build up the cache ( and it cannot know this
>> without itself doing a GET on the original URL, because otherwise how would it deal
>> with lying resources that claim to be partial representations of other URLs? )
>> • URL explosion: one ends up with a lot more URLs - and hence resource - than needed,
>> with most resources being just partial representation of resources, instead of
>> building up slowly complete representation of resources.
> Querying something on the web using URIs is hardly new.

Here the aim is not to query the Web, as with AltaVista, but to query a
resource directly, to get relevant subsets of that resource. It is an interesting
question whether, on querying an LDPC, you can also query its contents.

> 
>> • caching
>> - etags don't work the same way on two resources with two URLs as with one
>>  and the same URL
>> - the same is true with time-to-live etc.
>> - A PUT, PATCH, DELETE on the main resource won't tell the cache that it should
>>  update all the thousand of other resources that are just views on the
>>  original one
> Why? This is an implementation detail server-side.

Because the cache may not have seen the PUT, PATCH or DELETE. You may have
made that request through another proxy, e.g. one at home and the other at work.
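A toy sketch of the invalidation problem (a dictionary standing in for an HTTP cache; all URLs hypothetical): even when a cache does observe an update to the main resource, URL-based invalidation only evicts that exact URL, while the derived query URLs keep serving the old state:

```python
# Minimal sketch: a shared cache keyed by URL, with query results
# stored under their own, distinct URLs.
cache_at_work = {
    "http://example.org/doc": "v1",
    "http://example.org/doc?query=names": "v1-names",
}

# A PATCH updates the origin to v2 (perhaps via the proxy at home,
# which the work cache never even sees).
origin = {"http://example.org/doc": "v2"}

# Invalidation driven by the request URL alone evicts only the main
# resource -- the query views silently keep serving v1 data.
cache_at_work.pop("http://example.org/doc", None)
stale = [u for u in cache_at_work if u.startswith("http://example.org/doc?")]
print(stale)  # ['http://example.org/doc?query=names'] -- still v1
```

With a QUERY verb there is a single resource URL, so one invalidation event covers every query the cache has answered against it.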

> 
>> - The cache cannot itself respond to queries
>>   A cache that would be SPARQL aware, should be able to respond
>>   to a SPARQL query if it has received the whole representation of the
>>   resource already - or indeed even a relevant partial representation )
>>   This means that a client can send a QUERY to the resource via the cache
>>   and the cache should be able to respond as well as the remote resource
>> • Access Control
>>  Now you have a huge number of URLs referring to resources with exactly the same
>>  access control rules as the non query resource, with all that can go wrong, when
>>  those resources are not clearly linked to the original
>> • The notion of a partial representation of an original resource is much more opaque
>> if not lost without the QUERY verb. The system is no longer thinking: "x is a partial
>> representation of something bigger, that it would be interesting to have a more complete
>> representation of"
>> 
>> Btw. Do we have a trace of the arguments made in favor of PATCH? Then it would be a case
>> of seeing if we can invert some of those arguments to see if we are missing any here.
>> 
>>> 
>>> BTW, all my query work these days is on standing queries, not one time queries.  As such, I think you don't actually want the query results to come back like this.   You want to POST to create a Query, and in that query you specify the result stream that the query results should come back on.  And then you GET that stream, which could include results from many different queries.   That's my research hypothesis, at least.
>>> 
>>>     -- Sandro
>>> 
>>>>> 
>>>>> Assume the HTTP WG will say no for the first several years, after which maybe you can start to transition from GET to QUERY.
>>>>> 
>>>>> Alternatively, resources can signal exactly which versions of the QUERY spec they implement, and the QUERY operation can include a parameter saying which version of the query spec is to be used. But this wont give you caching like GET.   So better to just use that signaling for constructing a GET URL.
>>>> Gimme a little more to help me understand how this would work.
>>>>> 
>>>>>     -- Sandro
>>>>> 
>> 
>> Social Web Architect
>> http://bblfish.net/
>> 
>> 
>> 
> 
> -- 
> Baroula que barouleras, au tiéu toujou t'entourneras.
> 
>       ~~Yves
> 

Social Web Architect
http://bblfish.net/
Received on Tuesday, 20 January 2015 14:01:09 UTC
