Re: QUERY Verb Proposal from Sandro Hawke on 2015-01-20 (public-ldp-wg@w3.org from January 2015)

From: Sandro Hawke <sandro@w3.org>
Date: Tue, 20 Jan 2015 10:53:45 -0500
To: "henry.story@bblfish.net" <henry.story@bblfish.net>
CC: Yves Lafon <ylafon@w3.org>, ashok malhotra <ashok.malhotra@oracle.com>, public-ldp-wg@w3.org
Message-ID: <54BE7A09.6000104@w3.org>
On 01/20/2015 10:16 AM, henry.story@bblfish.net wrote:
>> On 20 Jan 2015, at 15:43, Sandro Hawke <sandro@w3.org> wrote:
>>
...
>> In summary, I think the big argument for a QUERY verb is that it makes it much more practical to implement caches which understand they are caching RDF, and can short-cut some queries because they have the relevant triples cached.
>>
>> Until there's at least one major web infrastructure player who actually wants to do that, it's hard to make a case for standardizing a QUERY verb.
> First, QUERY is not limited to RDF; it could also work with other query frameworks such
> as XQuery, or some JSON equivalent.

Yeah, that might help. Amazon and Google, at least, have
database-access offerings that I know of. It's not out of the question
that they could want to standardize elements there, I guess. That
might be an angle with which to approach the HTTP WG. Maybe Akamai has
even considered something in that space.

> Also you don't need a major infrastructure player to use it, though having one would
> be nice. As I explained, an LDP client needs to
>    • have a local cache (e.g. in the browser, or for servers on the server)
>    • because of CORS limitations in browsers, it is easier for JS code to fetch remote
>     resources via the "personal LDP server" that itself fetches the remote resources.
>     Such servers can have a lot more memory available to them compared to web browser
>     clients and can also be programmed much more flexibly, enabling the creation of
>     new interesting protocols. These servers can end up acting as caches on which QUERY
>     requests would be very useful.
>    • One can also imagine OS-level caches, e.g. for Semantic Desktop projects
>   
> So there are many areas where caching and proxying can be useful. This does not only
> need to be at the major infrastructure layer.

Okay, I can see that. I've been assuming HTTP would not be the
protocol for those elements. Rather, I think apps would use a
programmatic API to talk to a local stack, which does caching and
eventually goes out over the network via HTTP. But there's a certain
elegance to allowing the client to speak HTTP without knowing or caring
whether there's a cache, yes.

>
> What I think we can learn from a QUERY proposal, by discussion with the IETF, is
>   1. how we should define such a verb for it to be correct at the HTTP layer
>    ( ignoring issues of infrastructural deployment )
>   2. how it could tie into LDP elegantly

I don't really understand these.    I can see how pushing for a QUERY 
verb would help us get review from IETF folks, yes.   Is that what you mean?

We probably need to define an abstraction, then show how to do that over 
GET and over QUERY, and characterize the differences, then ask for 
review on that.
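
To make the GET-versus-QUERY comparison concrete, here is a rough sketch of the same SPARQL query carried both ways. The GET form follows the SPARQL Protocol's "?query=" convention; the QUERY form is purely hypothetical, since no such verb is standardized, and the function names and the example URL are assumptions for illustration only. Nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request


def as_get(resource_url, sparql):
    """Carry the query in the URL, so every distinct query is a distinct URL."""
    return Request(resource_url + "?" + urlencode({"query": sparql}), method="GET")


def as_query(resource_url, sparql):
    """Hypothetical QUERY verb: the query travels in the request body and
    the target URL remains the resource being queried."""
    return Request(
        resource_url,
        data=sparql.encode("utf-8"),
        headers={"Content-Type": "application/sparql-query"},
        method="QUERY",  # assumption: not a registered HTTP method
    )


if __name__ == "__main__":
    q = "SELECT * WHERE { ?s ?p ?o }"
    for req in (as_get("http://example.org/c/", q), as_query("http://example.org/c/", q)):
        print(req.get_method(), req.full_url)
```

The visible difference is where the query lives: with GET an intermediary sees an opaque new URL per query, while with QUERY it sees the original resource URL plus a body it could, in principle, interpret.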

I don't really know what abstraction would cover SQL, SPARQL, and the
various NoSQL offerings like Amazon SimpleDB, Amazon DynamoDB, Google
"Cloud Datastore", etc. Off the top of my head, I guess you'd have to
say all queries with joins should be considered as filter-queries 
against a constructed collection.  Then you can treat every query as 
filtering some collection for matching items (with limits and order 
by).   That gets you much of the way.    So, a query-able resource is 
(or has) a set of query-items, and the query defines a possibly-ordered 
subset of matching items.    Something like that?
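
A minimal sketch of that abstraction, assuming items are plain dictionaries and a query is a predicate plus optional ordering and limit. All names here (Query, run, join) are made up for illustration and do not come from any of the systems mentioned above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Item = dict[str, Any]


@dataclass
class Query:
    matches: Callable[[Item], bool]   # the filter
    order_by: Optional[str] = None    # key to sort on, if any
    limit: Optional[int] = None       # maximum number of items, if any


def run(items, query):
    """Return the possibly-ordered subset of items that match the query."""
    selected = [item for item in items if query.matches(item)]
    if query.order_by is not None:
        selected.sort(key=lambda item: item[query.order_by])
    if query.limit is not None:
        selected = selected[:query.limit]
    return selected


def join(left, right, key):
    """Build the constructed collection that a join-query then filters:
    one merged item per pair of items agreeing on `key`."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


if __name__ == "__main__":
    people = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
    orders = [{"id": 1, "total": 5}, {"id": 2, "total": 9}, {"id": 1, "total": 7}]
    q = Query(matches=lambda i: i["total"] > 4, order_by="total", limit=2)
    print(run(join(people, orders, "id"), q))
```

The join is done first to construct the collection, and the query proper is then only a filter over it, which is the reduction suggested above.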

In terms of serious practical issues with SPARQL, my initial question is 
probably about how you determine the dataset (the thing being queried) 
for a particular query operation.    The natural thing in LDP I guess 
would be that you query containers about their contents.   Basic or 
Direct Container C with resources R1, R2, ... would look like a dataset:

... empty-container triples for C...
... maybe all the other triples in C, although I'd like to get them out 
of the way...
<R1> { ... triples in R1 ... }
<R2> { ... triples in R2 ... }
...

What about nested containers?   If <R1> is a container which contains 
<R1a> and <R1b> can we add those at the top level?   (RDF doesn't 
support nested datasets).   I think so....

Maybe:

<C> { empty-container triples for C }
<C-something> { membership triples for C }
<R1> { triples in R1 }
<R1a> {triples in R1a }
... etc

By putting C as a graph name, we let a server have one dataset for many
containers, I think.
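
As a rough sketch of that flattening, assume a container is a nested dictionary and a dataset is a flat mapping from graph name to triples. The dictionary keys and the "#membership" suffix are placeholders I made up; the suffix merely stands in for the undecided <C-something> name.

```python
def flatten(container, dataset=None, membership_suffix="#membership"):
    """Flatten a (possibly nested) container into one dataset:
    a dict mapping graph name -> list of triples, with no nesting."""
    if dataset is None:
        dataset = {}
    uri = container["uri"]
    # <C> { empty-container triples for C }
    dataset[uri] = container["container_triples"]
    # <C-something> { membership triples for C }; the suffix is hypothetical
    dataset[uri + membership_suffix] = container["membership_triples"]
    for member in container["members"]:
        if "members" in member:
            # A nested container: its graphs are added at the top level too.
            flatten(member, dataset, membership_suffix)
        else:
            # <R1a> { triples in R1a }
            dataset[member["uri"]] = member["triples"]
    return dataset


if __name__ == "__main__":
    r1 = {
        "uri": "http://example.org/c/r1/",
        "container_triples": [],
        "membership_triples": [],
        "members": [
            {"uri": "http://example.org/c/r1/a", "triples": [("s", "p", "o")]},
        ],
    }
    c = {
        "uri": "http://example.org/c/",
        "container_triples": [],
        "membership_triples": [],
        "members": [r1, {"uri": "http://example.org/c/r2", "triples": []}],
    }
    for name in flatten(c):
        print(name)
```

Because every graph is keyed by its own URL, several unrelated containers can share one such mapping without colliding.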

What about querying non-container graphs and datasets? Like TriG and
JSON-LD files, which potentially have their own named graphs? I think
you put those at the top level, too, like R1a.

Is there any notion of metadata being separate?   Can I query a jpg and 
be querying its metadata?   If I query a container with non-RDF 
resources, is the graph name the URL of the member or the member's 
metadata resource?   Maybe not.

      -- Sandro

>   
>
>>        -- Sandro
>>
>>>>> - The cache cannot itself respond to queries
>>>>>    A cache that is SPARQL-aware should be able to respond
>>>>>    to a SPARQL query if it has already received the whole representation of the
>>>>>    resource (or indeed even a relevant partial representation).
>>>>>    This means that a client can send a QUERY to the resource via the cache
>>>>>    and the cache should be able to respond as well as the remote resource
>>>>> - Access Control
>>>>>   Now you have a huge number of URLs referring to resources with exactly the same
>>>>>   access control rules as the non-query resource, with all that can go wrong when
>>>>>   those resources are not clearly linked to the original
>>>>> - The notion of a partial representation of an original resource is much more opaque,
>>>>> if not lost, without the QUERY verb. The system is no longer thinking: "x is a partial
>>>>> representation of something bigger, that it would be interesting to have a more complete
>>>>> representation of"
>>>>>
>>>>> Btw, do we have a trace of the arguments made in favor of PATCH? Then it would be a case
>>>>> of seeing if we can invert some of those arguments to see if we are missing any here.
>>>>>
>>>>>> BTW, all my query work these days is on standing queries, not one-time queries. As such, I think you don't actually want the query results to come back like this. You want to POST to create a Query, and in that query you specify the result stream that the query results should come back on. And then you GET that stream, which could include results from many different queries. That's my research hypothesis, at least.
>>>>>>
>>>>>>      -- Sandro
>>>>>>
>>>>>>>> Assume the HTTP WG will say no for the first several years, after which maybe you can start to transition from GET to QUERY.
>>>>>>>>
>>>>>>>> Alternatively, resources can signal exactly which versions of the QUERY spec they implement, and the QUERY operation can include a parameter saying which version of the query spec is to be used. But this won't give you caching like GET. So better to just use that signaling for constructing a GET URL.
>>>>>>> Gimme a little more to help me understand how this would work.
>>>>>>>>      -- Sandro
>>>>>>>>
>>>>> Social Web Architect
>>>>> http://bblfish.net/
>>>>>
>>>>>
>>>>>
>>>> -- 
>>>> Baroula que barouleras, au tiéu toujou t'entourneras.
>>>>
>>>>        ~~Yves
>>>>
Received on Tuesday, 20 January 2015 15:53:54 UTC