RE: SPARQL Wishlist from McCusker, James Patrick on 2019-04-01 (public-sparql-12@w3.org from April 2019)

From: McCusker, James Patrick <mccusj2@rpi.edu>
Date: Mon, 1 Apr 2019 14:11:45 +0000
To: Jürgen Jakobitsch <juergen.jakobitsch@semantic-web.com>, "public-sparql-12@w3.org" <public-sparql-12@w3.org>
Message-ID: <2551044AB996F940ABC5EA94F70044DDF076CF78@EX14MB6.win.rpi.edu>

To build on this idea of streaming, I think it would be valuable for SPARQL to support "standing" queries. One can imagine this construction:

1. POST/GET a query, with the parameter "async=true".
2. The response is 201 Created, with a "Location: <x>" header and possibly an authorization token for modification.
3. A GET on <x> at any point returns whatever results have accumulated so far.
4. If the auth token is available, the user can perform a DELETE to remove/deactivate the query.
5. GETs can include limit and offset parameters for paging.

This would allow expensive and/or standing queries to be computed once and stored, rather than recomputed fresh on every request. A delete from the graph would need to reset or "un-match" the affected results, but handling newly added data would be equivalent to what a streaming engine already does. Non-streaming engines can use the same mechanism to accumulate and save results, even if those results aren't computed incrementally.
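
A minimal in-memory sketch of that lifecycle, in Python, just to make the moving parts concrete. Nothing here is from an existing SPARQL implementation: the class and method names, the "/queries/..." URL layout, and the "X-Query-Token" header are all assumptions made for illustration, and a real endpoint would sit behind HTTP rather than plain method calls.

```python
import secrets
import uuid
from dataclasses import dataclass, field


@dataclass
class StandingQuery:
    """One registered query and the rows accumulated for it so far."""
    query: str
    token: str
    results: list = field(default_factory=list)


class StandingQueryStore:
    """In-memory stand-in for an endpoint that accepts async=true (hypothetical)."""

    def __init__(self):
        self._queries = {}

    def submit(self, query):
        """POST/GET with async=true: register the query, answer 201 plus a Location."""
        location = f"/queries/{uuid.uuid4()}"  # hypothetical URL layout
        token = secrets.token_urlsafe(16)      # hypothetical auth token
        self._queries[location] = StandingQuery(query, token)
        return 201, {"Location": location, "X-Query-Token": token}

    def accumulate(self, location, rows):
        """Called by the engine whenever it has produced more result rows."""
        self._queries[location].results.extend(rows)

    def get(self, location, limit=None, offset=0):
        """GET <x>: return whatever has accumulated so far, optionally paged."""
        entry = self._queries.get(location)
        if entry is None:
            return 404, []
        end = None if limit is None else offset + limit
        return 200, entry.results[offset:end]

    def delete(self, location, token):
        """DELETE <x>: only the holder of the token may remove the query."""
        entry = self._queries.get(location)
        if entry is None:
            return 404
        if not secrets.compare_digest(entry.token, token):
            return 403
        del self._queries[location]
        return 204
```

A client would call submit() once, keep the returned Location and token, poll get() with limit and offset as results arrive, and call delete() when done. The "un-match" behaviour on graph deletes is deliberately left out of this sketch.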

Jim

________________________________
From: Jürgen Jakobitsch [juergen.jakobitsch@semantic-web.com]
Sent: Monday, April 01, 2019 9:55 AM
To: public-sparql-12@w3.org
Subject: SPARQL Wishlist

hi there,

as indicated by andy, we should carry on with this conversation on the public mailing list..

i hereby restart the thread with my wishlist (i'm pretty sure there will also be a wiki page or other means to collect suggestions in the near future) :-)

1. as a sucker for query optimization and the grand reducer of joins of whatever sort, i really, really would appreciate
   execution sequence hints or at the very least FROM in subqueries and, relatedly, a well-defined order of what comes first: SERVICE or subselect.
2. as a sucker for "words are flowing out like endless rain" (beatles: across the universe) i fully support any form of streaming capability. rdf is just made
    for streams, a query type a la STREAM ?x FROM <http..> WHERE { ...
3. sometimes also very little things are required: a sequence (per group or the whole result).. (this is for example possible with virtuoso)
4. vectorization on the fly would also be neat, we wanna do cool stuff like ML, co-occurrences, linguistic statistics,... don't we?
5. "split"..
   or in general "set creating" functions.. this is usually only possible with custom functions these days, rdf4j for example requires usage of an extended evaluation strategy, stardog can do it with a custom function,
   as well as virtuoso with PL/SQL.. my preferred option would be "split by regex"
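
to illustrate what a "set creating" split could mean for item 5, here is a small python sketch (not SPARQL, and not how any of the engines above implement it): one input solution is expanded into one row per piece of a regex-split value. the function name, its parameters and the sample bindings are made up for illustration.

```python
import re


def split_bindings(solutions, var, pattern, out_var):
    """Expand each solution into one row per piece of solution[var], split by regex."""
    for solution in solutions:
        for piece in re.split(pattern, solution[var]):
            if piece:  # drop empty strings left by leading/trailing separators
                yield {**solution, out_var: piece}


# hypothetical input: a single solution whose ?tags value packs several items
rows = list(split_bindings(
    [{"s": "ex:a", "tags": "rdf, sparql;owl"}],
    var="tags",
    pattern=r"\s*[,;]\s*",
    out_var="tag",
))
```

each resulting row keeps the original bindings and adds one new binding for the split piece, which is the behaviour a built-in split would need in order to feed joins and aggregates directly.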

mtfbwy j

Jürgen Jakobitsch
Senior Technical Consultant
Semantic Web Company GmbH
EU: +43-14021235
US: (415) 800-3776
Mobile: +43-676-6212710
https://www.poolparty.biz
https://www.semantic-web.com

Download E-Book: Introducing Semantic AI<https://www.poolparty.biz/machine-learning-meets-semantics/>

Received on Monday, 1 April 2019 14:14:47 UTC