Re: Release of SAGE 1.0: a stable, responsive and unrestricted SPARQL query server from Ruben Verborgh (UGent-imec) on 2018-09-09 (public-lod@w3.org from September 2018)

From: Ruben Verborgh (UGent-imec) <Ruben.Verborgh@UGent.be>
Date: Sun, 9 Sep 2018 20:20:21 +0000
To: Kingsley Idehen <kidehen@openlinksw.com>
CC: "public-lod@w3.org" <public-lod@w3.org>
Message-ID: <9F945592-E9E9-43F7-ADAF-A92D115A1D59@ugent.be>

Hi Kingsley,

> How about:
> 
> Virtuoso, SAGE client+server solution, TPF etc.. are classes of solutions that support the SPARQL Query Language which doesn't imply the SPARQL Protocol (which is associated with "SPARQL Endpoints"). 

Fine with me.

> Yes, distinguishing these things is very important. The general habit of using a single name in the most generic form is highly problematic. 

Let’s instead use URIs from now on ;-)
https://www.w3.org/TR/sparql11-query/
https://www.w3.org/TR/sparql11-protocol/
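
To make the distinction concrete, here is a minimal sketch of the "query via GET" operation from the SPARQL 1.1 Protocol. The query string belongs to the Query Language; wrapping it in an HTTP request with a `query` parameter is what the Protocol adds. The endpoint URL and the helper names are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# A query in the SPARQL Query Language; nothing here implies HTTP.
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 10"

# Hypothetical endpoint URL (an assumption, not a real service).
ENDPOINT = "https://example.org/sparql"


def protocol_get_url(endpoint: str, query: str) -> str:
    """Client side: build a SPARQL 1.1 Protocol 'query via GET' URL."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Server side: recover the query string from the request URL."""
    return parse_qs(urlsplit(url).query)["query"][0]


url = protocol_get_url(ENDPOINT, QUERY)
```

A server that evaluates `QUERY` through some other interface still supports the Query Language, just not this Protocol operation.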


> I believe in "horses for courses" i.e., use the best tool for the problem at hand. I am a strong believer in the notion of "Small Data" as THE vehicle for full appreciation of what Linked Data brings to RDF, despite the tendency to leverage Virtuoso as a vehicle for mass loading of datasets

+1 for small data here;
I’m interested in querying large numbers of small datasets
as opposed to querying low numbers of large datasets.

That said, SPARQL endpoints as interfaces to small datasets
would likely be feasible data-wise,
except perhaps when such small datasets live on constrained devices.

>> I do see a usage for the SPARQL Protocol in closed networks.
> 
> I don't see it as a "closed networks" thing.  We implemented "Anytime Query" functionality so that using it in public is feasible, as we've demonstrated for many years across several endpoints e.g., DBpedia, Uniprot, URIBurner, LOD Cloud Cache, and many others.

“Possible” is, I guess, a function of the server infrastructure
that you need for it.
Probably SAGE would claim to achieve more
with the same server hardware,
given that SAGE also leverages some client-side CPU.

> I cannot accept your position about "closed networks" confinement for the SPARQL Protocol when I know what we have and how we tackled the fundamental challenge [1][2][3]. 

Then I would—honestly—like to understand
what stops people from deploying SPARQL endpoints.
We have many more RDF datasets than SPARQL endpoints.

My personal guess would be a mixture of
fear of high server load (and of downtime when that load cannot be met),
as well as usability issues during setup.
But that is just speculation.
Would a survey be useful here?

> you are closing the door on the issue we actually solved via the implementation of our "Anytime Query" feature, which is proven by the live instances that we have in place.

Clearly, some open issues still remain
that stop people from installing SPARQL endpoints en masse.
We need to remove those obstacles,
but we need to know what they are first.

With TPF and SAGE, we seem to have assumed
that server (over)load is the main problem.

> We set it a 120 secs on DBpedia specifically in line with the "Fair Use" requirements of that particular instance. That threshold is configurable, which is the crux of the matter re. Virtuoso. 

This is where it gets really interesting,
because SAGE literally allows any query,
even an open ?s ?p ?o.
It does not appear necessary to put a fair use guard in place.
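
One way a server can accept an open ?s ?p ?o without a fair use guard is to bound the work done per request and let the client resume. The sketch below assumes that model. It uses a fixed step budget as a stand-in for a time quantum, and an in-memory list as a stand-in for a real index. All names are hypothetical; this is not SAGE's actual API.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical in-memory dataset; a real server would scan an index.
DATASET: List[Triple] = [
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
    ("ex:a", "ex:name", '"A"'),
    ("ex:b", "ex:name", '"B"'),
    ("ex:c", "ex:name", '"C"'),
]


def scan_page(offset: int, quantum: int) -> Tuple[List[Triple], Optional[int]]:
    """Server side: evaluate ?s ?p ?o for at most `quantum` steps.

    Returns the triples found, plus the offset to resume from
    (None once the scan is complete).
    """
    if quantum < 1:
        raise ValueError("quantum must be at least 1")
    page = DATASET[offset:offset + quantum]
    next_offset = offset + len(page)
    return page, (next_offset if next_offset < len(DATASET) else None)


def run_query(quantum: int) -> Tuple[List[Triple], int]:
    """Client side: keep resuming until the server reports completion."""
    results: List[Triple] = []
    requests = 0
    offset: Optional[int] = 0
    while offset is not None:
        page, offset = scan_page(offset, quantum)
        results.extend(page)
        requests += 1
    return results, requests
```

Under these assumptions no single request costs the server more than `quantum` steps, however large the full result is. The cost of a big query shows up as more round trips for the client that asked for it.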

Best,

Ruben

Received on Sunday, 9 September 2018 20:20:47 UTC