- From: Ruben Verborgh (UGent-imec) <Ruben.Verborgh@UGent.be>
- Date: Sun, 9 Sep 2018 20:20:21 +0000
- To: Kingsley Idehen <kidehen@openlinksw.com>
- CC: "public-lod@w3.org" <public-lod@w3.org>
Hi Kingsley,

> How about:
>
> Virtuoso, SAGE client+server solution, TPF etc. are classes of solutions that support the SPARQL Query Language which doesn't imply the SPARQL Protocol (which is associated with "SPARQL Endpoints").

Fine with me.

> Yes, distinguishing these things is very important. The general habit of using a single name in the most generic form is highly problematic.

Let’s instead use URIs from now on ;-)
https://www.w3.org/TR/sparql11-query/
https://www.w3.org/TR/sparql11-protocol/

> I believe in "horses for courses" i.e., use the best tool for the problem at hand. I am a strong believer in the notion of "Small Data" as THE vehicle for full appreciation of what Linked Data brings to RDF, despite the tendency to leverage Virtuoso as a vehicle for mass loading of datasets

+1 for small data here; I’m interested in querying large numbers of small datasets as opposed to querying small numbers of large datasets.

That said, SPARQL endpoints as interfaces to small datasets are likely feasible data-wise, unless perhaps such small datasets live on constrained devices.

>> I do see a usage for the SPARQL Protocol in closed networks.
>
> I don't see it as a "closed networks" thing. We implemented "Anytime Query" functionality so that using it in public is feasible, as we've demonstrated for many years across several endpoints e.g., DBpedia, Uniprot, URIBurner, LOD Cloud Cache, and many others.

“Possible” is, I guess, a function of the server infrastructure that you need for it. Probably SAGE would claim to achieve more with the same server hardware, given that SAGE also leverages some client-side CPU.

> I cannot accept your position about "closed networks" confinement for the SPARQL Protocol when I know what we have and how we tackled the fundamental challenge [1][2][3].

Then I would, honestly, like to understand what stops people from deploying SPARQL endpoints. We have many more RDF datasets than SPARQL endpoints.

My personal guess would be a mixture of fear of high server load (and of downtime when that load cannot be met), as well as usability issues during setup. But that is just speculation. Would a survey be useful here?

> you are closing the door on the issue we actually solved via the implementation of our "Anytime Query" feature, which is proven by the live instances that we have in place.

Clearly, some open issues still remain that stop people from installing them en masse. We need to remove those obstacles, but we need to know what they are first. With TPF and SAGE, we seem to have assumed that server (over)load is the main problem.

> We set it at 120 secs on DBpedia specifically in line with the "Fair Use" requirements of that particular instance. That threshold is configurable, which is the crux of the matter re. Virtuoso.

This is where it gets really interesting, because SAGE literally allows any query, even an open ?s ?p ?o. It does not appear necessary to put a fair-use guard in place.

Best,

Ruben
Received on Sunday, 9 September 2018 20:20:47 UTC