- From: Kingsley Idehen <kidehen@openlinksw.com>
- Date: Sun, 9 Sep 2018 17:39:52 -0400
- To: public-lod@w3.org
- Message-ID: <366385ba-ed3e-75db-5dc6-03a189b6e9a4@openlinksw.com>
On 9/9/18 4:20 PM, Ruben Verborgh (UGent-imec) wrote:
> Hi Kingsley,
>
>> How about:
>>
>> Virtuoso, SAGE client+server solution, TPF etc. are classes of solutions that support the SPARQL Query Language, which doesn't imply the SPARQL Protocol (which is associated with "SPARQL Endpoints").
> Fine with me.

Ok.

>
>> Yes, distinguishing these things is very important. The general habit of using a single name in the most generic form is highly problematic.
> Let’s instead use URIs from now on ;-)
> https://www.w3.org/TR/sparql11-query/
> https://www.w3.org/TR/sparql11-protocol/

Yep, but the concept URIs would be https://www.w3.org/TR/sparql11-query#this and https://www.w3.org/TR/sparql11-protocol#this :)

Can even hyperlink the literals SPARQL Query Language <https://www.w3.org/TR/sparql11-query#this> and SPARQL Protocol <https://www.w3.org/TR/sparql11-protocol#this>, circa 2018.

>
>> I believe in "horses for courses" i.e., use the best tool for the problem at hand. I am a strong believer in the notion of "Small Data" as THE vehicle for full appreciation of what Linked Data brings to RDF, despite the tendency to leverage Virtuoso as a vehicle for mass loading of datasets
> +1 for small data here;
> I’m interested in querying large numbers of small datasets
> as opposed to querying low numbers of large datasets.

Yep! That's what Linked Data is supposed to be about, fundamentally. Unfortunately, its core essence has been lost by the overemphasis on massive datasets loaded into a DBMS or Store that offers SPARQL Query Language <https://www.w3.org/TR/sparql11-query#this> access via the SPARQL Protocol <https://www.w3.org/TR/sparql11-protocol#this>.

>
> That said, SPARQL endpoints as interfaces to small datasets
> might likely be feasible data-wise;
> unless such small datasets are on constrained devices perhaps.

Yes they are. Virtuoso has been used in IoT efforts too; it is actually very small in size, but that's often lost in the "Big Data" dialogue and usage pattern.
Remember, a SPARQL Query Language implementation can include de-referencing of the variables and constants used in the body of a SPARQL query. Virtuoso has always offered that [1].

>
>>> I do see a usage for the SPARQL Protocol in closed networks.
>> I don't see it as a "closed networks" thing. We implemented "Anytime Query" functionality so that using it in public is feasible, as we've demonstrated for many years across several endpoints e.g., DBpedia, Uniprot, URIBurner, LOD Cloud Cache, and many others.
> “Possible” is, I guess, a function of the server infrastructure
> that you need for it.
> Probably SAGE would claim to achieve more
> with the same server hardware,
> given that SAGE also leverages some client-side CPU.

Maybe, but that isn't "Apples vs Apples" since the same client could talk to any server (Virtuoso or others) etc. In reading their paper, they are claiming to offer functionality similar to what we call "Anytime Query" but not understanding the importance of a live instance for comparative testing (since live access is utterly unpredictable across many critical dimensions).

I also notice their paper doesn't publish the configuration file settings used for Virtuoso, so it sheds no light on the degree to which vectorized execution of queries is being used i.e., associating arrays of query executions with a thread pool.

>
>> I cannot accept your position about "closed networks" confinement for the SPARQL Protocol when I know what we have and how we tackled the fundamental challenge [1][2][3].
> Then I would, honestly, like to understand
> what stops people from deploying SPARQL endpoints.

For starters, not everyone is using Virtuoso.

> We have many more RDF datasets than SPARQL endpoints.

Yes, but that's due to the point above combined with this "Big Data" over "Small Data" focus re exploitation of Linked Data principles.
All we can do to rectify this problem is get a "Small Data" meme going that demonstrates why it is much closer to TimBL's Linked Data principles than the "Big Data" oriented implementations that have dominated the general dialog to date, unfortunately.

>
> My personal guess would be a mixture of
> fear for high server load (and downtime when not met),
> as well as usability issues during setup.
> But that is just speculation.
> Would a survey be useful here?

No. A "Small Data" meme is more useful, IMHO [3][4][5][6].

>
>> you are closing the door on the issue we actually solved via the implementation of our "Anytime Query" feature, which is proven by the live instances that we have in place.
> Clearly, some open issues still remain
> that stop people from installing them en masse.
> We need to remove those obstacles,
> but we need to know what they are first.

A "Small Data" meme is what would lead us to a solution here.

>
> With TPF and SAGE, we seem to have assumed
> that server (over)load is the main problem.
>
>> We set it at 120 secs on DBpedia specifically in line with the "Fair Use" requirements of that particular instance. That threshold is configurable, which is the crux of the matter re. Virtuoso.
> This is where it gets really interesting,
> because SAGE literally allows any query,
> even an open ?s ?p ?o.
> It does not appear necessary to put a fair use guard.

A "Fair Use" guard is required if you want to offer ad-hoc access to any combination of humans and machines, as a FREE Service. Which also brings me back to the live endpoint issue re. SPARQL Queries and Linked Data Deployment.
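
To make the Query Language vs. Protocol distinction concrete, here is a minimal sketch in Python (standard library only, no network access). The query string is plain SPARQL Query Language text; the function merely wraps it in a SPARQL Protocol GET request. The helper name `build_protocol_request` is hypothetical, and the `timeout` parameter (in milliseconds) is an assumption about Virtuoso-style endpoints; it is not part of the W3C SPARQL Protocol, and the 120-second figure is just the DBpedia threshold mentioned above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# The query itself: SPARQL Query Language text, independent of any transport.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_protocol_request(endpoint, query, timeout_ms=None):
    """Wrap a SPARQL query in a SPARQL Protocol GET request URL.

    Hypothetical helper for illustration only.
    """
    # 'query' is the parameter name defined by the SPARQL 1.1 Protocol.
    params = {"query": query}
    if timeout_ms is not None:
        # Assumption: a Virtuoso-style 'timeout' parameter in milliseconds,
        # used here as a client-side stand-in for a "Fair Use" threshold.
        params["timeout"] = str(timeout_ms)
    return endpoint + "?" + urlencode(params)


# Same query text, sent via the protocol to a public endpoint URL.
url = build_protocol_request("https://dbpedia.org/sparql", QUERY, timeout_ms=120_000)
print(url)
```

The same `QUERY` text could just as well be handed to a client-side engine (TPF, SAGE) or an embedded store; only the wrapping step is specific to the SPARQL Protocol.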
Links:

[1] http://docs.openlinksw.com/virtuoso/rdfinputgrab_01/ -- docs section about pragmas for de-referencing constants and variables in SPARQL query body
[2] http://tinyurl.com/y8vj953w -- SPARQL Query scoped to data in local cache (crawling pragmas commented)
[3] https://tinyurl.com/y92hcxkf -- ditto, but pragma for crawling and overwriting cache enabled
[4] https://tinyurl.com/ybpk7kf9 -- ditto crawling across a few distinct solid-pods (classic "Small Data" exemplars)
[5] https://tinyurl.com/y88g5dqv -- Query Definition that lets you easily add or remove RDF data sources crawled by the query

>
> Best,
>
> Ruben

--
Regards,

Kingsley Idehen
Founder & CEO
OpenLink Software (Home Page: http://www.openlinksw.com)

Weblogs (Blogs):
Legacy Blog: http://www.openlinksw.com/blog/~kidehen/
Blogspot Blog: http://kidehen.blogspot.com
Medium Blog: https://medium.com/@kidehen

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
Attachments
- application/pkcs7-signature attachment: S/MIME Cryptographic Signature
Received on Sunday, 9 September 2018 21:40:18 UTC