
Re: Release of SAGE 1.0: a stable, responsive and unrestricted SPARQL query server

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sun, 9 Sep 2018 17:39:52 -0400
To: public-lod@w3.org
Message-ID: <366385ba-ed3e-75db-5dc6-03a189b6e9a4@openlinksw.com>
On 9/9/18 4:20 PM, Ruben Verborgh (UGent-imec) wrote:
> Hi Kingsley,
>
>> How about:
>>
>> Virtuoso, SAGE client+server solution, TPF etc.. are classes of solutions that support the SPARQL Query Language which doesn't imply the SPARQL Protocol (which is associated with "SPARQL Endpoints"). 
> Fine with me.

Ok.

>
>> Yes, distinguishing these things is very important. The general habit of using a single name in the most generic form is highly problematic. 
> Let’s instead use URIs from now on ;-)
> https://www.w3.org/TR/sparql11-query/
> https://www.w3.org/TR/sparql11-protocol/

Yep, but the concept URIs would be

https://www.w3.org/TR/sparql11-query#this
https://www.w3.org/TR/sparql11-protocol#this

:) 


We can even hyperlink the literals SPARQL Query Language
<https://www.w3.org/TR/sparql11-query#this> and SPARQL Protocol
<https://www.w3.org/TR/sparql11-protocol#this>, circa 2018.
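To illustrate the distinction: the fragment identifier is all that separates the concept URI from the spec document's own URI. A minimal sketch in Python:

```python
from urllib.parse import urldefrag

# The concept URI adds a #this fragment to the spec document's URI.
concept_uri = "https://www.w3.org/TR/sparql11-query#this"

# Splitting off the fragment recovers the document URI that a browser fetches;
# the fragment names the concept the document describes.
doc_uri, fragment = urldefrag(concept_uri)
```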

>
>> I believe in "horses for courses" i.e., use the best tool for the problem at hand. I am a strong believer in the notion of "Small Data" as THE vehicle for full appreciation of what Linked Data brings to RDF, despite the tendency to leverage Virtuoso as a vehicle for mass loading of datasets
> +1 for small data here;
> I’m interested in querying large numbers of small datasets
> as opposed to querying low numbers of large datasets.

Yep!

That's what Linked Data is supposed to be about, fundamentally.
Unfortunately, its core essence has been lost by the overemphasis on
massive datasets loaded into a DBMS or Store that offers SPARQL Query
Language <https://www.w3.org/TR/sparql11-query#this> access via the
SPARQL Protocol <https://www.w3.org/TR/sparql11-protocol#this>.
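To make the Query Language vs. Protocol distinction concrete: the SPARQL Protocol is just HTTP carrying a query string. A minimal sketch in Python (the endpoint URL is a placeholder, not a real service):

```python
from urllib.parse import urlencode

# Placeholder endpoint -- any SPARQL Protocol service works the same way.
endpoint = "https://example.org/sparql"
query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

# Per the SPARQL 1.1 Protocol, a query can be sent via HTTP GET
# with the query text in the 'query' parameter.
request_url = endpoint + "?" + urlencode({"query": query})
```

A client would then fetch `request_url` with an Accept header such as `application/sparql-results+json`; the Query Language itself says nothing about any of this.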

>
> That said, SPARQL endpoints as interfaces to small datasets
> might likely be feasible data-wise;
> unless such small datasets are on constrained devices perhaps.

Yes, they are. Virtuoso has been used in IoT efforts too; it is actually
very small in size, but that's often lost in the "Big Data" dialogue and
usage pattern.

Remember, a SPARQL Query Language implementation can include
de-referencing of variables and constants used in the body of a SPARQL
query. Virtuoso has always offered that [1].
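A hedged sketch of what such a query can look like, based on the Virtuoso docs linked as [1]; the exact pragma names and accepted values should be verified against that documentation:

```sparql
# Sketch only -- pragma names per the Virtuoso docs in [1]; verify there.
DEFINE input:grab-var "o"      # de-reference bindings of ?o at query time
DEFINE input:grab-depth 2      # follow links up to two hops
DEFINE input:grab-limit 100    # cap the number of documents fetched

SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
```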

>
>>> I do see a usage for the SPARQL Protocol in closed networks.
>> I don't see it as a "closed networks" thing.  We implemented "Anytime Query" functionality so that using it in public is feasible, as we've demonstrated for many years across several endpoints e.g., DBpedia, Uniprot, URIBurner, LOD Cloud Cache, and many others.
> “Possible” is, I guess, a function of the server infrastructure
> that you need for it.
> Probably SAGE would claim to achieve more
> with the same server hardware,
> given that SAGE also leverages some client-side CPU.

Maybe, but that isn't an "apples vs. apples" comparison, since the same
client could talk to any server (Virtuoso or others).
Reading their paper, they claim to offer functionality similar to what
we call "Anytime Query", but without appreciating the importance of a
live instance for comparative testing (since live access is utterly
unpredictable across many critical dimensions).

I also notice their paper doesn't publish the configuration file
settings used for Virtuoso, which sheds no light on the degree to which
vectorized execution of queries was in play, i.e., associating arrays
of query executions with a thread pool.
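For context, these are the kinds of virtuoso.ini [Parameters] settings in question; the names below are recalled from Virtuoso's configuration documentation and should be verified there, and the values are illustrative only:

```ini
[Parameters]
; Illustrative values only -- verify names and defaults in the Virtuoso docs.
ThreadsPerQuery      = 4        ; intra-query parallelism
AsyncQueueMaxThreads = 10       ; thread pool for parallel query work
VectorSize           = 1000     ; initial vector (batch) size per operator
MaxVectorSize        = 1000000  ; upper bound when vector size is adjusted
AdjustVectorSize     = 1        ; allow dynamic vector-size growth
```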

>
>> I cannot accept your position about "closed networks" confinement for the SPARQL Protocol when I know what we have and how we tackled the fundamental challenge [1][2][3]. 
> Then I would—honestly—like to understand
> what stops people from deploying SPARQL endpoints.

For starters, everyone isn't using Virtuoso.

> We have many more RDF datasets than SPARQL endpoints.

Yes, but that's due to the point above, combined with the "Big Data"
over "Small Data" focus in the exploitation of Linked Data principles.
All we can do to rectify this is get a "Small Data" meme going that
demonstrates why it is much closer to TimBL's Linked Data principles
than the "Big Data"-oriented implementations that have dominated the
general dialog to date, unfortunately.

>
> My personal guess would be a mixture of
> fear for high server load (and downtime when not met),
> as well as usability issues during setup.
> But that is just speculation.
> Would a survey be useful here?

No.

"Small Data" meme is more useful, IMHO [3][4][5][6].

>
>> you are closing the door on the issue we actually solved via the implementation of our "Anytime Query" feature, which is proven by the live instances that we have in place.
> Clearly, some open issues still remain
> that stop people from installing them en masse.
> We need to remove those obstacles,
> but we need to know what they are first.

A "Small Data" meme is what would lead us to a solution here.

>
> With TPF and SAGE, we seem to have assumed
> that server (over)load is the main problem.
>
>> We set it at 120 secs on DBpedia specifically, in line with the "Fair Use" requirements of that particular instance. That threshold is configurable, which is the crux of the matter re. Virtuoso. 
> This is where it gets really interesting,
> because SAGE literally allows any query,
> even an open ?s ?p ?o.
> It does not appear necessary to put a fair use guard in place.

A "Fair Use" guard is required if you want to offer ad-hoc access to any
combination of humans and machines, as a FREE service.

Which also brings me back to the live endpoint issue re. SPARQL Queries
and Linked Data Deployment.
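The "Anytime Query" idea discussed above can be sketched generically. This is not Virtuoso's implementation, just an illustration of a per-request time budget that returns partial results instead of failing outright:

```python
import time

def anytime_scan(rows, predicate, budget_secs=2.0):
    """Return matching rows found within the time budget, plus a flag
    saying whether the scan was cut short (illustrative sketch only)."""
    deadline = time.monotonic() + budget_secs
    results, truncated = [], False
    for row in rows:
        if time.monotonic() > deadline:
            truncated = True   # budget exhausted: hand back what we have
            break
        if predicate(row):
            results.append(row)
    return results, truncated

# Usage: a generous budget lets a small scan finish completely.
hits, cut_short = anytime_scan(range(1000), lambda n: n % 97 == 0)
```

The point of the guard is that a client always gets a well-formed (possibly partial) answer plus an indication that more work remained, rather than a hard timeout error.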


Links:

[1] http://docs.openlinksw.com/virtuoso/rdfinputgrab_01/ -- docs section
about pragmas for de-referencing constants and variables in a SPARQL
query body

[2] http://tinyurl.com/y8vj953w -- SPARQL Query scoped to data in local
cache (crawling pragmas commented)

[3] https://tinyurl.com/y92hcxkf -- ditto, but pragma for crawling and
overwriting cache enabled

[4] https://tinyurl.com/ybpk7kf9 -- ditto, crawling across a few
distinct Solid pods (classic "Small Data" exemplars)

[5] https://tinyurl.com/y88g5dqv -- Query Definition that lets you
easily add or remove RDF data sources crawled by the query

>
> Best,
>
> Ruben


-- 
Regards,

Kingsley Idehen	      
Founder & CEO 
OpenLink Software   (Home Page: http://www.openlinksw.com)

Weblogs (Blogs):
Legacy Blog: http://www.openlinksw.com/blog/~kidehen/
Blogspot Blog: http://kidehen.blogspot.com
Medium Blog: https://medium.com/@kidehen

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this


Received on Sunday, 9 September 2018 21:40:18 UTC
