Re: Release of SAGE 1.0: a stable, responsive and unrestricted SPARQL query server from Kingsley Idehen on 2018-09-09 (public-lod@w3.org from September 2018)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Sun, 9 Sep 2018 15:03:12 -0400
To: public-lod@w3.org
Message-ID: <722af6d2-4271-5344-1f61-f644c9b68656@openlinksw.com>

On 9/9/18 12:55 PM, Ruben Verborgh (UGent-imec) wrote:
> Hi Kingsley,
>
>> Yes, and that means it isn't an "apples to apples" comparison.
> Exactly. See it rather as a questioning of apples instead.

Possibly.

>
>> Thus, that important
>> context cannot be dislocated or cloaked in any comparison about SPARQL
>> endpoints.
>
> If we take it as a comparison of SPARQL endpoints,
> then it’s not a fair one indeed.

Exactly!

It came across to me as a comparison of SPARQL endpoints, compounded by
the Virtuoso reference.

>
> However, we should take this as a comparison of
> “complex ecosystem in which SPARQL queries
>  can be executed by multiple actors”.
>
> So see the SAGE client+server solution
> as a system that processes SPARQL queries,
> and see Virtuoso as one representative
> of a specific class of such systems that do so,
> namely SPARQL endpoints.

How about:

Virtuoso, the SAGE client+server solution, TPF, etc. are classes of
solutions that support the SPARQL Query Language, which doesn't imply the
SPARQL Protocol (which is associated with "SPARQL Endpoints").
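
A minimal sketch of that distinction, assuming a hypothetical endpoint at
https://example.org/sparql: the query text is the SPARQL Query Language
part, and wrapping it in an HTTP GET with a `query` parameter is the
SPARQL Protocol part. The function name is illustrative only, and no
request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# SPARQL Query Language: plain query text, independent of any transport.
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"


def protocol_get_request(endpoint: str, query: str) -> Request:
    """Wrap a query in a SPARQL Protocol "query via GET" request.

    `endpoint` is a hypothetical URL; the request is only built, not sent.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


request = protocol_get_request("https://example.org/sparql", QUERY)
print(request.full_url)
```

The same QUERY text could equally be handed to a client that evaluates it
over some other interface, which is the sense in which the language does
not imply the protocol.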

>
>> Okay, which makes this matter more confusing regarding its use of SPARQL
>> as its prime context, and additional reference to Virtuoso.
> Yes; I’ve made it a habit to always refer to
> “the SPARQL query language” and “the SPARQL protocol”.

Yes, distinguishing these things is very important. The general habit of
using a single name in the most generic form is highly problematic.

> I consider the former to be a great thing for the public Web,
> the latter not so much.

I believe in "horses for courses" i.e., use the best tool for the
problem at hand. I am a strong believer in the notion of "Small Data" as
THE vehicle for full appreciation of what Linked Data brings to RDF,
despite the tendency to leverage Virtuoso as a vehicle for mass loading
of datasets exposed to the Web via Linked Data documents and a SPARQL
Query Service Endpoint.

Using SPARQL as an intelligent crawling tool for a Semantic Web of
Linked Data is part of the ecosystem. In the case of Virtuoso, that's
been achievable from day one, albeit generally overlooked [1].


> It’s very confusing that both things are named “SPARQL”,
> and that this double definition was baked into the acronym.

Which is a defect rather than a feature. Communications clarity has
challenged RDF, Linked Data, SPARQL, and the notion of a Semantic Web
forever.

>
>>> It was my belief back then—and still is—that the SPARQL protocol
>>> is unfit as an API for the public Web.
>> I don't really know what an API for the public Web means.
> I mean that the SPARQL Protocol is, in my opinion,
> unfit as an API to provide reliable access to public data,
> because it is very expressive and hence expensive for one side.
> I knew few (if none) other APIs that require similar server efforts.
>
> I do see a usage for the SPARQL Protocol in closed networks.

I don't see it as a "closed networks" thing.  We implemented "Anytime
Query" functionality so that using it in public is feasible, as we've
demonstrated for many years across several endpoints e.g., DBpedia,
Uniprot, URIBurner, LOD Cloud Cache, and many others.  That's why I
always ask that solutions presented as alternatives to what we offer
include live endpoints, since the test case is a SPARQL Protocol
Endpoint that can handle the unpredictable query patterns and query
solution payloads that are an integral part of public access.

I cannot accept your position about "closed networks" confinement for
the SPARQL Protocol when I know what we have and how we tackled the
fundamental challenge [1][2][3].

>
>>> So rather than saying that the lack of SPARQL Protocol support
>>> is a weakness of SAGE, I would say it’s a strength.
>> I am not claiming that SPARQL Protocol support is a SAGE weakness.
> The moment they support the SPARQL Protocol
> is the moment they will have similar problems
> as other servers supporting the SPARQL protocol.

Modulo Virtuoso!

>
> What I have been arguing with the Linked Data Fragments research,
> is that the problems with public SPARQL endpoints
> are inherent due to their choice of interface, not their implementation.

I disagree.

Why? Because you are closing the door on the issue we actually solved
via the implementation of our "Anytime Query" feature, which is proven
by the live instances that we have in place.

>
> SAGE proposes a client/server ecosystem
> that is able to evaluate SPARQL queries,
> and because of their different choice of interface/protocol,
> they are able to do so with different performance characteristics.

And so can we, if we are doing a like-for-like comparison. I assume you
know that timeouts are configurable as part of our "Anytime Query"
feature. We set it at 120 secs on DBpedia specifically in line with the
"Fair Use" requirements of that particular instance. That threshold is
configurable, which is the crux of the matter re. Virtuoso.
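
As a rough sketch of what a per-request threshold could look like from the
client side, the snippet below builds a SPARQL Protocol URL that carries a
timeout. The `timeout` parameter name, its millisecond unit, and the
endpoint URL are assumptions made for illustration; the server-side limit
itself is part of each instance's own configuration and is not shown here.

```python
from urllib.parse import urlencode


def anytime_query_url(endpoint: str, query: str, timeout_ms: int) -> str:
    """Build a query URL that asks the server to stop after `timeout_ms`.

    The `timeout` parameter (in milliseconds) is an assumption about how
    the endpoint exposes its threshold; adjust it for the actual server.
    """
    if timeout_ms < 0:
        raise ValueError("timeout_ms must be non-negative")
    params = {"query": query, "timeout": str(timeout_ms)}
    return endpoint + "?" + urlencode(params)


# The 120 secs threshold mentioned above, expressed in milliseconds.
FAIR_USE_TIMEOUT_MS = 120 * 1000

url = anytime_query_url(
    "https://example.org/sparql",  # hypothetical endpoint
    "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5",
    FAIR_USE_TIMEOUT_MS,
)
print(url)
```

A different instance would simply be given a different value, which is the
point about the threshold not being fixed.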

>
>> I am
>> trying to understand the context of their audacious claims that
>> reference Virtuoso i.e., "apples vs apples" rather than "apples vs
>> oranges” .
> In order to follow the comparison,
> consider SPARQL endpoints and their clients
> as ecosystems evaluating queries.
> Then it’s apples versus apples.

Not if it is assumed that the query timeout that informs our "Anytime
Query" feature on a given instance is fixed. One live instance of
Virtuoso (e.g., the free DBpedia instance and its "Fair Use" config) is
not every instance of Virtuoso.

We thought long and hard about the ability to implement configurable
timeouts as an integral part of our query processor for both SQL and
SPARQL. It is a DBMS technology game-changer demonstrably exploitable
across "public networks" :)

Links:

[1] https://tinyurl.com/ybz8juto -- from a collection of 30 Billion+
triples, how companies break down by industry group

[2] http://tinyurl.com/y7ome6py -- ditto by Job Title

[3] https://tinyurl.com/y9ulxrzq -- ditto by Job Title and Qualifications
<https://t.co/b1bTb4X959>

>
> Best,
>
> Ruben


-- 
Regards,

Kingsley Idehen       
Founder & CEO 
OpenLink Software   (Home Page: http://www.openlinksw.com)

Weblogs (Blogs):
Legacy Blog: http://www.openlinksw.com/blog/~kidehen/
Blogspot Blog: http://kidehen.blogspot.com
Medium Blog: https://medium.com/@kidehen

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
        : http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this
Received on Sunday, 9 September 2018 19:03:38 UTC