SPARQL Endpoint support from Holger Knublauch on 2015-07-03 (public-data-shapes-wg@w3.org from July 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Fri, 03 Jul 2015 10:05:54 +1000
To: public-data-shapes-wg <public-data-shapes-wg@w3.org>
Message-ID: <5595D1E2.7060503@topquadrant.com>
The message from today's call was that we need to try to make progress 
on the ?shapesGraph question via email. So here we go again.

I believe we need to rephrase the question that we are trying to 
resolve. Instead of being about ?shapesGraph access only, this is more 
about how SHACL can interoperate best with SPARQL endpoints for external 
databases that have no direct Graph interface. Since every SPARQL 
endpoint can be turned into a virtual graph with SPO queries, the 
question can be further narrowed down to how to *optimize* performance 
against SPARQL endpoints, i.e. get by with as few queries as 
possible. My proposal for this is ISSUE-71, i.e. a network protocol that 
just requires a single transaction. Assuming we agree this is useful, 
the remaining question becomes how to make sure that SPARQL endpoints 
that do not yet support the SHACL protocol still have decent performance. And 
of those use cases, we only talk about scenarios where (for some reason) 
the person issuing the constraints is not able to control which 
constraints get written - nobody forces you to use ?shapesGraph or 
user-defined functions in your queries for example.
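
To make the "virtual graph with SPO queries" point concrete, here is a 
minimal Python sketch. It is only an illustration under stated 
assumptions: run_query is a hypothetical callable that sends a query 
string to the endpoint and returns one dict of variable bindings per 
result row, and terms are assumed to be already written in SPARQL syntax. 
Every triple-pattern lookup costs one query, which is exactly the 
overhead that a single-transaction protocol would avoid.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]
# Hypothetical transport: query string in, one dict of bindings per row out.
RunQuery = Callable[[str], Iterable[Dict[str, str]]]


def spo_query(s: Optional[str] = None,
              p: Optional[str] = None,
              o: Optional[str] = None) -> str:
    """Build a SELECT query for one triple pattern; None means wildcard.

    Terms are assumed to be serialized SPARQL terms already,
    e.g. '<http://example.org/a>' or '"label"'.
    """
    parts = tuple(term if term is not None else var
                  for term, var in ((s, "?s"), (p, "?p"), (o, "?o")))
    return "SELECT * WHERE { %s %s %s }" % parts


class VirtualGraph:
    """Graph-like view over an endpoint; one query per pattern lookup."""

    def __init__(self, run_query: RunQuery) -> None:
        self.run_query = run_query
        self.query_count = 0  # number of queries sent so far

    def triples(self,
                s: Optional[str] = None,
                p: Optional[str] = None,
                o: Optional[str] = None) -> Iterator[Triple]:
        self.query_count += 1
        for row in self.run_query(spo_query(s, p, o)):
            # Bound positions are not returned as variables, so reuse them.
            yield (s if s is not None else row["s"],
                   p if p is not None else row["p"],
                   o if o is not None else row["o"])
```

Blank nodes are deliberately not handled here, since an endpoint offers 
no stable way to refer back to a blank node from an earlier result.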

All this is necessarily only for a subset of SHACL. For example, SPARQL 
endpoints have no notion of other execution languages like JavaScript. 
There is no mechanism to declare new functions on SPARQL endpoints. 
There is no way to ask a SPARQL endpoint to validate a given blank node 
that was previously returned (because the IDs are different each time). 
Recursion is highly questionable - I remember Arthur stating that 
recursion doesn't require shapes graph access, but then I assume you are 
back to making multiple calls to the SPARQL endpoint, not a single query 
(please send details). There is no way to control which named graphs are 
accessible from the endpoint, or even whether the endpoint supports all 
of the required SPARQL features and entailments. And we would need to 
either disallow ?shapesGraph access in general or reject constraints 
that use it.

I cannot accept that this constrained environment shall dictate how the 
whole spec is written. Are these particular scenarios involving SPARQL 
endpoints really *that* important, especially given that there are 
plenty of alternative ways of talking to databases? Also note that many 
people have moved away from SPARQL endpoints due to their infamous 
unreliability.

Anyway, since we are already talking about a small slice of possible 
architectures, I think the best way forward is to
- Make that subset explicit, e.g. call it SHACL-SE
- Put warning signs around features that are outside of SHACL-SE
- Thus minimize the risk that SPARQL endpoints have reduced performance 
because the SHACL engine may need to fall back to its own local SPARQL 
engine.

This is similar to Arnaud's option c), to make certain features 
optional, and we will be able to deliver a FPWD soon, without spending 
months on the drawing board with something like Peter's SPARQL-only 
proposal that has never been implemented or tested, has serious 
performance problems, covers only a subset of approved requirements and 
yet will lead to a vastly more complex spec because we need to look into 
all kinds of SPARQL string generation problems.

The situation has parallels with OWL DL versus Full. In OWL DL there was 
a group of vocal people arguing that we need to save the world from 
certain worst-case scenarios that break their algorithms. An OWL Full 
scare campaign followed. To this day, some tools refuse to process 
OWL Full. In practice however, many people use OWL Full features all the 
time without negative consequences. Thankfully, the OWL group at least 
permitted OWL Full to be explored on the market. We should allow the 
same evolution to happen here. If the SPARQL endpoint folks are serious 
about performance they should support the consistent architecture 
proposed in ISSUE-71 instead of taking away the flexibility for everyone 
else.

Could those who voted for option b) (no ?shapesGraph support) please 
explain why they cannot live with my proposal above? Nobody forces you 
to use ?shapesGraph in your SPARQL constraints (if you are using SPARQL 
at all), and it is easy to detect this variable in 3rd party queries, so 
what is the big deal here?
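
As a rough illustration of how such detection could look, here is a 
minimal Python sketch. The function name uses_shapes_graph is 
hypothetical, and the check is a plain text scan rather than a real 
SPARQL parse, so it can report a false positive when the variable name 
only appears inside a string literal or a comment. For a warning 
mechanism that conservative behaviour seems acceptable.

```python
import re

# A SPARQL variable may be written with either '?' or '$' as its prefix.
# The negative lookahead stops longer names such as ?shapesGraphLabel
# from matching.
_SHAPES_GRAPH = re.compile(r"[?$]shapesGraph(?!\w)")


def uses_shapes_graph(query: str) -> bool:
    """Return True if the query text mentions the shapesGraph variable."""
    return _SHAPES_GRAPH.search(query) is not None
```

An engine could run this check on each third-party constraint before 
deciding whether to send it to the endpoint as-is or to handle it 
locally.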

Thanks,
Holger

http://www.w3.org/2014/data-shapes/track/issues/71
Received on Friday, 3 July 2015 00:06:29 UTC