Re: SHACL: A language for multiple platforms

To make sure we are talking about the same things, I am attaching a 
small illustration of two SPARQL-based engine scenarios.

The design on the left-hand side is based on a Dataset that provides 
access to a collection of named graphs. These graphs may include 
in-memory graphs, virtual graphs, or database-backed graphs. The Dataset 
is accessed programmatically via a SPARQL engine that sits on the same 
machine as the SHACL engine. Popular choices here are the SPARQL APIs of 
Jena or Sesame. These SPARQL engines are quite easy to extend with new 
functions, provide maximum flexibility, operate on true node objects 
(including preserving blank node IDs), and support pre-binding variables 
in SPARQL queries. This is the design used in TopBraid server products, 
Fuseki and similar enterprise tools.

The design on the right-hand side is based on a SPARQL endpoint, which 
also contains the SPARQL engine. This places the SPARQL engine beyond 
the control of the SHACL engine: in general there is no way to add 
functions, no way to control which named graphs are accessible, and no 
way to control inferencing. Due to the serialization required by the 
SPARQL endpoint protocol, blank node IDs cannot be preserved. Variable 
pre-binding is not possible; everything needs to go through text 
serialization. This design is simpler but not as flexible as the other 
option, and it can only provide a subset of the features of the 
dataset-based design.
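
For comparison, the closest a client can get to "binding" a variable 
against a remote endpoint is textual substitution, e.g. with Jena's 
ParameterizedSparqlString (the endpoint URL and IRIs below are 
placeholders). A blank node could not be injected this way, since its 
label carries no meaning on the server:

    import org.apache.jena.query.*;

    public class EndpointAccess {
        public static void main(String[] args) {
            // The IRI is serialized into the query string before the
            // request is sent; no node objects cross the wire.
            ParameterizedSparqlString pss = new ParameterizedSparqlString(
                "SELECT ?value WHERE { ?this <http://example.org/p> ?value }");
            pss.setIri("this", "http://example.org/focusNode");

            try (QueryExecution qe = QueryExecutionFactory.sparqlService(
                    "http://example.org/sparql", pss.asQuery())) {
                ResultSet rs = qe.execSelect();
                while (rs.hasNext()) {
                    System.out.println(rs.next().get("value"));
                }
            }
        }
    }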

These are entirely different setups, for different user groups. This is 
not a one-size-fits-all discussion. If the SPARQL endpoint people don't 
need access to ?shapesGraph, SHACL functions, recursion and blank nodes, 
then that is fine with me, but they should not block the dataset people 
from covering their use cases.

Thanks,
Holger

On 6/19/2015 9:17, Holger Knublauch wrote:
> I believe before we can decide on things like ?shapesGraph access, we 
> need to clarify the spaces in which SHACL will be used. The platforms 
> and use cases will be very heterogeneous.
>
> a) Pure "core" engines that may not support any extension language
>    - some may include hard-coded support for certain templates (OSLC?)
>
> b) Engines that can operate against a SPARQL endpoint
>
> c) Engines that can operate against datasets, accessed via an API like 
> Jena
>    - with inferencing support
>    - without inferencing support
>
> d) Engines that operate on a client in JavaScript, without SPARQL support
>
> e) SPARQL-based engines that also support JavaScript, XML Schema etc.
>
> ...
>
> Not every one of them will be able to handle all SHACL files, e.g. b) 
> cannot handle recursion, blank nodes or ?shapesGraph. Some c) engines 
> will not handle queries requiring RDFS or OWL inferencing. However, a 
> c) engine can in principle handle SPARQL endpoints by treating the 
> endpoint as a graph, so there is nothing wrong with that design as 
> such.
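>
> As a rough sketch of what "treating the endpoint as a graph" could 
> look like with Jena (class name invented, error handling omitted), 
> each triple-pattern lookup is answered by a CONSTRUCT query against 
> the endpoint:
>
>     import org.apache.jena.graph.Node;
>     import org.apache.jena.graph.Triple;
>     import org.apache.jena.graph.impl.GraphBase;
>     import org.apache.jena.query.*;
>     import org.apache.jena.util.iterator.ExtendedIterator;
>
>     // Read-only wrapper that exposes a remote SPARQL endpoint as a
>     // Jena Graph.
>     public class EndpointGraph extends GraphBase {
>         private final String endpointUrl;
>
>         public EndpointGraph(String endpointUrl) {
>             this.endpointUrl = endpointUrl;
>         }
>
>         @Override
>         protected ExtendedIterator<Triple> graphBaseFind(Triple pattern) {
>             ParameterizedSparqlString pss = new ParameterizedSparqlString(
>                     "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }");
>             bind(pss, "s", pattern.getSubject());
>             bind(pss, "p", pattern.getPredicate());
>             bind(pss, "o", pattern.getObject());
>             try (QueryExecution qe = QueryExecutionFactory.sparqlService(
>                     endpointUrl, pss.asQuery())) {
>                 // Matches are materialized locally by execConstruct().
>                 return qe.execConstruct().getGraph().find(pattern);
>             }
>         }
>
>         // Substitute concrete nodes into the query text; blank nodes
>         // cannot be substituted meaningfully (the limitation noted
>         // above).
>         private static void bind(ParameterizedSparqlString pss,
>                 String var, Node node) {
>             if (node != null && node.isConcrete() && !node.isBlank()) {
>                 pss.setParam(var, node);
>             }
>         }
>     }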
>
> The situation is similar to OWL Lite/DL/Full. As a consequence of 
> design-by-committee, it is natural that not everyone will be perfectly 
> happy, and a certain level of mess is the default.
>
> A consequence of this situation is that my proposal uses the "Full" 
> language as the basis of the spec, but includes error handling for 
> cases that certain engines cannot process. Most features can also be 
> handled by engines that only cover subsets, e.g. sh:allowedValues can 
> be converted into a FILTER NOT IN ... for SPARQL endpoints, and to 
> whatever code the ShEx engine uses.
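>
> To illustrate the kind of conversion I mean, a hypothetical rewriter 
> (all names made up) could expand sh:allowedValues into an 
> endpoint-friendly query along these lines:
>
>     import java.util.List;
>     import java.util.stream.Collectors;
>
>     public class AllowedValuesRewriter {
>         // Builds a SELECT query that reports every value of the given
>         // property that is not in the allowed list, i.e. the
>         // violations. (IRI values only; literals would need proper
>         // serialization.)
>         static String allowedValuesQuery(String property,
>                 List<String> allowedIris) {
>             String list = allowedIris.stream()
>                     .map(iri -> "<" + iri + ">")
>                     .collect(Collectors.joining(", "));
>             return "SELECT ?this ?value WHERE {\n"
>                  + "    ?this <" + property + "> ?value .\n"
>                  + "    FILTER (?value NOT IN (" + list + "))\n"
>                  + "}";
>         }
>     }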
>
> My proposal remains to write down the SPARQL endpoint support as a 
> separate deliverable or chapter, similar to how ShEx people will write 
> down how their implementation works. This way, everyone gets the 
> features they need.
>
> Holger
>

Received on Friday, 19 June 2015 04:20:04 UTC