- From: Holger Knublauch <holger@topquadrant.com>
- Date: Fri, 05 Jun 2015 08:34:11 +1000
- To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
- CC: Tom Johnson <johnson.tom@gmail.com>, Corey A Harper <corey.harper@nyu.edu>
- Message-ID: <5570D263.80002@topquadrant.com>
On 6/5/2015 3:57, Tom Johnson wrote:
> Hi Holger (and all),
>
> I'm a librarian with the Digital Public Library of America, a
> sometime participant in Karen Coyle's Dublin Core Application
> Profiles group, and a likely SHACL implementer for `ruby-rdf` [1].
> I've been lurking on this list for a while, and now seems as good a
> time as any to jump in.
>
> This looks really good to me, and resolves a lot of my concerns about
> the approaches discussed previously. A couple of comments are inline.
>
> On Wed, Jun 3, 2015 at 4:16 PM, Holger Knublauch
> <holger@topquadrant.com> wrote:
>
> > I thought more about the issue of generic scopes and filters and
> > have come up with a variation of Peter's design. Assuming we define:
> >
> > - Scope: takes a graph as input and produces bindings for the
> >   focus node (?this):
> >
> >       Graph -> focus nodes
> >
> > - Constraint: takes a focus node as input and produces
> >   (violation) results:
> >
> >       focus nodes -> results
>
> I think "Constraint" has to be defined in terms of `Graph, FocusNodes
> -> Results`, yes? The scope selects the focus nodes from Graph, but
> the question answered by a constraint validation is whether the
> triples in Graph (as opposed to the triples in the universe) fulfill
> the constraint. Am I correct that this is the intent?

Yes, the validation in general receives a dataset with a default graph
as a parameter, and this is implicit in all these operations. There is
another parameter, the shapes graph, which may or may not be the same
as the default graph.

> I'm interested in the question of how Graph can be selected; sometimes
> when we have discussed "Scope" in the Dublin Core group, this is what
> we mean. The language is starting to clarify for me now, and I guess
> my questions here are:
>
> - In a SPARQL context, can we understand Graph to be the union graph
>   of all the graphs included in the dataset (i.e. G0 U G1 U ... GN)?

Not necessarily.
In some datasets, the default graph is in fact the union of all graphs,
but we cannot rely on this fact: the default graph may be disjoint from
all other graphs. If someone wants to execute SHACL over a union graph,
then this needs to be prepared outside of the language, e.g. with a
virtual union graph such as Jena's MultiUnion, which then becomes the
default graph.

> - In other contexts, does SHACL need mechanisms for selecting graphs
>   from web sources?
> - If I want to apply a shape to, e.g., an LDP-RS, can I select Graph
>   to be the one at its URI?
> - What if I want to select Graph as multiple sources, e.g. an LDP-RS
>   + some related data in a non-LDP triplestore, or an LDP-RS + a
>   specific Linked Data Fragment?
> - Assuming that we need such a Graph selection mechanism, can it be
>   squared with the SPARQL and RDF Dataset concepts, so there only
>   needs to be one set of formal concepts?

I think SPARQL Datasets answer all of this, by leaving it completely
open to the implementation. It is quite possible to have a dataset that
dynamically loads a graph when accessed. I believe this topic is
outside the scope of the SHACL spec, although we may have to cover it
in informative sections or primers.

> > I think we should make Scopes an explicit concept in SHACL's RDF
> > vocabulary, similar to how shapes are defined. There would be the
> > following class hierarchy:
> >
> >     sh:Scope
> >         sh:NativeScope
> >         sh:TemplateScope
> >
> > And native scopes can have sh:sparql (or a JS body etc). Example:
> >
> >     # Applies to all subjects that have a skos:prefLabel
> >     ex:MyShape
> >         sh:scope [
> >             a sh:NativeScope ;  # Optional rdf:type triple
> >             sh:sparql """
> >                 SELECT DISTINCT ?this
> >                 WHERE {
> >                     ?this skos:prefLabel ?any
> >                 }
> >                 """
> >         ] ;
> >         sh:constraint [
> >             a ex:UniqueLanguageConstraint ;
> >             ex:predicate skos:prefLabel ;
> >         ] .
> >
> > This (common) case above could be turned into a template
> > sh:PropertyScope:
> >
> >     ex:MyShape
> >         sh:scope [
> >             a sh:PropertyScope ;
> >             sh:predicate skos:prefLabel
> >         ] ;
> >         sh:constraint [
> >             a ex:UniqueLanguageConstraint ;
> >             ex:predicate skos:prefLabel ;
> >         ] .
> >
> > and we could provide a small collection of frequently needed
> > scopes, e.g.
> >
> > - all nodes in a graph
> > - all subjects
> > - all nodes with any rdf:type
> > - all IRI nodes from a given namespace
>
> This all looks good to me.
>
> You have, here, the ability to define scopes without reference to
> class. This is important for some simple use cases I have, like
> "check that all resources with edm:provider x have shape y", and I
> don't think it was possible under previous proposals, even with
> inference.
>
> These scopes are also much more self-contained than the examples in
> Peter's message, making it much easier to define portable constraints.
> We could even consider possibilities surrounding constraint/scope
> inheritance for Shape class hierarchies.
>
> > Systems that don't speak SPARQL would rely on the hard-coded IRIs
> > from the core vocabulary, such as sh:PropertyScope.
>
> I'm concerned about this last line. If systems that don't speak
> SPARQL need to rely on the core IRI scopes, those systems will have
> fairly limited functionality. I think it really behooves us to dig
> deeper on the idea of sh:TemplateScope; I can't see any reason that
> the vast bulk of Basic Graph Patterns + filters for scoping couldn't
> be defined directly in terms of the SHACL vocabulary.

I believe there will be a relatively small collection of scopes that
covers 95% of use cases; I enumerated examples above. As with the
choice of built-in constraint templates in the "core" language, we
need to make a choice about what to include and what to leave out. It
is the responsibility of implementers to cover as much as possible. If
some engines do not want to support any of the extension languages,
then they have to explain this choice to their users.
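To make the template idea concrete for implementers: for a SPARQL-capable engine, a template scope such as sh:PropertyScope is essentially parameter substitution into a stored query body, while a non-SPARQL engine can dispatch on the template IRI instead. A minimal illustrative sketch in Python (the registry and helper function are hypothetical, not part of any draft):

```python
from string import Template

# Hypothetical registry: template scope IRI -> SPARQL body with
# placeholders for the template's declared arguments.
SCOPE_TEMPLATES = {
    "sh:PropertyScope": (
        "SELECT DISTINCT ?this\n"
        "WHERE { ?this $predicate ?any }"
    ),
}

def instantiate_scope(template_iri, **args):
    """Expand a template scope into the SPARQL query a SPARQL engine
    would run; a non-SPARQL engine would instead map the same template
    IRI to an equivalent hard-coded implementation."""
    return Template(SCOPE_TEMPLATES[template_iri]).substitute(args)

query = instantiate_scope("sh:PropertyScope", predicate="skos:prefLabel")
```

Either way, both kinds of engine agree on what the scope means because the template IRI and its arguments are declared in the shapes graph.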
SPARQL engines exist for all major RDF platforms, so I personally
don't "get" the fascination with dropping such a useful language
feature. I would even argue that the implementation becomes much
simpler if you can let the declarations from the shacl.ttl file do all
the work instead of hard-coding them into another engine. But that is
a longer topic that is not worth reopening...

> I'd also suggest inverting the language, making scopes defined
> without recourse to SPARQL "native", and calling scopes defined in
> SPARQL `sh:SparqlScope` or similar.

In my design this would work exactly the same way that Shape templates
work. People can use either sh:sparql or any other extension language.
This way, it is quite possible that the "core" scopes have multiple
implementations for the various target languages, such as JavaScript.
All that matters is that the scopes produce nodes.

Thanks for your feedback; please follow up with clarification requests
if I was too brief.

Holger

> > We could now also formally define the scope behind sh:scopeClass
> > (and sh:nodeShape):
> >
> >     sh:ClassScope
> >         a sh:TemplateScope ;
> >         sh:argument [
> >             sh:predicate sh:class ;  # Becomes ?class
> >             sh:valueType rdfs:Class ;
> >         ] ;
> >         sh:sparql """
> >             SELECT ?this
> >             WHERE {
> >                 ?type rdfs:subClassOf* ?class .
> >                 ?this a ?type .
> >             }
> >             """ .
> >
> > In addition to these scopes, I suggest we turn sh:scopeShape into
> > sh:filterShape, and use these filters as pre-conditions that are
> > evaluated for a given set of focus nodes. The workflow then becomes:
> >
> > - sh:scope produces bindings for ?this
> > - sh:filterShape filters out the values of ?this that do not match
> >   the given shape
> > - the actual constraints are evaluated
> >
> > I believe this design provides the flexibility of a generic scoping
> > mechanism (as suggested in Peter's design) without getting into the
> > complexity of having to analyze SPARQL syntax or rely on hacks with
> > rdfs:Resource, while having a user-friendly syntax.
> > The fact that we separate sh:Scope from sh:Shape means that we can
> > enforce different, explicit semantics on scopes. For example, we
> > could allow a sh:Scope to encapsulate another SPARQL query that
> > tests whether a given ?this is in scope, i.e. the inverse direction
> > of the SELECT query, to optimize performance.
> >
> > Thanks,
> > Holger
>
> Best,
>
> --
> Tom Johnson
> Metadata & Platform Architect
> Digital Public Library of America
> tom@dp.la
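The scope -> filter shape -> constraints workflow quoted above can be read as a small pipeline: a scope is a function from a graph to focus nodes, filter shapes prune those nodes, and constraints run only on the survivors. A minimal illustrative sketch in Python (the toy triple encoding and every name here are my own, not from any draft):

```python
def validate(graph, scope, filter_shapes, constraints):
    # 1. sh:scope produces the bindings for ?this
    focus = scope(graph)
    # 2. each sh:filterShape drops focus nodes that do not match it
    for matches in filter_shapes:
        focus = {n for n in focus if matches(graph, n)}
    # 3. the actual constraints run only on the surviving focus nodes
    results = []
    for constraint in constraints:
        results.extend(constraint(graph, focus))
    return results

# Toy data graph: a set of (subject, predicate, object) triples.
data = {
    ("ex:A", "rdf:type", "ex:Concept"),
    ("ex:A", "skos:prefLabel", "a"),
    ("ex:B", "rdf:type", "ex:Concept"),   # no label -> violation
    ("ex:C", "skos:prefLabel", "c"),      # no type  -> filtered out
}

all_subjects = lambda g: {s for (s, p, o) in g}               # a scope
is_concept = lambda g, n: (n, "rdf:type", "ex:Concept") in g  # a filter shape

def has_pref_label(g, focus):                                 # a constraint
    """One result per focus node lacking a skos:prefLabel."""
    return [(n, "missing skos:prefLabel") for n in sorted(focus)
            if not any(s == n and p == "skos:prefLabel" for (s, p, o) in g)]

report = validate(data, all_subjects, [is_concept], [has_pref_label])
```

Note how ex:C never reaches the constraint: the filter shape acts as a pre-condition exactly as described, so constraints never see out-of-scope nodes.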
Received on Thursday, 4 June 2015 22:36:22 UTC