Re: [ISSUE-62] A clean proposal with sh:Scope

On Thu, Jun 4, 2015 at 3:34 PM, Holger Knublauch <holger@topquadrant.com>
wrote:

>  On 6/5/2015 3:57, Tom Johnson wrote:
>
> Hi Holger (and all),
>
>  I'm a librarian with the Digital Public Library of America, a
> sometimes-participant in Karen Coyle's Dublin Core Application Profiles
> group, and a likely SHACL implementer for `ruby-rdf` [1].  I've been
> lurking on this list for a while, and now seems as good a time as any to
> jump in.
>
>  This looks really good to me, and resolves a lot of my concerns about
> the approaches discussed previously.  A couple of comments are inline.
>
>  On Wed, Jun 3, 2015 at 4:16 PM, Holger Knublauch <holger@topquadrant.com>
> wrote:
>
>> I thought more about the issue of generic scopes and filters and have
>> come up with a variation of Peter's design. Assuming we define
>>
>> - Scope: takes a graph as input and produces bindings for the focus node
>> (?this)
>>
>>     Graph -> focus nodes
>>
>> - Constraint: that takes a focus node as input and produces (violation)
>> results:
>>
>>     focus nodes -> results
>>
>
>  I think "Constraint" has to be defined in terms of `Graph, FocusNodes ->
> Results`, yes?  The scope selects the focus nodes from Graph, but the
> question answered by a constraint validation is whether the triples in
> Graph (as opposed to the triples in the universe) fulfill the constraint.
> Am I correct that this is the intent?
>
>
> Yes, the validation in general receives a dataset with a default graph as
> a parameter, and this is implicit in all these operations. There is another
> parameter which is the shapes graph (which may or may not be the same as
> the default graph).
>
>
Perfect, that's clarifying.

>
>  I'm interested in the question of how Graph can be selected; sometimes
> when we have discussed "Scope" in the Dublin Core group, this is what we
> mean.  The language is starting to clarify for me now, and I guess my
> questions here are:
>
>     - In a SPARQL context, can we understand Graph to be the union graph
> of all the graphs included in the dataset (i.e. G0 U G1 U ... GN)?
>
>
> Not necessarily. In some datasets, the default graph is in fact the union
> of all graphs, but we cannot rely on this fact. The default graph may be
> disjoint from all other graphs. If someone wants to execute SHACL over a
> union graph, then this needs to be prepared outside of the language, e.g.
> with a virtual union graph such as Jena's MultiUnion which then becomes the
> default graph.
>
>
Sorry, I think I wasn't very clear.  A lot of the literature on Datasets
defines them as { G0, { (n1, G1), ..., (nn, Gn) } }, with G0 as the default
graph, nx as graph names, and G1 through Gn as the named graphs. So I was
indeed asking about the "MultiUnion"-style graph, which includes the
triples of both the default and the named graphs.  I think I'm starting to
get a clearer picture of what you have in mind, though.
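
For concreteness, here's a tiny TriG-style sketch (the graph name and data
are made up) of the disjoint case you describe, where the union would have
to be prepared outside SHACL:

    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix ex:   <http://example.org/> .

    # Default graph (G0): no skos:prefLabel triples here.
    ex:item1 ex:provider ex:dpla .

    # Named graph (n1 = ex:graph1): the labels live here, so a scope
    # query run over the default graph alone would never see them.
    ex:graph1 {
        ex:item1 skos:prefLabel "Example item"@en .
    }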

The algorithm is roughly (a SPARQL sketch of the two stages follows the
list):
  - a `validation` function that accepts an arbitrary graph (or a Dataset?
are filters and constraints that reference graph names in scope for SHACL?)
and a constraints graph (optional, since the constraints may be extracted
from the data graph) and, iterating over each shape, calls:
    - a `scoping` function that accepts the graph to validate and a shape
sub-graph, and outputs focus nodes;
  - then, iterating over each focus node and constraint, calls:
    - a `constraint` function that accepts a constraint sub-graph, a focus
node, and the original Graph, and outputs violations.
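
To make those two stages concrete in SPARQL terms: the scope query below is
the one from your example, while the constraint query is only my guess at
what ex:UniqueLanguageConstraint might expand to (not anything from the
draft):

    # Stage 1 (scope): produce bindings for ?this.
    SELECT DISTINCT ?this
    WHERE {
        ?this skos:prefLabel ?any
    }

    # Stage 2 (constraint): evaluated with ?this pre-bound to each
    # focus node in turn; any solution is a violation, here "more
    # than one skos:prefLabel sharing a language tag".
    SELECT ?this ?lang
    WHERE {
        ?this skos:prefLabel ?label .
        BIND (LANG(?label) AS ?lang)
    }
    GROUP BY ?this ?lang
    HAVING (COUNT(?label) > 1)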

I think I still have some questions about how this would be implemented
for SPARQL-based stores, and guidance seems necessary: people will want to
validate more than just the default graph, and divergent graph-selection
approaches will lead to situations where, given an identical Dataset and
constraints graph, different endpoints produce different violation sets.
For my purposes in tackling this for RDF.rb, the above is fine, and the
client can be responsible for curating the input graph.
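
To illustrate the kind of divergence I mean: the same scope intent can be
aimed at the default graph or at the named graphs, and on a dataset whose
default graph is disjoint from its named graphs the two queries select
different focus nodes:

    # Focus nodes drawn from the default graph only.
    SELECT DISTINCT ?this
    WHERE { ?this skos:prefLabel ?any }

    # Focus nodes drawn from the named graphs instead.
    SELECT DISTINCT ?this
    WHERE { GRAPH ?g { ?this skos:prefLabel ?any } }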

>    - In other contexts, does SHACL need mechanisms for selecting graphs
> from web sources?
>       - If I want to apply a shape to, e.g., an LDP-RS, can I select Graph
> to be the one at its URI?
>       - What if I want to select Graph as multiple sources; e.g. an LDP-RS
> + some related data in a non-LDP triplestore, or an LDP-RS + a specific
> Linked Data Fragment?
>    - Assuming that we need such a Graph selection mechanism, can it be
> squared with the SPARQL and RDF Dataset concepts, so there only needs to be
> one set of formal concepts?
>
>
> I think SPARQL Datasets answer all this, by leaving this completely open
> to the implementation. It is quite possible to have a dataset that
> dynamically loads a graph when accessed. I believe this topic is outside of
> the scope of the SHACL spec, although we may have to cover it in
> informative sections or primers.
>
>
>  I think we should make Scopes an explicit concept in SHACL's RDF
>> vocabulary, similar to how shapes are defined. There would be the following
>> class hierarchy:
>>
>> sh:Scope
>>     sh:NativeScope
>>     sh:TemplateScope
>>
>> And native scopes can have sh:sparql (or a JS body etc). Example
>>
>> # Applies to all subjects that have a skos:prefLabel
>> ex:MyShape
>>     sh:scope [
>>         a sh:NativeScope ; # Optional rdf:type triple
>>         sh:sparql """
>>                 SELECT DISTINCT ?this
>>                 WHERE {
>>                     ?this skos:prefLabel ?any
>>                 }
>>             """
>>     ] ;
>>     sh:constraint [
>>         a ex:UniqueLanguageConstraint ;
>>         ex:predicate skos:prefLabel ;
>>     ] .
>>
>> This (common) case above could be turned into a template sh:PropertyScope:
>>
>> ex:MyShape
>>     sh:scope [
>>         a sh:PropertyScope ;
>>         sh:predicate skos:prefLabel ;
>>     ] ;
>>     sh:constraint [
>>         a ex:UniqueLanguageConstraint ;
>>         ex:predicate skos:prefLabel ;
>>     ] .
>>
>> and we could provide a small collection of frequently needed scopes, e.g.
>>
>> - all nodes in a graph
>> - all subjects
>> - all nodes with any rdf:type
>> - all IRI nodes from a given namespace
>>
>
>  This all looks good to me.
>
>  You have, here, the ability to define scopes without reference to class.
> This is important for some simple use cases I have, like "check that all
> resources with edm:provider x have shape y", and I don't think it was
> possible under previous proposals, even with inference.
>
>  These scopes are also much more self-contained than the examples in
> Peter's message, making it much easier to define portable constraints. We
> could even consider possibilities surrounding constraint/scope inheritance
> for Shape class hierarchies.
>
>
>>
>> Systems that don't speak SPARQL would rely on the hard-coded IRIs from
>> the core vocabulary, such as sh:PropertyScope.
>
>
>  I'm concerned about this last line.  If systems that don't speak SPARQL
> need to rely on the core IRI scopes, those systems will have fairly limited
> functionality.  I think it really behooves us to dig deeper on the idea of
> sh:TemplateScope; I can't see any reason that the vast bulk of Basic Graph
> Patterns + filters for scoping couldn't be defined directly in terms of the
> SHACL vocabulary.
>
>
> I believe that there will be a relatively small collection of scopes that
> will cover 95% of use cases. I enumerated examples. Like with the choice of
> built-in constraint templates in the "core" language, we need to make a
> choice about what to include and what not. It is the responsibility of the
> implementers to cover as much as possible. If some engines do not want to
> support any of the extension languages then they have to explain this
> choice to their users. SPARQL engines exist for all major RDF platforms, so
> I personally don't "get" the fascination about dropping such a useful
> language feature. I would even argue that the implementation becomes much
> simpler if you can let the declarations from the shacl.ttl file do all the
> work instead of hard-coding them into another engine. But that is a longer
> topic that is not worth reopening...
>
>
I've clearly stepped in something here re: "fascination".  Sorry about
that. :)

Let me try and explain where I'm coming from.

First of all, I agree that defining constraints in terms of SPARQL is a
good way to go.  With SPARQL you have a well-tested existing grammar, test
suites, etc., and providing SPARQL translations of specific constraints
*does* mean that most implementations can cheaply implement constraints in
those terms using upstream translations.  That said, implementations that
have SPARQL engines generally have independent BGP engines, too, and I
think the latter are cheaper to run, so there may be trade-offs here.

Second, I have something of an aversion to the use of embedded
microsyntaxes in general.  In my view, the question is not so much about
dropping a useful language feature as about letting the language's
expressions live in their own syntax.  Recourse to external grammars in
definitions seems totally reasonable; extensions likewise.  But my feeling
is that if the language is meant to be expressed in RDF, it should be
possible to write custom constraints in RDF.  Extensions should be just
that: extensions.  This may come down to skepticism that 95% of
constraints can be pre-defined---people are creative. :)
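
To sketch what I mean (the sh:PatternScope vocabulary below is invented
purely for illustration, not a concrete proposal): a triple-pattern scope
could be stated directly in RDF, with no embedded query string:

    # Hypothetical vocabulary: sh:PatternScope, sh:pattern,
    # sh:subject/sh:predicate/sh:object, sh:this and sh:var are all
    # made up here, just to show the shape of the idea.
    ex:MyShape
        sh:scope [
            a sh:PatternScope ;
            sh:pattern [
                sh:subject   sh:this ;          # the focus node variable
                sh:predicate skos:prefLabel ;
                sh:object    [ sh:var "any" ]   # an unconstrained variable
            ]
        ] .

An engine with only a BGP matcher could evaluate that directly, and a
SPARQL-backed engine could compile it to the obvious SELECT.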

Lastly, if I understand correctly, the current draft makes embedded SPARQL
support optional (from recollection, I don't think it's even a SHOULD).
That seems to conflict very strongly with making embedded SPARQL the
primary way of defining custom constraints.  For what it's worth, I think
SPARQL support really ought to be a SHOULD or stronger.  Softer language
risks delaying (or worse) portability for what seems like a key feature,
one likely to be in widespread use.  "...they have to explain this choice
to their users" doesn't strike me as a solution to this problem.

>
>  I'd also suggest inverting the language, making scopes defined without
> recourse to SPARQL "native", and calling scopes defined in SPARQL
> `sh:SparqlScope` or similar.
>
>
> In my design this would work exactly in the same way that Shape templates
> work. People can either use sh:sparql or any other extension language. This
> way, it is quite possible that the "core" scopes have multiple
> implementations, for the various target languages such as JavaScript. All
> that matters is that the scopes produce nodes.
>
> Thanks for your feedback, please follow up with clarification requests if
> I was too brief.
> Holger
>
>

>
>> We could now also formally define the scope behind sh:scopeClass (and
>> sh:nodeShape):
>>
>> sh:ClassScope
>>     a sh:TemplateScope ;
>>     sh:argument [
>>         sh:predicate sh:class ;   # Becomes ?class
>>         sh:valueType rdfs:Class ;
>>     ] ;
>>     sh:sparql """
>>             SELECT ?this
>>             WHERE {
>>                 ?type rdfs:subClassOf* ?class .
>>                 ?this a ?type .
>>             }
>>         """ .
>
>
>> In addition to these scopes, I suggest we turn sh:scopeShape into
>> sh:filterShape, and use these filters as pre-conditions that are evaluated
>> for a given set of focus nodes. The workflow then becomes:
>>
>>     - sh:scope produces bindings for ?this
>>     - sh:filterShape filters out the values of ?this that do not match
>> the given shape
>>     - the actual constraints are evaluated
>>
>> I believe this design provides the flexibility of a generic scoping
>> mechanism (as suggested in Peter's design) without getting into the
>> complexity of having to analyze SPARQL syntax or rely on hacks with
>> rdfs:Resource, while having a user-friendly syntax. The fact that we
>> separate sh:Scope from sh:Shape means that we can enforce different,
>> explicit semantics on scopes. For example we could allow a sh:Scope to
>> encapsulate another SPARQL query that tests whether a given ?this is in
>> scope, i.e. the inverse direction of the SELECT query, to optimize
>> performance.
>
>
>
>> Thanks,
>> Holger
>>
>>
>  Best,
>
> --
> Tom Johnson
> Metadata & Platform Architect
> Digital Public Library of America
> tom@dp.la
>
>
>
Thanks, Holger, for taking the time to respond.  I'll try to stay engaged
when I can over the next few weeks.  I'm watching closely for the moment
when it's practical to take a hack at implementing, and would especially
appreciate a call for implementations when the group thinks we've reached
that point.

-- 
-Tom
