- From: Holger Knublauch <holger@topquadrant.com>
- Date: Fri, 05 Jun 2015 08:34:11 +1000
- To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
- CC: Tom Johnson <johnson.tom@gmail.com>, Corey A Harper <corey.harper@nyu.edu>
- Message-ID: <5570D263.80002@topquadrant.com>
On 6/5/2015 3:57, Tom Johnson wrote:
> Hi Holger (and all),
>
> I'm a librarian with the Digital Public Library of America, a
> sometime participant in Karen Coyle's Dublin Core Application
> Profiles group, and a likely SHACL implementer for `ruby-rdf` [1].
> I've been lurking on this list for a while, and now seems as good a
> time as any to jump in.
>
> This looks really good to me, and resolves a lot of my concerns about
> the approaches discussed previously. A couple of comments are inline.
>
> On Wed, Jun 3, 2015 at 4:16 PM, Holger Knublauch
> <holger@topquadrant.com> wrote:
>
> > I thought more about the issue of generic scopes and filters and
> > have come up with a variation of Peter's design. Assuming we define:
> >
> > - Scope: takes a graph as input and produces bindings for the
> >   focus node (?this):
> >
> >       Graph -> focus nodes
> >
> > - Constraint: takes a focus node as input and produces
> >   (violation) results:
> >
> >       focus nodes -> results
>
> I think "Constraint" has to be defined in terms of `Graph, FocusNodes
> -> Results`, yes? The scope selects the focus nodes from Graph, but
> the question answered by a constraint validation is whether the
> triples in Graph (as opposed to the triples in the universe) fulfill
> the constraint. Am I correct that this is the intent?

Yes, the validation in general receives a dataset with a default graph
as a parameter, and this is implicit in all these operations. There is
another parameter, the shapes graph, which may or may not be the same
as the default graph.

> I'm interested in the question of how Graph can be selected; sometimes
> when we have discussed "Scope" in the Dublin Core group, this is what
> we mean. The language is starting to clarify for me now, and I guess
> my questions here are:
>
> - In a SPARQL context, can we understand Graph to be the union graph
>   of all the graphs included in the dataset (i.e. G0 U G1 U ... GN)?

Not necessarily.
In some datasets, the default graph is in fact the union of all graphs,
but we cannot rely on this fact: the default graph may be disjoint from
all other graphs. If someone wants to execute SHACL over a union graph,
then this needs to be prepared outside of the language, e.g. with a
virtual union graph such as Jena's MultiUnion, which then becomes the
default graph.

> - In other contexts, does SHACL need mechanisms for selecting graphs
>   from web sources?
> - If I want to apply a shape to, e.g., an LDP-RS, can I select Graph
>   to be the one at its URI?
> - What if I want to select Graph as multiple sources, e.g. an LDP-RS
>   + some related data in a non-LDP triplestore, or an LDP-RS + a
>   specific Linked Data Fragment?
> - Assuming that we need such a Graph selection mechanism, can it be
>   squared with the SPARQL and RDF Dataset concepts, so there only
>   needs to be one set of formal concepts?

I think SPARQL Datasets answer all of this, by leaving it completely
open to the implementation. It is quite possible to have a dataset that
dynamically loads a graph when accessed. I believe this topic is
outside the scope of the SHACL spec, although we may have to cover it
in informative sections or primers.

> > I think we should make Scopes an explicit concept in SHACL's RDF
> > vocabulary, similar to how shapes are defined. There would be the
> > following class hierarchy:
> >
> >     sh:Scope
> >         sh:NativeScope
> >         sh:TemplateScope
> >
> > And native scopes can have sh:sparql (or a JS body etc). Example:
> >
> >     # Applies to all subjects that have a skos:prefLabel
> >     ex:MyShape
> >         sh:scope [
> >             a sh:NativeScope ;  # Optional rdf:type triple
> >             sh:sparql """
> >                 SELECT DISTINCT ?this
> >                 WHERE {
> >                     ?this skos:prefLabel ?any
> >                 }
> >                 """
> >         ] ;
> >         sh:constraint [
> >             a ex:UniqueLanguageConstraint ;
> >             ex:predicate skos:prefLabel ;
> >         ] .
> >
> > This (common) case above could be turned into a template
> > sh:PropertyScope:
> >
> >     ex:MyShape
> >         sh:scope [
> >             a sh:PropertyScope ;
> >             sh:predicate skos:prefLabel
> >         ] ;
> >         sh:constraint [
> >             a ex:UniqueLanguageConstraint ;
> >             ex:predicate skos:prefLabel ;
> >         ] .
> >
> > and we could provide a small collection of frequently needed
> > scopes, e.g.
> >
> > - all nodes in a graph
> > - all subjects
> > - all nodes with any rdf:type
> > - all IRI nodes from a given namespace
>
> This all looks good to me.
>
> You have, here, the ability to define scopes without reference to
> class. This is important for some simple use cases I have, like
> "check that all resources with edm:provider x have shape y", and I
> don't think it was possible under previous proposals, even with
> inference.
>
> These scopes are also much more self-contained than the examples in
> Peter's message, making it much easier to define portable constraints.
> We could even consider possibilities surrounding constraint/scope
> inheritance for Shape class hierarchies.
>
> > Systems that don't speak SPARQL would rely on the hard-coded IRIs
> > from the core vocabulary, such as sh:PropertyScope.
>
> I'm concerned about this last line. If systems that don't speak
> SPARQL need to rely on the core IRI scopes, those systems will have
> fairly limited functionality. I think it really behooves us to dig
> deeper on the idea of sh:TemplateScope; I can't see any reason that
> the vast bulk of Basic Graph Patterns + filters for scoping couldn't
> be defined directly in terms of the SHACL vocabulary.

I believe there will be a relatively small collection of scopes that
covers 95% of use cases; I enumerated examples above. As with the
choice of built-in constraint templates in the "core" language, we
need to make a choice about what to include and what to leave out. It
is the responsibility of implementers to cover as much as possible. If
some engines do not want to support any of the extension languages,
then they have to explain this choice to their users.
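To make the template idea concrete for implementers: for a SPARQL-capable engine, a template scope such as sh:PropertyScope is essentially parameter substitution into a stored query body, while a non-SPARQL engine can dispatch on the template IRI instead. A minimal illustrative sketch in Python (the registry and helper function are hypothetical, not part of any draft):

```python
from string import Template

# Hypothetical registry: template scope IRI -> SPARQL body with
# placeholders for the template's declared arguments.
SCOPE_TEMPLATES = {
    "sh:PropertyScope": (
        "SELECT DISTINCT ?this\n"
        "WHERE { ?this $predicate ?any }"
    ),
}

def instantiate_scope(template_iri, **args):
    """Expand a template scope into the SPARQL query a SPARQL engine
    would run; a non-SPARQL engine would instead map the same template
    IRI to an equivalent hard-coded implementation."""
    return Template(SCOPE_TEMPLATES[template_iri]).substitute(args)

query = instantiate_scope("sh:PropertyScope", predicate="skos:prefLabel")
```

Either way, both kinds of engine agree on what the scope means because the template IRI and its arguments are declared in the shapes graph.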
SPARQL engines exist for all major RDF platforms, so I personally
don't "get" the fascination with dropping such a useful language
feature. I would even argue that the implementation becomes much
simpler if you can let the declarations from the shacl.ttl file do all
the work instead of hard-coding them into another engine. But that is
a longer topic that is not worth reopening...

> I'd also suggest inverting the language, making scopes defined
> without recourse to SPARQL "native", and calling scopes defined in
> SPARQL `sh:SparqlScope` or similar.

In my design this would work exactly the same way that Shape templates
work. People can use either sh:sparql or any other extension language.
This way, it is quite possible that the "core" scopes have multiple
implementations for the various target languages, such as JavaScript.
All that matters is that the scopes produce nodes.

Thanks for your feedback; please follow up with clarification requests
if I was too brief.

Holger

> > We could now also formally define the scope behind sh:scopeClass
> > (and sh:nodeShape):
> >
> >     sh:ClassScope
> >         a sh:TemplateScope ;
> >         sh:argument [
> >             sh:predicate sh:class ;  # Becomes ?class
> >             sh:valueType rdfs:Class ;
> >         ] ;
> >         sh:sparql """
> >             SELECT ?this
> >             WHERE {
> >                 ?type rdfs:subClassOf* ?class .
> >                 ?this a ?type .
> >             }
> >             """ .
> >
> > In addition to these scopes, I suggest we turn sh:scopeShape into
> > sh:filterShape, and use these filters as pre-conditions that are
> > evaluated for a given set of focus nodes. The workflow then becomes:
> >
> > - sh:scope produces bindings for ?this
> > - sh:filterShape filters out the values of ?this that do not match
> >   the given shape
> > - the actual constraints are evaluated
> >
> > I believe this design provides the flexibility of a generic scoping
> > mechanism (as suggested in Peter's design) without getting into the
> > complexity of having to analyze SPARQL syntax or rely on hacks with
> > rdfs:Resource, while having a user-friendly syntax.
> > The fact that we separate sh:Scope from sh:Shape means that we can
> > enforce different, explicit semantics on scopes. For example, we
> > could allow a sh:Scope to encapsulate another SPARQL query that
> > tests whether a given ?this is in scope, i.e. the inverse direction
> > of the SELECT query, to optimize performance.
> >
> > Thanks,
> > Holger
>
> Best,
>
> --
> Tom Johnson
> Metadata & Platform Architect
> Digital Public Library of America
> tom@dp.la
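The scope -> filter shape -> constraints workflow quoted above can be read as a small pipeline: a scope is a function from a graph to focus nodes, filter shapes prune those nodes, and constraints run only on the survivors. A minimal illustrative sketch in Python (the toy triple encoding and every name here are my own, not from any draft):

```python
def validate(graph, scope, filter_shapes, constraints):
    # 1. sh:scope produces the bindings for ?this
    focus = scope(graph)
    # 2. each sh:filterShape drops focus nodes that do not match it
    for matches in filter_shapes:
        focus = {n for n in focus if matches(graph, n)}
    # 3. the actual constraints run only on the surviving focus nodes
    results = []
    for constraint in constraints:
        results.extend(constraint(graph, focus))
    return results

# Toy data graph: a set of (subject, predicate, object) triples.
data = {
    ("ex:A", "rdf:type", "ex:Concept"),
    ("ex:A", "skos:prefLabel", "a"),
    ("ex:B", "rdf:type", "ex:Concept"),   # no label -> violation
    ("ex:C", "skos:prefLabel", "c"),      # no type  -> filtered out
}

all_subjects = lambda g: {s for (s, p, o) in g}               # a scope
is_concept = lambda g, n: (n, "rdf:type", "ex:Concept") in g  # a filter shape

def has_pref_label(g, focus):                                 # a constraint
    """One result per focus node lacking a skos:prefLabel."""
    return [(n, "missing skos:prefLabel") for n in sorted(focus)
            if not any(s == n and p == "skos:prefLabel" for (s, p, o) in g)]

report = validate(data, all_subjects, [is_concept], [has_pref_label])
```

Note how ex:C never reaches the constraint: the filter shape acts as a pre-condition exactly as described, so constraints never see out-of-scope nodes.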
Received on Thursday, 4 June 2015 22:36:22 UTC