- From: RDF Data Shapes Working Group Issue Tracker <sysbot+tracker@w3.org>
- Date: Thu, 13 Aug 2015 02:29:14 +0000
- To: public-data-shapes-wg@w3.org
shapes-ISSUE-79 (Validation functions): Cleaner separation between value checking and property iteration [SHACL Spec]

http://www.w3.org/2014/data-shapes/track/issues/79

Raised by: Holger Knublauch
On product: SHACL Spec

I was never quite happy with one aspect of how SHACL templates (including the Core templates) were internally defined. To illustrate the problem, this is how sh:datatype is currently defined:

    sh:AbstractDatatypePropertyConstraint
        a sh:ConstraintTemplate ;
        rdfs:subClassOf sh:AbstractPropertyConstraint ;
        ...
        sh:message "Values must have datatype {?datatype}" ;
        sh:sparql """
            SELECT ?this (?this AS ?subject) ?predicate ?object ?datatype
            WHERE {
                ?this ?predicate ?object .
                FILTER (!sh:hasDatatype(?object, ?datatype)) .
            }
            """ ;
    .

The SPARQL query above does two things:
a) It iterates over all property values.
b) It FILTERs each of these values.

This is a recurring pattern for many property constraints. I would like us to make this design pattern more explicit, so that the definition becomes the following:

    sh:AbstractDatatypePropertyConstraint
        a sh:PropertyValueConstraintTemplate ;
        rdfs:subClassOf sh:AbstractPropertyConstraint ;
        ...
        sh:message "Values must have datatype {?datatype}" ;
        sh:validationFunction sh:hasDatatype ;
    .

Instead of pointing at a SPARQL query that does everything, such constraints only point at a Function, which must take a value as input and return a boolean. The engine can produce the surrounding SPARQL automatically, and can even inject the body of the sh:hasDatatype function directly.

This design has the following advantages:
- It makes the contract more explicit, more modular and arguably cleaner.
- It lets authors focus on what really matters, reducing clutter.
- It makes it easier to reuse the logic, especially for inverse property constraints (and sh:Arguments).
- It makes it easier to optimize execution: if only a boolean result is needed, a code generator can more easily combine several checks into a single FILTER such as sh:hasDatatype(...) && sh:hasNodeKind(...).
- It lowers the implementation costs for other languages such as JavaScript, because these can focus on implementing the functions.
- It raises the level of abstraction: in JavaScript, for example, these checks can be simple variable comparisons, regardless of how the surrounding iteration happened.
- The key snippets remain reusable inside SPARQL expressions, because they are SPARQL functions too.

The only disadvantage I can think of is that there is a little more work for engine implementers, because this requires a couple of new classes and properties to work correctly. However, I'd rather push the complexity to the engine developers and have a cleaner overall design for the end users. And nobody is forced to use this new pattern: anyone can still use the current mechanism via sh:sparql.

I have implemented this on a test branch, and the new Turtle file can be found here:
https://github.com/w3c/data-shapes/blob/ISSUE-79/shacl/shacl.shacl.ttl

I have not yet updated the textual documents; I'd love to hear from others on the general direction before I spend more time on this.
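To make the "engine produces the surrounding SPARQL" idea concrete, here is a minimal sketch of such a code generator. It is not the test-branch implementation: `build_validation_query`, `call_expr` and the `(function, arguments)` pair representation are hypothetical. It assumes each validation function takes the value as its first argument followed by the template arguments, as sh:hasDatatype(?object, ?datatype) does in the query above; the sh:hasNodeKind signature used in the usage example is likewise an assumption made by analogy.

```python
from typing import List, Tuple

# Hypothetical representation of one check: a SPARQL function name plus the
# template argument variables passed after the value being validated.
Check = Tuple[str, List[str]]


def call_expr(function: str, args: List[str], value_var: str = "?object") -> str:
    """Render one boolean function call, with the value as its first argument."""
    return f"{function}({', '.join([value_var, *args])})"


def build_validation_query(checks: List[Check]) -> str:
    """Wrap boolean validation functions in the generic property-iteration
    query, returning every value for which at least one check fails."""
    if not checks:
        raise ValueError("at least one validation function is required")
    # Project each template argument once, so it can fill {?arg} in sh:message.
    arg_vars: List[str] = []
    for _, args in checks:
        for arg in args:
            if arg not in arg_vars:
                arg_vars.append(arg)
    # All checks are merged with && inside a single negated FILTER.
    combined = " && ".join(call_expr(fn, args) for fn, args in checks)
    projection = " ".join(
        ["?this", "(?this AS ?subject)", "?predicate", "?object", *arg_vars]
    )
    return (
        f"SELECT {projection}\n"
        "WHERE {\n"
        "    ?this ?predicate ?object .\n"
        f"    FILTER (!({combined})) .\n"
        "}"
    )


if __name__ == "__main__":
    # Single check: reproduces the shape of the hand-written sh:datatype query.
    print(build_validation_query([("sh:hasDatatype", ["?datatype"])]))
    # Two checks merged into one FILTER (sh:hasNodeKind signature assumed).
    print(build_validation_query([
        ("sh:hasDatatype", ["?datatype"]),
        ("sh:hasNodeKind", ["?nodeKind"]),
    ]))
```

With the single sh:hasDatatype check, the generated query has the same iteration pattern and projection as the hand-written sh:sparql query, differing only in the extra parentheses around the negated expression. The iteration part (`?this ?predicate ?object`) is fixed in the generator, which is what lets template authors supply only the boolean function; an inverse property constraint could reuse the same checks by swapping that triple pattern.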
Received on Thursday, 13 August 2015 02:29:16 UTC