- From: Holger Knublauch <holger@topquadrant.com>
- Date: Sat, 4 Jun 2016 10:14:48 +1000
- To: public-data-shapes-wg@w3.org
- Message-ID: <d3e922d4-4c4d-db43-08da-950928b5ea3b@topquadrant.com>
Hi Peter,

thanks for the discussion - this is an important topic and worth
drilling into.

On 4/06/2016 0:12, Peter F. Patel-Schneider wrote:
> My original message in this thread is mostly concerned with how to
> implement constraint components universally. There is also a short
> preamble on how one can best describe constraint components. I'm
> going to defend both of these points separately, but I'm going to
> start with the implementation point as that is the bulk of my
> original message.
>
> Right now constraint components have up to three different
> implementations - one when they occur in a property constraint, one
> when they occur in an inverse property constraint, and one when they
> occur in a node constraint. This means that there are up to three
> different pieces of code for each constraint component, each
> (hopefully) implementing the same functionality. I view this as a
> poor setup - three different pieces of code that have to be written
> and thus three places where bugs can be introduced.

I fully agree.

> Having a single implementation of each constraint component would
> actually reduce development costs. Ideally, this single
> implementation would be as simple as the ASK validators that
> implement many constraint components. Consider, for example,
> sh:minCount whose implementation should be very little more than
> "HAVING ( COUNT (DISTINCT ?value) < ?minCount )".

Yes, if this were possible then this would be ideal.

> However, I can't figure out how to do this nicely because of
> limitations in SPARQL, hence the solution with boilerplate.

Exactly - that is the same conclusion I have reached. Furthermore, I
remember long discussions with Arthur on the phone in November. He
had also questioned why we cannot combine all these cases, but he did
not come up with a better solution either. If all three of us don't
come up with a solution then maybe there is none.

> However, even the boilerplate solution has only one implementation
> of each constraint component, and here one is definitely better than
> three and also better than two.

The boilerplate solution that you have described is already covered
in the spec. Look at 6.5.2 on ASK validators, which enumerates these
boilerplate snippets as "templates":

http://w3c.github.io/data-shapes/shacl/#SPARQLAskValidator

Users already have the choice to just specify a single ASK query to
cover all three cases.
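For illustration, such a shared ASK validator looks roughly like the
following - a sketch modelled on dash:hasMaxLength; see the attached
dash.ttl for the exact property names (e.g. sh:sparql vs. sh:ask),
parameter declarations and message text, which may differ from this
reconstruction:

    sh:MaxLengthConstraintComponent
        a sh:ConstraintComponent ;
        sh:nodeValidator dash:hasMaxLength ;
        sh:propertyValidator dash:hasMaxLength ;
        sh:inversePropertyValidator dash:hasMaxLength .

    dash:hasMaxLength
        a sh:SPARQLAskValidator ;
        sh:message "Value has more than {$maxLength} characters" ;
        sh:sparql """
            ASK {
                # ?value is pre-bound to each value node by the
                # boilerplate templates from section 6.5.2
                FILTER (STRLEN(str(?value)) <= $maxLength)
            }
            """ .

The same dash:hasMaxLength node is simply reused as the node,
property and inverse property validator, so only one query needs to
be written and maintained.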
In my current implementation this technique is used for a large
number of constraint components. To reproduce, open the attached copy
of dash.ttl and run this query:

    SELECT *
    WHERE {
        ?cc a sh:ConstraintComponent .
        OPTIONAL { ?cc sh:nodeValidator ?nodeValidator }
        OPTIONAL { ?cc sh:propertyValidator ?propValidator }
        OPTIONAL { ?cc sh:inversePropertyValidator ?invValidator }
    }

Results: 14 constraint components currently use ASK queries:

    sh:ClassConstraintComponent         dash:hasClass  dash:hasClass  dash:hasClass
    sh:ClassInConstraintComponent       dash:hasClassIn  dash:hasClassIn  dash:hasClassIn
    sh:DatatypeConstraintComponent      dash:hasDatatype  dash:hasDatatype
    sh:DatatypeInConstraintComponent    dash:hasDatatypeIn  dash:hasDatatypeIn
    sh:InConstraintComponent            dash:isIn  dash:isIn  dash:isIn
    sh:MaxExclusiveConstraintComponent  dash:hasMaxExclusive  dash:hasMaxExclusive
    sh:MaxInclusiveConstraintComponent  dash:hasMaxInclusive  dash:hasMaxInclusive
    sh:MaxLengthConstraintComponent     dash:hasMaxLength  dash:hasMaxLength  dash:hasMaxLength
    sh:MinExclusiveConstraintComponent  dash:hasMinExclusive  dash:hasMinExclusive
    sh:MinInclusiveConstraintComponent  dash:hasMinInclusive  dash:hasMinInclusive
    sh:MinLengthConstraintComponent     dash:hasMinLength  dash:hasMinLength  dash:hasMinLength
    sh:NodeKindConstraintComponent      dash:hasNodeKind  dash:hasNodeKind  dash:hasNodeKind
    sh:PatternConstraintComponent       dash:hasPattern  dash:hasPattern  dash:hasPattern
    sh:StemConstraintComponent          dash:hasStem  dash:hasStem  dash:hasStem

The remaining 15 are heterogeneous and do not easily fit into that
scheme:

- sh:DisjointConstraintComponent
- sh:LessThanConstraintComponent
- sh:LessThanOrEqualsConstraintComponent

These look like they could be turned into ASK queries, so please
consider them part of the category above.

- sh:AndConstraintComponent
- sh:NotConstraintComponent
- sh:OrConstraintComponent
- sh:ShapeConstraintComponent

These have SELECT queries because the ASK schema (currently) does not
support handling of the ?failure variable. I cannot tell yet how
common the ?failure handling will be and whether we need to come up
with a different syntax for them. The problem is that ASK can only
return true or false, not a third value, and there is no "exception"
reporting in SPARQL.

- sh:ClosedConstraintComponent

This uses a SELECT query because the result variable ?predicate is
different each time and needs to be computed as part of the WHERE
clause.

- sh:EqualsConstraintComponent

This is a SELECT query because it requires two branches in a UNION.
The boilerplate would not work IMHO.

- sh:HasValueConstraintComponent

This does not fit into the ASK schema. While theoretically it would
be possible to use ASK { FILTER sameTerm(?value, $hasValue) } for
node constraints, the performance of this would be prohibitively slow
for the predicate-based constraints. The query in those cases looks
very different:

    SELECT $this ($this AS ?object) $predicate
    WHERE {
        FILTER NOT EXISTS { $hasValue $predicate $this }
    }

Furthermore, this is an existential FILTER that does not follow the
"usual" pattern.

- sh:MaxCountConstraintComponent
- sh:MinCountConstraintComponent
- sh:QualifiedMaxCountConstraintComponent
- sh:QualifiedMinCountConstraintComponent

These use yet another pattern, where there is either a HAVING clause
with an aggregation, or a nested query (a sketch of this pattern
follows below).

- sh:UniqueLangConstraintComponent

This is a SELECT query because the ?lang is also being returned so
that it can be used in the sh:message. Also, there should only be one
validation result per ?lang, and thus it needs to be turned into a
SELECT DISTINCT.

*So among the 12 constraint components that are currently not covered
by ASKs, there are already 6 different design patterns.* And we have
not even started to look into extensions.
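For illustration only, a HAVING-based property validator for
sh:minCount could look roughly like this - a sketch, assuming $this,
$predicate and $minCount are pre-bound in the usual way; the queries
actually shipped in dash.ttl may be formulated differently:

    SELECT $this ($this AS ?subject) $predicate
    WHERE {
        # OPTIONAL keeps a row even when there are zero values
        OPTIONAL { $this $predicate ?value }
    }
    GROUP BY $this $predicate
    HAVING (COUNT(DISTINCT ?value) < $minCount)

It is exactly this aggregation (or an equivalent nested sub-select)
that prevents these components from fitting the simple ASK template.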
Whatever further generalization we would come up with will almost
certainly limit the expressivity of SHACL to only a subset of SPARQL,
and this would be a show stopper. And then we have not even started
to look into other extension languages like JavaScript... The current
infrastructure is set up so that each case can have multiple
validators, in multiple languages. A JavaScript-based implementation
will likely not use SPARQL but instead have completely different code
paths to walk the objects being validated.

Having thought about all these topics for many months now, my
conclusion is that we will continue to need the flexibility of
multiple validators for the different cases. In a large number of
cases a single ASK query will be sufficient for all three cases. And
then in a further large number of cases, people will only need to
develop one query for node constraints, and one for path-based
constraints. I am convinced that this will be acceptable (assuming
you agree we should support paths - your own proposal had them).
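To illustrate that node-versus-path case, a hypothetical extension
component could declare its two SELECT validators along these lines -
the ex: names and queries are invented purely for illustration, and
prefix and parameter declarations are omitted:

    ex:EnglishLabelConstraintComponent
        a sh:ConstraintComponent ;
        sh:nodeValidator [
            a sh:SPARQLSelectValidator ;
            sh:message "Node has no English rdfs:label" ;
            sh:sparql """
                SELECT $this
                WHERE {
                    FILTER NOT EXISTS {
                        $this rdfs:label ?label .
                        FILTER (lang(?label) = "en")
                    }
                }
                """ ;
        ] ;
        sh:propertyValidator [
            a sh:SPARQLSelectValidator ;
            sh:message "Value has no English rdfs:label" ;
            sh:sparql """
                SELECT $this ($this AS ?subject) $predicate ?object
                WHERE {
                    $this $predicate ?object .
                    FILTER NOT EXISTS {
                        ?object rdfs:label ?label .
                        FILTER (lang(?label) = "en")
                    }
                }
                """ ;
        ] .

The two queries differ only in how the value nodes are reached: the
node validator checks the focus node itself, while the property
validator walks $predicate to get to the value nodes.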
> Describing all constraint components in a similar fashion is also
> preferable to describing them differently. Right now some constraint
> components, e.g., sh:class, are described using the notion of value
> nodes but others, e.g., sh:minCount, are described using focus nodes
> and predicates even when the effect is the same as value nodes.

You keep bringing up the same first paragraphs in the spec :) These
are usually just editorial left-overs from the olden days when the
spec only went in one direction. These are easy to fix once spotted:

https://github.com/w3c/data-shapes/commit/66525d7f3f784822806f5d74e54818206d01d6ef

Can you find any more such examples? They are just editorial mistakes.

> Regularizing the way that constraint components are described would
> reduce the number of ways that errors can creep into the document and
> also reduce the cognitive load on readers of the document. Describing
> constraint components in terms of value nodes also better shows the
> commonalities amongst them.
>
> It is currently possible to have a constraint component that works
> completely differently when it is in a node constraint from when it
> is in a property constraint and from when it is in an inverse
> property constraint. Using the notion of value nodes produces a force
> against this divergence.

Agreed.

> These issues arise from having all constraint components sit inside
> the three different kinds of constraints and having each constraint
> component being responsible for its own determination of value nodes.
> There are different approaches to SHACL that would eliminate these
> issues. ShEx has a single property-crossing construct and all other
> constructs in triple expressions are not concerned with properties.
> OWL has several property-crossing constructs but most constructs in
> OWL work on individual value nodes. My refactored SHACL syntax has a
> single property-crossing construct and all constructs work on sets of
> value nodes.

I have explained above why there are differences in the queries, and
why these differences are important (e.g. in the case of sh:hasValue).
While I share your desire to further generalize and clean up the
language, there are limits where this becomes impractical or would
otherwise limit the expressive power of what customers will want to
do. And looking back at many years of working with customers and
SPIN, the only thing we can predict is that we cannot predict the
variety of use cases. We need to design the language to cater for
this flexibility, not make premature assumptions based on the limited
set of examples that happen to be in the Core Vocabulary.

Thanks,
Holger

> peter
>
> On 06/02/2016 10:13 PM, Holger Knublauch wrote:
>> Could you help me understand why we should do this? All I am seeing
>> is that this would add complexity to the language, add development
>> costs for these additional cases, increase our burden to specify and
>> write test cases for all these scenarios, for the "benefit" that
>> people can apply entirely useless constructs such as minCount with
>> node constraints or datatypes for subjects which can never be
>> literals.
>>
>> Furthermore, deleting the concept of sh:context makes it impossible
>> for tools to determine under which conditions a constraint component
>> should be offered. The forms that I have implemented would display
>> every constraint property on every case - node constraints, property
>> constraints, inverse property constraints. This is not user friendly!
>>
>> Finally, every extension developer is forced to specify SPARQL
>> queries for all cases, even if they make no sense (like most of the
>> cases below). Some of the queries that you have written up are
>> completely different from their other variations. How can you be
>> sure that the same generalization is sensible for every possible
>> future extension?
>>
>> As a random example consider one of the original use cases:
>> specifying a primary key. These are only ever meant to be used for
>> properties, neither inverses nor node constraints nor paths.
>>
>> https://www.w3.org/TR/shacl-ucr/#uc25-primary-keys-with-uri-patterns
>>
>> I must be missing something, but this is a massive step backwards
>> and a serious risk to the success of SHACL. There is nothing broken
>> right now with the context mechanism. Why change it?
>>
>> Thanks,
>> Holger
>>
>> On 3/06/2016 7:19, Peter F. Patel-Schneider wrote:
>>> To think about how a constraint component works universally, it is
>>> sufficient to think about value nodes, which are already defined at
>>> the beginning of Section 4.
>>>
>>> So, sh:hasValue is then just that a value node is the given node and
>>> sh:equals is just that the set of value nodes is the same as the set
>>> of values for the focus node for the other property and sh:closed is
>>> just that every value node has no values for disallowed properties
>>> and sh:minCount is just that there are at least n value nodes.
>>>
>>> Looking at https://github.com/TopQuadrant/shacl the changes to
>>> permit core constraint components to be used universally appear to
>>> be as follows:
>>>
>>> 1/ Ensure that sh:context has all three relevant values for each
>>> constraint component. (Of course then sh:context becomes irrelevant
>>> and can be removed.)
>>>
>>> 2/ For the constraint component for:
>>>
>>> sh:closed add
>>>   sh:propertyValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "Predicate {?unallowed} is not allowed on {?subject} (closed shape)" ;
>>>     sh:sparql """
>>>       SELECT ?this (?val AS ?subject) ?unallowed ?object
>>>       WHERE {
>>>         {
>>>           FILTER ($closed) .
>>>         }
>>>         $this $predicate ?val .
>>>         ?val ?unallowed ?object .
>>>         FILTER (NOT EXISTS {
>>>             GRAPH $shapesGraph {
>>>               $currentShape sh:property/sh:predicate ?unallowed .
>>>             }
>>>           } && (!bound($ignoredProperties) || NOT EXISTS {
>>>             GRAPH $shapesGraph {
>>>               $ignoredProperties rdf:rest*/rdf:first ?unallowed .
>>>             }
>>>           }))
>>>       }
>>>       """ ;
>>>
>>> Similar for inverse property constraint.
>>> sh:closed should also be implementable using the simple form (like
>>> sh:datatype and sh:minExclusive are).
>>>
>>> sh:datatype add dash:hasDatatype as a value for
>>> sh:inversePropertyValidator
>>> sh:datatypeIn add dash:hasDatatypeIn as a value for
>>> sh:inversePropertyValidator
>>>
>>> sh:hasValue add
>>>   sh:nodeValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "Node is not value {$hasValue}" ;
>>>     sh:sparql """
>>>       SELECT $this
>>>       WHERE {
>>>         FILTER { NOT sameTerm($this,$hasValue) }
>>>       }
>>>       """ ;
>>>   ] ;
>>>
>>> sh:disjoint add
>>>   sh:inversePropertyValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "Inverse of property must not share any values with {$disjoint}" ;
>>>     sh:sparql """
>>>       SELECT $this ($this AS ?object) $predicate ?subject
>>>       WHERE {
>>>         ?subject $predicate $this .
>>>         ?subject $disjoint $this .
>>>       }
>>>       """ ;
>>>   ] ;
>>>   sh:nodeValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "Node must not be a value of {$disjoint}" ;
>>>     sh:sparql """
>>>       SELECT $this
>>>       WHERE {
>>>         $this $disjoint ?this .
>>>       }
>>>       """ ;
>>>   ] ;
>>>
>>> sh:equals add
>>>   sh:inversePropertyValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "Inverse of property must have same values as {$equals}" ;
>>>     sh:sparql """
>>>       SELECT $this ($this AS ?object) $predicate ?subject
>>>       WHERE {
>>>         {
>>>           ?subject $predicate $this .
>>>           FILTER NOT EXISTS {
>>>             ?subject $equals $this .
>>>           }
>>>         }
>>>         UNION
>>>         {
>>>           ?subject $equals $this .
>>>           FILTER NOT EXISTS {
>>>             ?subject $predicate $this .
>>>           }
>>>         }
>>>       }
>>>       """ ;
>>>   ] ;
>>>   sh:nodeValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "Node must be a value of {$equals}" ;
>>>     sh:sparql """
>>>       SELECT $this
>>>       WHERE {
>>>         FILTER NOT EXISTS { $this $disjoint $this }
>>>       }
>>>       """ ;
>>>   ] ;
>>>
>>> sh:lessThan add
>>>   sh:inversePropertyValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "Inverse property value is not < value of {$lessThan}" ;
>>>     sh:sparql """
>>>       SELECT $this ($this AS ?object) $predicate ?subject
>>>       WHERE {
>>>         ?subject $predicate $this .
>>>         $this $lessThan ?object2 .
>>>         FILTER (!(?subject < ?object2)) .
>>>       }
>>>       """ ;
>>>   ] ;
>>>   sh:nodeValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "Node is not < value of {$lessThan}" ;
>>>     sh:sparql """
>>>       SELECT $this
>>>       WHERE {
>>>         $this $lessThan ?object2 .
>>>         FILTER (!(?this < ?object2)) .
>>>       }
>>>       """ ;
>>>   ] ;
>>>
>>> sh:lessThanOrEquals similar
>>>
>>> sh:minCount add
>>>   sh:nodeValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "Node is precisely one value, not {$minCount}" ;
>>>     sh:sparql """
>>>       SELECT $this
>>>       WHERE {
>>>         FILTER ( 1 >= $minCount) .
>>>       }
>>>       """ ;
>>>   ] ;
>>>
>>> sh:maxCount similar
>>>
>>> sh:maxExclusive add dash:hasMaxExclusive as a value for
>>> sh:inversePropertyValidator
>>>
>>> sh:maxInclusive add dash:hasMaxInclusive as a value for
>>> sh:inversePropertyValidator
>>>
>>> sh:minExclusive add dash:hasMinExclusive as a value for
>>> sh:inversePropertyValidator
>>>
>>> sh:minInclusive add dash:hasMinInclusive as a value for
>>> sh:inversePropertyValidator
>>>
>>> sh:uniqueLang add
>>>   sh:inversePropertyValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "Language {?lang} used more than once" ;
>>>     sh:sparql """
>>>       SELECT DISTINCT $this ($this AS ?object) $predicate ?lang
>>>       WHERE {
>>>         {
>>>           FILTER ($uniqueLang) .
>>>         }
>>>         ?value $predicate $this .
>>>         BIND (lang(?value) AS ?lang) .
>>>         FILTER (bound(?lang) && ?lang != \"\") .
>>>         FILTER EXISTS {
>>>           $this $predicate ?otherValue .
>>>           FILTER (?otherValue != ?value && ?lang = lang(?otherValue)) .
>>>         }
>>>       }
>>>       """ ;
>>>   ] ;
>>>   sh:nodeValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:message "A language used more than once on node" ;
>>>     sh:sparql """
>>>       SELECT $this
>>>       WHERE {
>>>         FILTER ( 1 = 0 )
>>>       }
>>>       """ ;
>>>   ] ;
>>>
>>> sh:qualifiedMinCount add
>>>   sh:nodeValidator [
>>>     rdf:type sh:SPARQLSelectValidator ;
>>>     sh:sparql """
>>>       SELECT $this ($this AS ?subject) $predicate ?count ?failure
>>>       WHERE {
>>>         BIND (sh:hasShape(?subject, $valueShape, $shapesGraph) AS ?hasShape) .
>>>         BIND (!bound(?hasShape) AS ?failure) .
>>>         FILTER IF(?failure, true, ?count > IF(?hasShape,1,0))
>>>       }
>>>       """ ;
>>>   ] ;
>>>
>>> sh:qualifiedMaxCount similar
>>>
>>> Note that none of these are difficult to do, particularly when
>>> looking at another validator for the same component. This should be
>>> true for any constraint component that can be described as working
>>> on the value nodes. I think that all constraint components should
>>> be describable this way.
>>>
>>> peter
Attachments
- text/plain attachment: dash.ttl
Received on Saturday, 4 June 2016 00:15:24 UTC