Categorization of constraint components from Holger Knublauch on 2016-07-07 (public-data-shapes-wg@w3.org from July 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Fri, 8 Jul 2016 08:32:40 +1000
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <8932f997-a2f5-65c0-32a2-940413ea5af4@topquadrant.com>

(I showed this in the telecon today)

Here is a possible categorization of our existing constraint components.

On Value Nodes "forEach":
- sh:and/sh:or/sh:not
- sh:class/sh:classIn
- sh:datatype/sh:datatypeIn
- sh:in
- sh:maxInclusive/sh:minExclusive etc
- sh:maxLength/sh:minLength
- sh:nodeKind
- sh:pattern
- sh:shape
- sh:stem
   -> Just a single (ASK) validator needed*

On Property Pairs:
- sh:disjoint/sh:equals
- sh:lessThan/sh:lessThanOrEquals
   -> Just a single SELECT validator needed (property constraints)

On Sets of Value Nodes:
- sh:maxCount/sh:minCount
- sh:qualifiedMaxCount/sh:qualifiedMinCount
- sh:uniqueLang
- sh:hasValue
   -> Just a single SELECT validator needed (property constraints)

Others:
- sh:closed
   -> Just a single SELECT validator needed (node constraints)

* sh:and, sh:or, sh:not and sh:shape currently require SELECT queries 
because their error handling cannot be expressed as an ASK. If that's a 
major issue, we could either turn all ASK queries into SELECT queries or 
add some other hack to express errors. I would prefer the current 
solution because the error handling cases are rare.
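
For illustration only, the categorization above could be written down as 
a small lookup table. This is a rough Python sketch, not anything from 
the spec: the names CATEGORIES, SELECT_EXCEPTIONS and validator_kind are 
made up for this example, and the "etc" entries from the list are left 
out rather than guessed.

```python
# Hypothetical lookup table mirroring the categorization in this message.
# The dictionary keys and helper names are illustrative, not spec terms.
CATEGORIES = {
    "value nodes (forEach)": {
        "validator": "ASK",
        "components": [
            "sh:and", "sh:or", "sh:not",
            "sh:class", "sh:classIn",
            "sh:datatype", "sh:datatypeIn",
            "sh:in",
            "sh:maxInclusive", "sh:minExclusive",  # "etc" not expanded here
            "sh:maxLength", "sh:minLength",
            "sh:nodeKind", "sh:pattern", "sh:shape", "sh:stem",
        ],
    },
    "property pairs": {
        "validator": "SELECT",
        "components": [
            "sh:disjoint", "sh:equals",
            "sh:lessThan", "sh:lessThanOrEquals",
        ],
    },
    "sets of value nodes": {
        "validator": "SELECT",
        "components": [
            "sh:maxCount", "sh:minCount",
            "sh:qualifiedMaxCount", "sh:qualifiedMinCount",
            "sh:uniqueLang", "sh:hasValue",
        ],
    },
    "others": {
        "validator": "SELECT",
        "components": ["sh:closed"],
    },
}

# Per the footnote: these sit in the forEach group but currently
# need SELECT queries because of their error handling.
SELECT_EXCEPTIONS = {"sh:and", "sh:or", "sh:not", "sh:shape"}


def validator_kind(component):
    """Return (category, query form) for a constraint component."""
    for category, info in CATEGORIES.items():
        if component in info["components"]:
            if component in SELECT_EXCEPTIONS:
                return category, "SELECT"
            return category, info["validator"]
    raise KeyError(component)
```

For example, validator_kind("sh:pattern") gives the forEach category 
with an ASK validator, while validator_kind("sh:not") gives the same 
category with SELECT.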

As part of the spec update I have introduced a new property sh:validator 
to link a constraint component to a single (ASK) query. The SPARQL 
queries in those cases are now using ASK syntax, which I believe is more 
readable and intuitive.
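
To make the "forEach" idea concrete, here is a rough simulation in 
plain Python rather than SPARQL. It assumes that an ASK-style validator 
answers true when a single value node conforms, which is my reading and 
not spelled out above; validate_for_each and min_length_ask are 
hypothetical helper names.

```python
def validate_for_each(value_nodes, ask):
    """Apply an ASK-style check to each value node independently.

    Returns the value nodes for which the check answered False.
    Assumption: True means "this value node conforms".
    """
    return [value for value in value_nodes if not ask(value)]


def min_length_ask(min_length):
    """Build a per-value check in the spirit of sh:minLength."""
    return lambda value: len(str(value)) >= min_length


# Each value is checked on its own; no knowledge of the other values
# is needed, which is what makes a yes/no question sufficient.
violations = validate_for_each(["ab", "abcd", "x"], min_length_ask(2))
```

The point of the sketch is that the check never looks at more than one 
value node at a time, so a yes/no answer per value is enough.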

Some observations:
- With the current design, there is only a single SPARQL query per 
constraint component, which is good.
- The introduction of paths has helped significantly, combining 
sh:property and sh:inverseProperty.
- The items under Property Pairs and Sets of Value Nodes do not make 
sense for node constraints, because the set there always has size one.
- For example, to express qualifiedMaxCount for node constraints, just 
use sh:shape or sh:not/sh:shape.
- Many cases (such as sh:closed in property constraints) make no sense 
at all, although they are mathematically possible.
- Having to support all components in both contexts would create more 
work for implementers, more work to test, more work to understand and 
explain.
- Having to provide just a single SPARQL query for all cases would cause 
significant syntax problems - we would need to abstract the $PATH syntax 
into something that also works for node constraints, and this would 
likely reduce expressivity because some operations could no longer be 
expressed.
- I believe the current spec combines the best flexibility with the best 
code reuse while avoiding the useless cases.
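
The "set always has size one" observation can be sketched in a few 
lines of Python. This is only an illustration under the assumption that 
a node constraint sees exactly one value node (the focus node itself); 
the helper names are hypothetical.

```python
def min_count_violated(value_nodes, min_count):
    """Set-level check in the spirit of sh:minCount."""
    return len(value_nodes) < min_count


def max_count_violated(value_nodes, max_count):
    """Set-level check in the spirit of sh:maxCount."""
    return len(value_nodes) > max_count


# Property constraint: the set of value nodes depends on the data.
property_values = {"Alice", "Bob", "Carol"}

# Node constraint (assumption): the only value node is the focus node.
node_values = {"focus-node"}

# With a set of size one the outcome depends only on the threshold,
# never on the data, so the check carries no information.
node_results = [
    min_count_violated(node_values, 1),  # never violated
    max_count_violated(node_values, 1),  # never violated
    min_count_violated(node_values, 2),  # always violated
]
property_result = max_count_violated(property_values, 2)
```

Here node_results is the same for any focus node, whereas 
property_result changes with the number of values in the data.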

Holger

Received on Thursday, 7 July 2016 22:33:11 UTC