Selected problems with Proposal 4 from Holger Knublauch on 2016-03-10 (public-data-shapes-wg@w3.org from March 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Thu, 10 Mar 2016 21:10:48 +1000
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <56E15638.2020104@topquadrant.com>
I took a reasonably in-depth look at

https://www.w3.org/2014/data-shapes/wiki/ISSUE-95:_Metamodel_simplifications#Proposal_4

and below is my feedback.

Summary: I don't regard anything in this proposal as an improvement over
proposal 3. IMHO it represents a massive step backwards for both users of
the core language and users of the advanced features. If there are ideas
worth harvesting then these should be raised and examined individually. I
support re-opening ISSUE-41 as suggested by Simon for the paths topic,
and generalizing sh:and/or/not so that they can directly point at
sh:Constraints instead of just shapes.

HTH
Holger


General Problems

1) Proposal 4 is poorly motivated. As Peter stated himself, he started 
this effort to simplify the metamodel. He made changes to the end-user 
visible syntax in order to "simplify" the metamodel. However, there was 
no problem with the end-user visible syntax to begin with. There was no 
need to change it, and the new syntax is a step backwards. The metamodel 
is far less important than the user-facing syntax.

2) The syntax changes seem to reflect Peter's world view that SHACL 
should only be a constraint checking language, not used to describe data 
or even as "a modeling language". The syntax changes have made the model 
less predictable, and harder to use by algorithms such as form builders, 
without adding expressivity for constraint checking.

3) There is no experience with this syntax. We would need to redo all
evaluation, repeat experiments, and even revisit every already closed
ISSUE to check whether it is still valid under the new approach. External
observers of SHACL will be upset that we made such changes so late in
the process. Such a drastic change will set us back by months.
We'll likely need another face to face meeting. The arguments to justify 
all this are extremely weak. Meanwhile we will be losing a lot of time 
just debating something that I consider a non-starter. It would be much 
more productive to look at some key aspects of where Peter believes we 
could do better and work on incremental improvements, i.e. harvest some 
ideas that we agree on, instead of creating a completely new language.


On merging Shapes and Constraints

4) There is nothing conceptually difficult about the current metamodel, 
and there was no need to change it. Shapes are a collection of 
constraints and define a scope. Constraints restrict the focus node, 
possibly following properties. That's basically it. Shapes are similar 
to class definitions and intuitive to understand for most people. 
Merging these concepts blurs the lines, for no convincing reason. I 
expect that future use cases of Shapes will involve rules via a property 
such as shr:rule. Shapes serve as an entity to group focus nodes, and 
this role is independent of constraints.

5) If Shapes are constraints then we are just repeating the mistake we
made by making sh:closed an attribute of the shape: We lose the ability to
specify severity and other things. Basically, it has become impossible
(or arcane) to specify different (node) constraints with different 
severity. For this, constraints need to be objects attached to the 
shape. Alternatively you'd need shapes pointing at sub-shapes, but then 
you end up with different syntaxes for the same thing.

6) If the main motivation for linking shapes and constraints was
syntactic sugar, then we could make plenty of other incremental changes,
such as allowing the values of sh:and/sh:or to be sh:NodeConstraints,
not just Shapes, or generalizing sh:valueShape into sh:valueConstraint,
pointing at constraints directly.


On property/inverseProperty vs generalized paths

7) Paths can already be handled (in a very controlled form) using 
sh:valueShape and derived values.

8) The syntax for inverse properties becomes very ugly and inconsistent 
with how forward properties are represented:

ex:MyShape
     sh:fillers ( [ sh:inverse ex:parent ] [ sh:minCount 1 ] ) ;
     sh:fillers ( ex:parent [ sh:minCount 1 ] ) .

9) Path expressions cause a lot of new complexity, computationally, 
syntactically, for SPARQL generation etc.

10) Path expressions make static analysis (for things like form 
generation and structural checking of a shapes model) almost impossible. 
If an arbitrary path can show up where we previously only had simple 
predicates, then a lot of extra checking and branching needs to happen 
to make sense of the situation.

11) It is incorrect to claim that all constraint types can be used in 
combination with every path. For example, sh:minInclusive does not apply 
to inverse properties. The current metamodel and proposal 3 can express 
this using standard techniques (classes such as 
sh:InversePropertyConstraint), but Proposal 4 throws everything together 
and this ability is lost. As a result, tools cannot provide guidance 
about which values can actually be entered when.

12) Some constraint types require different SPARQL queries (or 
JavaScript or whatever) depending on the direction of a property (or 
even worse, for an arbitrary path). For example sh:minCount needs to 
count subjects versus objects. Proposal 4 does not even talk about this 
and no example of SPARQL generation is given. Not all constraint types 
are of the simple allValuesFrom pattern implemented by 
NodeValidationFunctions.
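
The direction dependence in point 12 can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: triples
are plain tuples, and the data and the names count_values and
satisfies_min_count are hypothetical, not taken from either proposal.

```python
# Triples are (subject, predicate, object) tuples; all data is made up.
TRIPLES = [
    ("ex:Alice", "ex:parent", "ex:Carol"),
    ("ex:Bob", "ex:parent", "ex:Carol"),
    ("ex:Carol", "ex:parent", "ex:Dave"),
]


def count_values(focus, predicate, inverse=False):
    """Count the distinct value nodes of `predicate` at `focus`.

    Forward: count objects of triples whose subject is the focus node.
    Inverse: count subjects of triples whose object is the focus node.
    """
    if inverse:
        values = {s for (s, p, o) in TRIPLES if p == predicate and o == focus}
    else:
        values = {o for (s, p, o) in TRIPLES if p == predicate and s == focus}
    return len(values)


def satisfies_min_count(focus, predicate, min_count, inverse=False):
    """A minCount-style check; note the two directions need different code."""
    return count_values(focus, predicate, inverse) >= min_count
```

The same minCount value gives different answers for the same focus node
depending on direction, so the check cannot be written once and reused
without knowing which direction (or path) it is attached to.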

13) In cases like sh:fillers ( ex:property [ sh:minCount 1 ] ) the 
"shape" with the minCount is no longer working stand-alone, but it 
requires knowledge about its context (e.g. the specific path that was 
used) to work correctly. This is unclear and adds unnecessary
complexity. There is no need for objects that change their meaning
depending on their parent resource.


On the constraint types limited to a single property only

14) This is a particularly poorly motivated change that goes backwards:
in order to accommodate a "simplification" of the metamodel, the syntax
was changed, justified by the unfounded claim that "multiple parameters
are a poor syntax". The example in ISSUE-133 is skewed to give the
impression that a real problem exists:

[ a sh:PropertyConstraint ;
     sh:pattern "http:*" ;
     sh:predicate ex:httpURL ;
     sh:datatype xs:string ;
     sh:minCount 1 ;
     sh:maxCount 1 ;
     sh:flags "i" ]

If your concern is readability of the source code, why would anybody put 
sh:pattern and sh:flags so far apart? This is ridiculous. Just write

[ a sh:PropertyConstraint ;
     sh:pattern "http:*" ;  sh:flags "i" ;
     sh:predicate ex:httpURL ;
     sh:datatype xs:string ;
     sh:minCount 1 ;
     sh:maxCount 1 ]

and problem solved. If you are not editing the Turtle, then of course it 
is a matter of tool support, and any reasonable tool will of course 
group those parameters visually together. We even have sh:group and 
sh:order attributes for those purposes, and the ConstraintTypes bundle 
together their parameters in Proposal 3. The same information can (and 
will) be used by editing tools that write Turtle files.

15) With single-parameter constraint types, and the need to use reified 
objects or list parameters whenever you need to pass in multiple values 
instead, the sh:labelTemplate and sh:message templates become useless as
there is no general mechanism to access the nested parameter values. 
They just become random objects and lists.

16) If multiple parameters are needed, the problem of defining and using 
them is just shifted by one level. For example, proposal 3 has a uniform 
and integrated syntax to define parameters. If you just point at an 
object then you need to talk (elsewhere) about the constraints on those 
objects. This is inconsistent, verbose, unmaintainable and not user 
friendly at all.

17) There is no uniform syntax for parameters anymore. Some are just 
plain values, others are lists, others are objects. Consider the case of 
sh:pattern. In Proposal 4, the values of sh:pattern are either a string 
or a list where the first value is a string and the second another 
string, with a different meaning. Imagine having to write code, editors
or even a SPARQL query for that. You'll end up with complicated UNIONs
and ORs everywhere just to handle the variations due to the metamodel
"simplifications".

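As a rough illustration of the branching described in point 17, here is
a minimal Python sketch. It assumes the two encodings named above (a
plain string, or a two-member list of pattern and flags); the function
names and the dictionary representation of a constraint are hypothetical.

```python
import re


def parse_pattern_parameter(value):
    """Return (pattern, flags) for a value that is a string or a 2-member list."""
    if isinstance(value, str):
        return value, ""
    if (
        isinstance(value, (list, tuple))
        and len(value) == 2
        and all(isinstance(member, str) for member in value)
    ):
        return value[0], value[1]
    raise ValueError(f"unsupported sh:pattern value: {value!r}")


def parse_named_parameters(constraint):
    """Named-parameter encoding: sh:pattern plus an optional sh:flags."""
    return constraint["sh:pattern"], constraint.get("sh:flags", "")


def matches(text, pattern, flags):
    """Apply the pattern; only the "i" flag is handled in this sketch."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, text, re_flags) is not None
```

With named parameters the reader is a plain lookup. With the mixed
encoding every consumer has to repeat the type checks first, and the
meaning of the second string is carried only by its list position.
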
18) If you need parameter objects to pass in multiple logical 
parameters, then you basically *always* need access to the $shapesGraph. 
Peter was strongly against this for ages, and made a lot of noise about
that. Now he has completely reversed his position, just to accommodate
his "simplification", and to even make it possible at all.

19) If you need parameter objects to pass in multiple values, every 
SPARQL implementation of such a constraint type will first need to start 
with a block to retrieve all the real parameters that are nested in the 
object or list. Compare:

WHERE {
     GRAPH $shapesGraph {
         $myParam ex:value1 ?value1 .
         OPTIONAL {
             $myParam ex:value2 ?value2 .
         }
     }
     $this $predicate ?object .
     FILTER (doSomething(?object, ?value1) ||
             (bound(?value2) && doSomethingElse(?object, ?value2)))
}

versus the current syntax:

WHERE {
     $this $predicate ?object .
     FILTER (doSomething(?object, $value1) ||
             (bound($value2) && doSomethingElse(?object, $value2)))
}

20) Related to point 19) above, you will have a combinatorial explosion 
of parameters if you have multiple OPTIONAL blocks. This will sometimes 
require nested SELECT DISTINCTs etc.

21) Proposal 4 separates the "shape" of a constraint type from its 
actual definition. This is verbose and harder to maintain. Proposal 3 
handles this much more elegantly, where the constraint type itself 
doubles as a shape, and sh:parameter is basically a property constraint 
(pending the choice of various options). No need for separate shapes.

22) sh:ComponentTemplate in Proposal 4 mixes rdf:Property and sh:Shape. 
One of the main points of criticism from Arthur (and others I believe) 
was that my proposal used metaclasses. Here something very similar 
happens again.

23) Show stopper: Proposal 4 also limits Functions to just a single 
parameter, and claims that parameter objects can be passed into the 
function instead. This is not working, because it is not practically 
possible to manipulate the shapes graph prior to every function 
invocation. For example ex:myFunction(2, 3) would become 
ex:myFunction(ex:args) where [ ex:args sh:arg1 2 ; sh:arg2 3 ]. This 
cannot work for cases such as ex:myFunction(2, ?value). Fixing this 
would cause an inconsistency in the way that functions vs other 
parameterizables are defined. Proposal 3 handles all these consistently.
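
The problem in point 23 can be mimicked in Python. This is a sketch
only: the dictionary stands in for a shapes graph, and my_function,
my_function_single and the minted node names are hypothetical.

```python
# Hypothetical stand-in for a shapes graph: parameter objects are static nodes.
SHAPES_GRAPH = {"ex:args": {"sh:arg1": 2, "sh:arg2": 3}}


def my_function(arg1, arg2):
    """Two-parameter form: arguments are supplied directly at the call site."""
    return arg1 * arg2


def my_function_single(args_node, shapes_graph):
    """Single-parameter form: the one argument names a parameter object."""
    args = shapes_graph[args_node]
    return my_function(args["sh:arg1"], args["sh:arg2"])


def apply_to_bindings(values):
    """Analogue of ex:myFunction(2, ?value) over a set of query bindings."""
    return [my_function(2, value) for value in values]


def apply_to_bindings_single(values):
    """Same computation via parameter objects: one new node per binding."""
    graph = dict(SHAPES_GRAPH)
    results = []
    for index, value in enumerate(values):
        node = f"ex:args{index}"  # has to be minted and inserted first
        graph[node] = {"sh:arg1": 2, "sh:arg2": value}
        results.append(my_function_single(node, graph))
    return results, len(graph) - len(SHAPES_GRAPH)
```

The single-parameter variant only produces the same results after
adding one parameter object per binding to the graph, which is exactly
the manipulation that is not practical before every function invocation.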


Miscellaneous

24) The new syntax is not more user friendly at all, e.g. sh:fillers is
confusingly close to sh:filter. What is a "filler" anyway? The existing
syntax from Proposal 3 is very similar to Resource Shapes and OWL
(restrictions), both of which users already have experience with, and
there was no need to switch to something like sh:fillers.

25) Show stopper: Using list positions to encode logic is a very bad 
anti-pattern. The syntax

     sh:fillers ( ex:myProperty [ sh:minCount 1 ] )

may superficially look more compact, but it violates every established
design pattern in either RDF or object-orientation. If something is a 
"path", then call it "path" in the data model. If something is a shape 
then call it such, even if the Turtle becomes a bit longer:

     sh:fillers [ sh:path ex:myProperty ; sh:shape [ sh:minCount 1 ] ] .

Just for the sake of it, following this "design pattern" someone could 
model a Person record as an rdf:List:

     (   "John"
         "Doe"
         "1971-07-07"^^xsd:date
         ex:USA )

Following your approach, if someone has multiple first names, make a 
nested list

     ( ("John" "Edward" )
         "Doe"
         "1971-07-07"^^xsd:date
         ex:USA )

The "beauty" of your syntax fades quickly if you ever use this in other 
formats such as JSON-LD:

     [ [ "John", "Edward" ],
         "Doe",
         { "@value" : "1971-07-07",
           "@type" : "http://www.w3.org/2001/XMLSchema#date" },
         { "@id" : "ex:USA" } ]

The problem here is that lists don't allow you to create @contexts. A 
better JSON-LD syntax, using normal named properties instead of lists 
would be:

     { "firstNames": [ "John", "Edward" ],
         "lastName" : "Doe",
         "dob" : "1971-07-07",
         "country": "ex:USA" }

So, creating an RDF vocabulary just so that it looks good in Turtle is a 
very bad idea. While the Person example above is for illustration 
purposes, the same issue happens for every sh:fillers scenario and will
happen with custom extensions too.

Needless to say, such rdf:Lists are almost impossible to use in SPARQL 
or any query-based approach.
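
To make the list problem concrete, the following Python sketch expands
a two-member collection such as ( ex:myProperty [ sh:minCount 1 ] ) into
the rdf:first/rdf:rest triples that a query actually has to traverse.
The blank node labels and function names are made up for illustration.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def list_to_triples(members, prefix="_:l"):
    """Expand an RDF collection into rdf:first/rdf:rest triples."""
    if not members:
        return RDF_NIL, []
    nodes = [f"{prefix}{i}" for i in range(len(members))]
    triples = []
    for i, member in enumerate(members):
        triples.append((nodes[i], RDF_FIRST, member))
        rest = nodes[i + 1] if i + 1 < len(members) else RDF_NIL
        triples.append((nodes[i], RDF_REST, rest))
    return nodes[0], triples


def nth_member(head, triples, index):
    """Walk rdf:rest `index` times, then read rdf:first."""
    lookup = {(s, p): o for (s, p, o) in triples}
    node = head
    for _ in range(index):
        node = lookup[(node, RDF_REST)]
    return lookup[(node, RDF_FIRST)]


# "_:shape" stands for the blank node [ sh:minCount 1 ].
head, triples = list_to_triples(["ex:myProperty", "_:shape"])
```

Reaching the "shape" at position 1 means following rdf:rest once and
then rdf:first, and the role of each member is implied only by its
position. With named properties such as sh:path and sh:shape, each role
is a single triple that can be matched directly.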

26) The claim that a simple sh:sparqlTemplate per componentTemplate is 
sufficient is incorrect, because some templates need to operate on the 
results of path expressions (e.g. sh:class) while others need to look at 
the full focus node + path combination. There is no vocabulary to encode 
these differences that could be used by an implementation. It would 
require a novel text-insertion mechanism for things like "insert path here".

27) The SPARQL behind these templates cannot be reused in other SPARQL 
queries, unlike sh:NodeValidationFunctions.
Received on Thursday, 10 March 2016 11:11:24 UTC