Re: Selected problems with Proposal 4 from Peter F. Patel-Schneider on 2016-03-11 (public-data-shapes-wg@w3.org from March 2016)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Fri, 11 Mar 2016 09:41:01 -0800
To: Irene Polikoff <irene@topquadrant.com>, Holger Knublauch <holger@topquadrant.com>, "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <56E3032D.5050502@gmail.com>

At some point there will have to be a reference implementation, I agree.

peter


On 03/10/2016 04:37 PM, Irene Polikoff wrote:
> I agree that many of these are implementation issues, but then having the
> implementation is very important - it shows that the proposal is indeed
> viable; otherwise, it is all a bit hypothetical and hearsay. Invariably,
> implementation work uncovers issues (some smaller, some larger) that often
> lead to the revisions of the proposal. Such incremental revisions tend to
> add complexity and what looked clean and streamlined in the beginning
> often starts to become considerably more convoluted.
> 
> Peter, are you planning to create a reference implementation for this to
> actually prove the viability of your proposal?
> 
> Irene 
> 
> 
> On 3/10/16, 6:54 PM, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
> wrote:
> 
>> Here are responses to some of the points that Holger makes.
>>
>> peter
>>
>>
>> 2/ The current SHACL syntax does not nicely handle some common examples.
>>
>> Consider a shape limiting a person's guru to be both a person and a
>> preacher.  The simplest current way of doing this is something like
>>  ex:foo a sh:Shape ;
>>   sh:property [ sh:predicate ex:guru ;
>>             sh:class ex:Person ] ;
>>   sh:property [ sh:predicate ex:guru ;
>>             sh:class ex:Preacher ] .
>> In my proposal this would be
>>  ex:foo a sh:Shape ;
>>   sh:property ( ex:guru [ sh:class ex:Person; sh:class ex:Preacher ] ) .
>> The current syntax results in shapes that are harder to analyze by tools.
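
One way to read the analysis point above, as a rough sketch only: with the current syntax a tool has to regroup the separate sh:property nodes by predicate before it can see that both class constraints apply to the same values. The record layout and the function name group_by_predicate are assumptions made for illustration; they are not part of SHACL or of any library.

```python
from collections import defaultdict

# Hypothetical in-memory form of the current syntax: one
# (predicate, constraint, value) record per sh:property node.
current_syntax = [
    ("ex:guru", "sh:class", "ex:Person"),
    ("ex:guru", "sh:class", "ex:Preacher"),
]


def group_by_predicate(records):
    """Collect every constraint that applies to the same predicate."""
    grouped = defaultdict(list)
    for predicate, constraint, value in records:
        grouped[predicate].append((constraint, value))
    return dict(grouped)


# After grouping, the tool sees one entry for ex:guru carrying both classes,
# which is the structure the list-based syntax states directly.
merged = group_by_predicate(current_syntax)
print(merged)
```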
>>
>> Consider a shape limiting the form of a SSN.  Right now this requires
>> something like
>>  ex:foo a sh:Shape ;
>>   sh:property [ sh:predicate ex:guru ;
>>             sh:pattern "[0-9]*" ] .
>> My proposal is very similar
>>  ex:foo a sh:Shape ;
>>   sh:property ( ex:guru [ sh:pattern "[0-9]*" ] ) .
>> However, figuring out what is going on in the current syntax also requires
>> looking for the flags property, which is not so simple for tools.
>>
>> 5/ Merging constraints and shapes does not limit the places where severity
>> and other information can be attached.
>>
>> 9,10/ I agree that paths add a lot of complication both for implementing
>> constraints and for other tools.  I added them to see how complex they
>> would be.  The proposal does not depend on paths.  I will indicate where
>> the changes would be.
>>
>> 11/ Even though RDF requires that subjects of triples are not literals,
>> there is no reason to forbid literal-only constructs in places where
>> literals cannot appear.  For conforming RDF graphs these will always be
>> false but for extended RDF graphs they will do the right thing.  SPARQL
>> itself permits literals as the subject of triple patterns so that it will
>> work well with extended RDF graphs.
>>
>> 12/ If a construct like sh:minCount needs to know whether it is in the
>> object of a sh:property or an sh:inverseProperty, then that is problematic
>> in the current situation.  How is it to know?
>>
>> 13/ In sh:fillers ( ex:property [ sh:minCount 1 ] ), the sh:minCount does
>> work alone, without knowledge about its context.  All that it is saying is
>> that there is at least one of whatever.  If an implementation needs to
>> know about the context then that is something to be fixed or worked around.
>>
>> 17/ Optional parameters complicate matters no matter how they are set up.
>> Putting the parameters together requires unpacking them.  Having two
>> properties requires finding them both.
>>
>> 19,20/ These appear to be implementation issues.
>>
>> 22/ I believe that Arthur was against using metaclasses in the metamodel,
>> i.e., that a template was a metaclass and a constraint was a class.  In
>> this proposal templates are classes, not metaclasses.  Templates are also
>> properties, so the IRI of the template can be directly used in shapes.
>> This has the added benefit of tying the template property to the template.
>> In the current setup, separate bits of the template are instead required to
>> state which properties carry its meaning.  This has a problem if two
>> templates use the same properties.  Which template is to be used then?
>>
>> 23/ I do not limit Functions to a single argument.  The information passed
>> in is the list of the arguments, which can be split up in the SPARQL code.
>>
>> 25/ I am not using list positions to "encode logic".  I am using list
>> positions for syntax, so as to make the syntax more compact.  Even most
>> logics use list positions in their syntax.  If compactness is not a
>> desirable feature then changing to an object-like syntax is simple.
>>
>> 26/ Conceptually an expression like sh:minCount does not need to work on a
>> focus node + path combination.  All that it really needs to know is the
>> fillers of the path so that it can count them.
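
The argument in 12/, 13/ and 26/ can be sketched roughly as follows, assuming the fillers are computed first and the count check only ever sees the resulting set. Plain tuples stand in for an RDF graph here, and the names fillers and min_count are hypothetical helpers, not SHACL or SPARQL functions.

```python
# Hypothetical triples; plain tuples stand in for an RDF graph.
triples = [
    ("ex:alice", "ex:parent", "ex:bob"),
    ("ex:carol", "ex:parent", "ex:alice"),
    ("ex:dave", "ex:parent", "ex:alice"),
]


def fillers(graph, focus, predicate, inverse=False):
    """Return the values reached from focus via predicate, forward or inverse."""
    if inverse:
        return {s for s, p, o in graph if p == predicate and o == focus}
    return {o for s, p, o in graph if p == predicate and s == focus}


def min_count(values, n):
    """Context-free check: only the number of fillers matters."""
    return len(values) >= n


# The direction is resolved when the fillers are collected, so min_count
# itself never needs to know which kind of path produced them.
forward = fillers(triples, "ex:alice", "ex:parent")
backward = fillers(triples, "ex:alice", "ex:parent", inverse=True)
print(min_count(forward, 1), min_count(backward, 3))
```

Whether a SPARQL-based implementation can be factored this cleanly is exactly what is in dispute in this thread; the sketch only shows the conceptual split.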
>>
>>
>> On 03/10/2016 03:10 AM, Holger Knublauch wrote:
>>> I took a reasonably in-depth look at
>>>
>>>
>>> https://www.w3.org/2014/data-shapes/wiki/ISSUE-95:_Metamodel_simplifications#Proposal_4
>>>
>>>
>>> and below is my feedback.
>>>
>>> Summary: I don't regard anything in this proposal as an improvement over
>>> proposal 3. IMHO it presents a massive step backwards for both users of
>>> the
>>> core language and the advanced features. If there are ideas worth
>>> harvesting
>>> then these should be raised and examined individually. I support
>>> re-opening
>>> ISSUE-41 as suggested by Simon for the paths topic, and to generalize
>>> sh:and/or/not so that they can directly point at sh:Constraints instead
>>> of
>>> just shapes.
>>>
>>> HTH
>>> Holger
>>>
>>>
>>> General Problems
>>>
>>> 1) Proposal 4 is poorly motivated. As Peter stated himself, he started
>>> this
>>> effort to simplify the metamodel. He made changes to the end-user
>>> visible
>>> syntax in order to "simplify" the metamodel. However, there was no
>>> problem
>>> with the end-user visible syntax to begin with. There was no need to
>>> change
>>> it, and the new syntax is a step backwards. The metamodel is far less
>>> important than the user-facing syntax.
>>>
>>> 2) The syntax changes seem to reflect Peter's world view that SHACL
>>> should
>>> only be a constraint checking language, not used to describe data or
>>> even as
>>> "a modeling language". The syntax changes have made the model less
>>> predictable, and harder to use by algorithms such as form builders,
>>> without
>>> adding expressivity for constraint checking.
>>>
>>> 3) There is no experience with this syntax. We need to redo all
>>> evaluation,
>>> repeat experiments, even revisit every single already closed ISSUE
>>> whether it
>>> is still valid under the new approach. External observers of SHACL will
>>> be
>>> upset that we made such changes so relatively late in the process. Such
>>> a
>>> drastic change will set us back by months. We'll likely need another
>>> face to
>>> face meeting. The arguments to justify all this are extremely weak.
>>> Meanwhile
>>> we will be losing a lot of time just debating something that I consider
>>> a
>>> non-starter. It would be much more productive to look at some key
>>> aspects of
>>> where Peter believes we could do better and work on incremental
>>> improvements,
>>> i.e. harvest some ideas that we agree on, instead of creating a
>>> completely new
>>> language.
>>>
>>>
>>> On merging Shapes and Constraints
>>>
>>> 4) There is nothing conceptually difficult about the current metamodel,
>>> and
>>> there was no need to change it. Shapes are a collection of constraints
>>> and
>>> define a scope. Constraints restrict the focus node, possibly following
>>> properties. That's basically it. Shapes are similar to class
>>> definitions and
>>> intuitive to understand for most people. Merging these concepts blurs
>>> the
>>> lines, for no convincing reason. I expect that future use cases of
>>> Shapes will
>>> involve rules via a property such as shr:rule. Shapes serve as an
>>> entity to
>>> group focus nodes, and this role is independent of constraints.
>>>
>>> 5) If Shapes are constraints then we are just repeating the same
>>> mistake with
>>> making sh:closed an attribute of the shape: We lose the ability to
>>> specify
>>> severity and other things. Basically, it has become impossible (or
>>> arcane) to
>>> specify different (node) constraints with different severity. For this,
>>> constraints need to be objects attached to the shape. Alternatively
>>> you'd need
>>> shapes pointing at sub-shapes, but then you end up with different
>>> syntaxes for
>>> the same thing.
>>>
>>> 6) If the main motivation for linking shapes and constraints was
>>> syntactic
>>> sugar, then we could make plenty of other incremental changes, such as
>>> allowing the values of sh:and/sh:or to be sh:NodeConstraints, not just
>>> Shapes,
>>> or generalize sh:valueShape into sh:valueConstraint, pointing at
>>> constraints
>>> directly.
>>>
>>>
>>> On property/inverseProperty vs generalized paths
>>>
>>> 7) Paths can already be handled (in a very controlled form) using
>>> sh:valueShape and derived values.
>>>
>>> 8) The syntax for inverse properties becomes very ugly and inconsistent
>>> with
>>> how forward properties are represented:
>>>
>>> ex:MyShape
>>>     sh:fillers ( [ sh:inverse ex:parent ] [ sh:minCount 1 ] ) ;
>>>     sh:fillers ( ex:parent [ sh:minCount 1 ] ) .
>>>
>>> 9) Path expressions cause a lot of new complexity, computationally,
>>> syntactically, for SPARQL generation etc.
>>>
>>> 10) Path expressions make static analysis (for things like form
>>> generation and
>>> structural checking of a shapes model) almost impossible. If an
>>> arbitrary path
>>> can show up where we previously only had simple predicates, then a lot
>>> of
>>> extra checking and branching needs to happen to make sense of the
>>> situation.
>>>
>>> 11) It is incorrect to claim that all constraint types can be used in
>>> combination with every path. For example, sh:minInclusive does not
>>> apply to
>>> inverse properties. The current metamodel and proposal 3 can express
>>> this
>>> using standard techniques (classes such as
>>> sh:InversePropertyConstraint), but
>>> Proposal 4 throws everything together and this ability is lost. As a
>>> result,
>>> tools cannot provide guidance about which values can actually be
>>> entered when.
>>>
>>> 12) Some constraint types require different SPARQL queries (or
>>> JavaScript or
>>> whatever) depending on the direction of a property (or even worse, for
>>> an
>>> arbitrary path). For example sh:minCount needs to count subjects versus
>>> objects. Proposal 4 does not even talk about this and no example of
>>> SPARQL
>>> generation is given. Not all constraint types are of the simple
>>> allValuesFrom
>>> pattern implemented by NodeValidationFunctions.
>>>
>>> 13) In cases like sh:fillers ( ex:property [ sh:minCount 1 ] ) the
>>> "shape"
>>> with the minCount is no longer working stand-alone, but it requires
>>> knowledge
>>> about its context (e.g. the specific path that was used) to work
>>> correctly.
>>> This is unclear and adds unnecessary complexity. It is an unnecessary
>>> construct to have objects that change their meaning depending on their
>>> parent resource.
>>>
>>>
>>> On the constraint types limited to a single property only
>>>
>>> 14) This is a particularly poorly motivated change that goes backwards:
>>> in
>>> order to accommodate a "simplification" of the metamodel, the syntax was
>>> changed and an unfounded claim is used that "multiple parameters are a
>>> poor
>>> syntax". The example in ISSUE-133 is skewed to give the impression that
>>> a real
>>> problem exists:
>>>
>>> [ a sh:PropertyConstraint ;
>>>     sh:pattern "http:*" ;
>>>     sh:predicate ex:httpURL ;
>>>     sh:datatype xs:string ;
>>>     sh:minCount 1 ;
>>>     sh:maxCount 1 ;
>>>     sh:flags "i" ]
>>>
>>> If your concern is readability of the source code, why would anybody put
>>> sh:pattern and sh:flags so far apart? This is ridiculous. Just write
>>>
>>> [ a sh:PropertyConstraint ;
>>>     sh:pattern "http:*" ;  sh:flags "i" ;
>>>     sh:predicate ex:httpURL ;
>>>     sh:datatype xs:string ;
>>>     sh:minCount 1 ;
>>>     sh:maxCount 1 ]
>>>
>>> and problem solved. If you are not editing the Turtle, then of course
>>> it is a
>>> matter of tool support, and any reasonable tool will of course group
>>> those
>>> parameters visually together. We even have sh:group and sh:order
>>> attributes
>>> for those purposes, and the ConstraintTypes bundle together their
>>> parameters
>>> in Proposal 3. The same information can (and will) be used by editing
>>> tools
>>> that write Turtle files.
>>>
>>> 15) With single-parameter constraint types, and the need to use reified
>>> objects or list parameters whenever you need to pass in multiple values
>>> instead, the sh:labelTemplate and sh:message templates become useless as
>>> there is
>>> no general mechanism to access the nested parameter values. They just
>>> become
>>> random objects and lists.
>>>
>>> 16) If multiple parameters are needed, the problem of defining and
>>> using them
>>> is just shifted by one level. For example, proposal 3 has a uniform and
>>> integrated syntax to define parameters. If you just point at an object
>>> then
>>> you need to talk (elsewhere) about the constraints on those objects.
>>> This is
>>> inconsistent, verbose, unmaintainable and not user friendly at all.
>>>
>>> 17) There is no uniform syntax for parameters anymore. Some are just
>>> plain
>>> values, others are lists, others are objects. Consider the case of
>>> sh:pattern.
>>> In Proposal 4, the values of sh:pattern are either a string or a list
>>> where
>>> the first value is a string and the second another string, with a
>>> different
>>> meaning. Imagine having to write code, editors or even a SPARQL query
>>> for
>>> that. You'll end up with complicated UNIONs and ORs everywhere just to
>>> handle
>>> the variations due to the metamodel "simplifications".
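
The branching described in 17) might look something like this in code that consumes the parameter, assuming the value arrives either as a plain string or as a two-element [pattern, flags] list. The helper names are hypothetical, and Python's re module is used only as a stand-in: it does not implement the XPath regular expression dialect that SPARQL's REGEX uses.

```python
import re


def normalize_pattern(value):
    """Return (pattern, flags) for a plain string or a [pattern, flags] list."""
    if isinstance(value, str):
        # Single-value form: a bare pattern with no flags.
        return value, ""
    if isinstance(value, (list, tuple)) and len(value) == 2:
        # List form: position 0 is the pattern, position 1 the flags.
        pattern, flags = value
        return pattern, flags
    raise ValueError(f"unsupported sh:pattern value: {value!r}")


def matches(value, text):
    """Apply the pattern to text, honouring only the 'i' flag in this sketch."""
    pattern, flags = normalize_pattern(value)
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, text, re_flags) is not None


print(matches(["^http:", "i"], "HTTP://example.org"))
print(matches("^http:", "HTTP://example.org"))
```

Every consumer of the parameter needs an equivalent of normalize_pattern, which is the cost this point is about.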
>>>
>>> 18) If you need parameter objects to pass in multiple logical
>>> parameters, then
>>> you basically *always* need access to the $shapesGraph. Peter was
>>> strongly
>>> against this for ages, and made a lot of noise about that. Now he has
>>> completely reverted his position, just to accommodate his
>>> "simplification",
>>> and to even make it possible at all.
>>>
>>> 19) If you need parameter objects to pass in multiple values, every
>>> SPARQL
>>> implementation of such a constraint type will first need to start with
>>> a block
>>> to retrieve all the real parameters that are nested in the object or
>>> list.
>>> Compare:
>>>
>>> WHERE {
>>>     GRAPH $shapesGraph {
>>>         $myParam ex:value1 ?value1 .
>>>         OPTIONAL {
>>>             $myParam ex:value2 ?value2 .
>>>         }
>>>     }
>>>     $this $predicate ?object .
>>>     FILTER (doSomething(?object, ?value1) || (bound(?value2) && doSomethingElse(?object, ?value2)))
>>> }
>>>
>>> versus the current syntax:
>>>
>>> WHERE {
>>>     $this $predicate ?object .
>>>     FILTER (doSomething(?object, $value1) || (bound(?value2) && doSomethingElse(?object, $value2)))
>>> }
>>>
>>> 20) Related to point 19) above, you will have a combinatorial explosion
>>> of
>>> parameters if you have multiple OPTIONAL blocks. This will sometimes
>>> require
>>> nested SELECT DISTINCTs etc.
>>>
>>> 21) Proposal 4 separates the "shape" of a constraint type from its
>>> actual
>>> definition. This is verbose and harder to maintain. Proposal 3 handles
>>> this
>>> much more elegantly, where the constraint type itself doubles as a
>>> shape, and
>>> sh:parameter is basically a property constraint (pending the choice of
>>> various
>>> options). No need for separate shapes.
>>>
>>> 22) sh:ComponentTemplate in Proposal 4 mixes rdf:Property and sh:Shape.
>>> One of
>>> the main points of criticism from Arthur (and others I believe) was
>>> that my
>>> proposal used metaclasses. Here something very similar happens again.
>>>
>>> 23) Show stopper: Proposal 4 also limits Functions to just a single
>>> parameter,
>>> and claims that parameter objects can be passed into the function
>>> instead.
>>> This is not working, because it is not practically possible to
>>> manipulate the shapes graph prior to every function invocation. For
>>> example ex:myFunction(2, 3) would become ex:myFunction(ex:args) where
>>> ex:args sh:arg1 2 ; sh:arg2 3 . This cannot work for cases such as
>>> ex:myFunction(2, ?value). Fixing this would cause an inconsistency in the
>>> way that functions vs other parameterizables are defined. Proposal 3
>>> handles all these consistently.
>>>
>>>
>>> Miscellaneous
>>>
>>> 24) The new syntax is not more user friendly at all, e.g. the proximity
>>> of
>>> sh:fillers vs sh:filter. What is a "filler" anyway? The existing syntax
>>> from
>>> Proposal 3 is very similar to Resource Shapes and OWL (restrictions),
>>> both
>>> have user experience and there was no need to switch to something like
>>> sh:fillers.
>>>
>>> 25) Show stopper: Using list positions to encode logic is a very bad
>>> anti-pattern. The syntax
>>>
>>>     sh:fillers ( ex:myProperty [ sh:minCount 1 ] )
>>>
>>> may superficially look more compact, but it violates any established
>>> design
>>> pattern in either RDF or object-orientation. If something is a "path",
>>> then
>>> call it "path" in the data model. If something is a shape then call it
>>> such,
>>> even if the Turtle becomes a bit longer:
>>>
>>>     sh:fillers [ sh:path ex:myProperty ; sh:shape [ sh:minCount 1 ] ] .
>>>
>>> Just for the sake of it, following this "design pattern" someone could
>>> model a
>>> Person record as an rdf:List:
>>>
>>>     (   "John"
>>>         "Doe"
>>>         "1971-07-07"^^xsd:date
>>>         ex:USA )
>>>
>>> Following your approach, if someone has multiple first names, make a
>>> nested list
>>>
>>>     ( ("John" "Edward" )
>>>         "Doe"
>>>         "1971-07-07"^^xsd:date
>>>         ex:USA )
>>>
>>> The "beauty" of your syntax fades quickly if you ever use this in other
>>> formats such as JSON-LD:
>>>
>>>     [ [ "John", "Edward" ],
>>>         "Doe",
>>>         { "@value" : "1971-07-07", "@type" : "http://www.w3.org/2001/XMLSchema#date" },
>>>         { "@id" : "ex:USA" } ]
>>>
>>> The problem here is that lists don't allow you to create @contexts. A
>>> better
>>> JSON-LD syntax, using normal named properties instead of lists would be:
>>>
>>>     { "firstNames": [ "John", "Edward" ],
>>>         "lastName" : "Doe",
>>>         "dob" : "1971-07-07",
>>>         "country": "ex:USA" }
>>>
>>> So, creating an RDF vocabulary just so that it looks good in Turtle is a
>>> very bad idea. While the Person example above is for illustration
>>> purposes, the same issue happens for every sh:fillers scenario and will
>>> happen with custom extensions too.
>>>
>>> Needless to say, such rdf:Lists are almost impossible to use in SPARQL
>>> or any
>>> query-based approach.
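
To make the list-handling cost concrete, here is a small sketch of what a consumer has to do to recover the path and the shape from sh:fillers ( ex:myProperty [ sh:minCount 1 ] ). The blank node labels and the helpers obj and list_items are invented for the illustration, and plain tuples again stand in for an RDF graph.

```python
RDF_FIRST = "rdf:first"
RDF_REST = "rdf:rest"
RDF_NIL = "rdf:nil"

# Hypothetical triples for:
#   ex:MyShape sh:fillers ( ex:myProperty [ sh:minCount 1 ] ) .
triples = [
    ("ex:MyShape", "sh:fillers", "_:l1"),
    ("_:l1", RDF_FIRST, "ex:myProperty"),
    ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "_:b1"),
    ("_:l2", RDF_REST, RDF_NIL),
    ("_:b1", "sh:minCount", 1),
]


def obj(graph, subject, predicate):
    """Return the first object for (subject, predicate), or None."""
    for s, p, o in graph:
        if s == subject and p == predicate:
            return o
    return None


def list_items(graph, head):
    """Walk the rdf:first / rdf:rest chain and return the members in order."""
    items = []
    node = head
    while node is not None and node != RDF_NIL:
        items.append(obj(graph, node, RDF_FIRST))
        node = obj(graph, node, RDF_REST)
    return items


# The roles of the two members come only from their position in the list.
head = obj(triples, "ex:MyShape", "sh:fillers")
path, shape = list_items(triples, head)
print(path, obj(triples, shape, "sh:minCount"))
```

With named properties such as sh:path and sh:shape, each role would be a single lookup with obj and no list walk.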
>>>
>>> 26) The claim that a simple sh:sparqlTemplate per componentTemplate is
>>> sufficient is incorrect, because some templates need to operate on the
>>> results
>>> of path expressions (e.g. sh:class) while others need to look at the
>>> full
>>> focus node + path combination. There is no vocabulary to encode these
>>> differences that could be used by an implementation. It would require a
>>> novel
>>> text-insertion mechanism for things like "insert path here".
>>>
>>> 27) The SPARQL behind these templates cannot be reused in other SPARQL
>>> queries, unlike sh:NodeValidationFunctions.
>>>
>>>
>>
> 
>
Received on Friday, 11 March 2016 17:41:35 UTC