Re: Selected problems with Proposal 4 from Irene Polikoff on 2016-03-11 (public-data-shapes-wg@w3.org from March 2016)

From: Irene Polikoff <irene@topquadrant.com>
Date: Fri, 11 Mar 2016 13:08:49 -0500
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Holger Knublauch <holger@topquadrant.com>, "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <D3086DBE.95AAD%irene@topquadrant.com>
My take is that contrary to being a "modest proposal", this is actually a
very significant change requiring a redo for much of the work that went on
in the last 18 months. In fact, a completely different proposal.

As such it needs a proof of feasibility - just like other inputs to the
working group had some proofs of feasibility (some more mature than
others).

I believe the proof needs to happen and the details have to be sufficiently
worked out in order to properly consider this alternative. Given this, I
think that if the group wants to seriously evaluate it, someone needs to sign
up for implementing it sooner (like now) rather than later. I submit that
without this, there is just not enough substance to the discussions.

Would this "someone" be you, Peter, and how much time do you need? Once the
implementation is done, there would be enough details to consider this new
alternative. If this is the decision taken, what would the working group
do in the meantime?

I have to say that to me, it looks like a drastic step to take after 18
months of work. I think taking such a step requires a very strong reason -
something like a serious, substantiated belief that the current approach
will fail. I have not heard such an argument. An alternative would be to try
to incrementally improve what has already been accomplished, agreed on and
implemented by selectively identifying some good ideas as opposed to
restarting.

Irene 




On 3/11/16, 12:41 PM, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
wrote:

>At some point there will have to be a reference implementation, I agree.
>
>peter
>
>
>On 03/10/2016 04:37 PM, Irene Polikoff wrote:
>> I agree that many of these are implementation issues, but then having the
>> implementation is very important - it shows that the proposal is indeed
>> viable; otherwise, it is all a bit hypothetical and hearsay. Invariably,
>> implementation work uncovers issues (some smaller, some larger) that often
>> lead to revisions of the proposal. Such incremental revisions tend to
>> add complexity, and what looked clean and streamlined in the beginning
>> often starts to become considerably more convoluted.
>> 
>> Peter, are you planning to create a reference implementation for this to
>> actually prove the viability of your proposal?
>> 
>> Irene 
>> 
>> 
>> On 3/10/16, 6:54 PM, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
>> wrote:
>> 
>>> Here are responses to some of the points that Holger makes.
>>>
>>> peter
>>>
>>>
>>> 2/ The current SHACL syntax does not nicely handle some common
>>>examples.
>>>
>>> Consider a shape limiting a person's guru to be both a person and a
>>> preacher.  The simplest current way of doing this is something like
>>>  ex:foo a sh:Shape ;
>>>   sh:property [ sh:predicate ex:guru ;
>>>   	       	 sh:class ex:Person ] ;
>>>   sh:property [ sh:predicate ex:guru ;
>>>   	       	 sh:class ex:Preacher ] .
>>> In my proposal this would be
>>>  ex:foo a sh:Shape ;
>>>   sh:property ( ex:guru [ sh:class ex:Person; sh:class ex:Preacher ] ) .
>>> The current syntax results in shapes that are harder to analyze by tools.
>>>
>>> Consider a shape limiting the form of a SSN.  Right now this requires
>>> something like
>>>  ex:foo a sh:Shape ;
>>>   sh:property [ sh:predicate ex:guru ;
>>>   	       	 sh:pattern "[0-9]*" ] .
>>> My proposal is very similar
>>>  ex:foo a sh:Shape ;
>>>   sh:property ( ex:guru [ sh:pattern "[0-9]*" ] ) .
>>> However, to figure out what is going on in the current syntax requires
>>> looking for the flags property, also not so simple for tools.
>>>
>>> 5/ Merging constraints and shapes does not limit the places where
>>>severity
>>> and other information can be attached.
>>>
>>> 9,10/ I agree that paths add a lot of complication both for implementing
>>> constraints and for other tools.  I added them to see how complex they
>>> would be.  The proposal does not depend on paths.  I will indicate where
>>> the changes would be.
>>>
>>> 11/ Even though RDF requires that subjects of triples are not literals,
>>> there is no reason to forbid literal-only constructs in places where
>>> literals can not appear.  For conforming RDF graphs these will always
>>>be
>>> false but for extended RDF graphs they will do the right thing.  SPARQL
>>> itself permits literals as the subject of triple patterns so that it
>>>will
>>> work well with extended RDF graphs.
>>>
>>> 12/ If a construct like sh:minCount needs to know whether it is in the
>>> object of a sh:property or an sh:inverseProperty, then that is
>>>problematic
>>> in the current situation.  How is it to know?
>>>
>>> 13/ In sh:fillers ( ex:property [ sh:minCount 1 ] ), the sh:minCount does
>>> work alone, without knowledge about its context.  All that it is saying
>>> is that there is at least one of whatever.  If an implementation needs to
>>> know about the context then that is something to be fixed or worked
>>> around.
>>>
>>> 17/ Optional parameters complicate matters no matter how they are set
>>>up.
>>> Putting the parameters together requires unpacking them.  Having two
>>> properties requires finding them both.
>>>
>>> 19,20/ These appear to be implementation issues.
>>>
>>> 22/ I believe that Arthur was against using metaclasses in the metamodel,
>>> i.e., that a template was a metaclass and a constraint was a class.  In
>>> this proposal templates are classes, not metaclasses.  Templates are also
>>> properties, so the IRI of the template can be directly used in shapes.
>>> This has the added benefit of tying the template property to the template.
>>> In the current setup, separate bits of the template are instead required
>>> to state which properties carry its meaning.  This has a problem if two
>>> templates use the same properties.  Which template is to be used then?
>>>
>>> 23/ I do not limit Functions to a single argument.  The information
>>>passed
>>> in is the list of the arguments, which can be split up in the SPARQL
>>>code.
>>>
>>> 25/ I am not using list positions to "encode logic".  I am using list
>>> positions for syntax, so as to make the syntax more compact.  Even most
>>> logics use list positions in their syntax.  If compactness is not a
>>> desirable feature then changing to an object-like syntax is simple.
>>>
>>> 26/ Conceptually an expression like sh:minCount does not need to work
>>>on a
>>> focus node + path combination.  All that it really needs to know is the
>>> fillers of the path so that it can count them.
>>>
>>>
>>> On 03/10/2016 03:10 AM, Holger Knublauch wrote:
>>>> I took a reasonably in-depth look at
>>>>
>>>>
>>>> 
>>>> https://www.w3.org/2014/data-shapes/wiki/ISSUE-95:_Metamodel_simplifications#Proposal_4
>>>>
>>>>
>>>> and below is my feedback.
>>>>
>>>> Summary: I don't regard anything in this proposal as an improvement
>>>>over
>>>> proposal 3. IMHO it presents a massive step backwards for both users
>>>>of
>>>> the
>>>> core language and the advanced features. If there are ideas worth
>>>> harvesting
>>>> then these should be raised and examined individually. I support
>>>> re-opening
>>>> ISSUE-41 as suggested by Simon for the paths topic, and to generalize
>>>> sh:and/or/not so that they can directly point at sh:Constraints
>>>>instead
>>>> of
>>>> just shapes.
>>>>
>>>> HTH
>>>> Holger
>>>>
>>>>
>>>> General Problems
>>>>
>>>> 1) Proposal 4 is poorly motivated. As Peter stated himself, he started
>>>> this
>>>> effort to simplify the metamodel. He made changes to the end-user
>>>> visible
>>>> syntax in order to "simplify" the metamodel. However, there was no
>>>> problem
>>>> with the end-user visible syntax to begin with. There was no need to
>>>> change
>>>> it, and the new syntax is a step backwards. The metamodel is far less
>>>> important than the user-facing syntax.
>>>>
>>>> 2) The syntax changes seem to reflect Peter's world view that SHACL
>>>> should
>>>> only be a constraint checking language, not used to describe data or
>>>> even as
>>>> "a modeling language". The syntax changes have made the model less
>>>> predictable, and harder to use by algorithms such as form builders,
>>>> without
>>>> adding expressivity for constraint checking.
>>>>
>>>> 3) There is no experience with this syntax. We need to redo all
>>>> evaluation,
>>>> repeat experiments, even revisit every single already closed ISSUE
>>>> to check whether it
>>>> is still valid under the new approach. External observers of SHACL
>>>>will
>>>> be
>>>> upset that we made such changes so relatively late in the process.
>>>>Such
>>>> a
>>>> drastic change will set us back by months. We'll likely need another
>>>> face to
>>>> face meeting. The arguments to justify all this are extremely weak.
>>>> Meanwhile
>>>> we will be losing a lot of time just debating something that I
>>>>consider
>>>> a
>>>> non-starter. It would be much more productive to look at some key
>>>> aspects of
>>>> where Peter believes we could do better and work on incremental
>>>> improvements,
>>>> i.e. harvest some ideas that we agree on, instead of creating a
>>>> completely new
>>>> language.
>>>>
>>>>
>>>> On merging Shapes and Constraints
>>>>
>>>> 4) There is nothing conceptually difficult about the current
>>>>metamodel,
>>>> and
>>>> there was no need to change it. Shapes are a collection of constraints
>>>> and
>>>> define a scope. Constraints restrict the focus node, possibly
>>>>following
>>>> properties. That's basically it. Shapes are similar to class
>>>> definitions and
>>>> intuitive to understand for most people. Merging these concepts blurs
>>>> the
>>>> lines, for no convincing reason. I expect that future use cases of
>>>> Shapes will
>>>> involve rules via a property such as shr:rule. Shapes serve as an
>>>> entity to
>>>> group focus nodes, and this role is independent of constraints.
>>>>
>>>> 5) If Shapes are constraints then we are just repeating the same
>>>> mistake as with
>>>> making sh:closed an attribute of the shape: We lose the ability to
>>>> specify
>>>> severity and other things. Basically, it has become impossible (or
>>>> arcane) to
>>>> specify different (node) constraints with different severity. For
>>>>this,
>>>> constraints need to be objects attached to the shape. Alternatively
>>>> you'd need
>>>> shapes pointing at sub-shapes, but then you end up with different
>>>> syntaxes for
>>>> the same thing.
>>>>
>>>> 6) If the main motivation for linking shapes and constraints was
>>>> syntactic
>>>> sugar, then we could make plenty of other incremental changes, such as
>>>> allowing the values of sh:and/sh:or to be sh:NodeConstraints, not just
>>>> Shapes,
>>>> or generalize sh:valueShape into sh:valueConstraint, pointing at
>>>> constraints
>>>> directly.
>>>>
>>>>
>>>> On property/inverseProperty vs generalized paths
>>>>
>>>> 7) Paths can already be handled (in a very controlled form) using
>>>> sh:valueShape and derived values.
>>>>
>>>> 8) The syntax for inverse properties becomes very ugly and
>>>>inconsistent
>>>> with
>>>> how forward properties are represented:
>>>>
>>>> ex:MyShape
>>>>     sh:fillers ( [ sh:inverse ex:parent ] [ sh:minCount 1 ] ) ;
>>>>     sh:fillers ( ex:parent [ sh:minCount 1 ] ) .
>>>>
>>>> 9) Path expressions cause a lot of new complexity, computationally,
>>>> syntactically, for SPARQL generation etc.
>>>>
>>>> 10) Path expressions make static analysis (for things like form
>>>> generation and
>>>> structural checking of a shapes model) almost impossible. If an
>>>> arbitrary path
>>>> can show up where we previously only had simple predicates, then a lot
>>>> of
>>>> extra checking and branching needs to happen to make sense of the
>>>> situation.
>>>>
>>>> 11) It is incorrect to claim that all constraint types can be used in
>>>> combination with every path. For example, sh:minInclusive does not
>>>> apply to
>>>> inverse properties. The current metamodel and proposal 3 can express
>>>> this
>>>> using standard techniques (classes such as
>>>> sh:InversePropertyConstraint), but
>>>> Proposal 4 throws everything together and this ability is lost. As a
>>>> result,
>>>> tools cannot provide guidance about which values can actually be
>>>> entered when.
>>>>
>>>> 12) Some constraint types require different SPARQL queries (or
>>>> JavaScript or
>>>> whatever) depending on the direction of a property (or even worse, for
>>>> an
>>>> arbitrary path). For example sh:minCount needs to count subjects
>>>>versus
>>>> objects. Proposal 4 does not even talk about this and no example of
>>>> SPARQL
>>>> generation is given. Not all constraint types are of the simple
>>>> allValuesFrom
>>>> pattern implemented by NodeValidationFunctions.
>>>>
>>>> 13) In cases like sh:fillers ( ex:property [ sh:minCount 1 ] ) the
>>>> "shape"
>>>> with the minCount is no longer working stand-alone, but it requires
>>>> knowledge
>>>> about its context (e.g. the specific path that was used) to work
>>>> correctly.
>>>> This is unclear and adds unnecessary complexity. It is an unnecessary
>>>> construct to have objects that change their meaning depending on their
>>>> parent
>>>> resource.
>>>>
>>>>
>>>> On the constraint types limited to a single property only
>>>>
>>>> 14) This is a particularly poorly motivated change that goes
>>>>backwards:
>>>> in
>>>> order to accommodate a "simplification" of the metamodel, the syntax
>>>>was
>>>> changed and an unfounded claim is used that "multiple parameters are a
>>>> poor
>>>> syntax". The example in ISSUE-133 is skewed to give the impression
>>>>that
>>>> a real
>>>> problem exists:
>>>>
>>>> [ a sh:PropertyConstraint ;
>>>>     sh:pattern "http:*" ;
>>>>     sh:predicate ex:httpURL ;
>>>>     sh:datatype xs:string ;
>>>>     sh:minCount 1 ;
>>>>     sh:maxCount 1 ;
>>>>     sh:flags "i" ]
>>>>
>>>> If your concern is readability of the source code, why would anybody
>>>>put
>>>> sh:pattern and sh:flags so far apart? This is ridiculous. Just write
>>>>
>>>> [ a sh:PropertyConstraint ;
>>>>     sh:pattern "http:*" ;  sh:flags "i" ;
>>>>     sh:predicate ex:httpURL ;
>>>>     sh:datatype xs:string ;
>>>>     sh:minCount 1 ;
>>>>     sh:maxCount 1 ]
>>>>
>>>> and problem solved. If you are not editing the Turtle, then of course
>>>> it is a
>>>> matter of tool support, and any reasonable tool will of course group
>>>> those
>>>> parameters visually together. We even have sh:group and sh:order
>>>> attributes
>>>> for those purposes, and the ConstraintTypes bundle together their
>>>> parameters
>>>> in Proposal 3. The same information can (and will) be used by editing
>>>> tools
>>>> that write Turtle files.
>>>>
>>>> 15) With single-parameter constraint types, and the need to use
>>>>reified
>>>> objects or list parameters whenever you need to pass in multiple
>>>>values
>>>> instead, the labeltemplate and sh:message templates become useless as
>>>> there is
>>>> no general mechanism to access the nested parameter values. They just
>>>> become
>>>> random objects and lists.
>>>>
>>>> 16) If multiple parameters are needed, the problem of defining and
>>>> using them
>>>> is just shifted by one level. For example, proposal 3 has a uniform
>>>>and
>>>> integrated syntax to define parameters. If you just point at an object
>>>> then
>>>> you need to talk (elsewhere) about the constraints on those objects.
>>>> This is
>>>> inconsistent, verbose, unmaintainable and not user friendly at all.
>>>>
>>>> 17) There is no uniform syntax for parameters anymore. Some are just
>>>> plain
>>>> values, others are lists, others are objects. Consider the case of
>>>> sh:pattern.
>>>> In Proposal 4, the values of sh:pattern are either a string or a list
>>>> where
>>>> the first value is a string and the second another string, with a
>>>> different
>>>> meaning. Imagine having to write code, editors or even a SPARQL query
>>>> for
>>>> that. You'll end up with complicated UNIONs and ORs everywhere just to
>>>> handle the variations due to the metamodel "simplifications".
>>>>
>>>> 18) If you need parameter objects to pass in multiple logical
>>>> parameters, then
>>>> you basically *always* need access to the $shapesGraph. Peter was
>>>> strongly
>>>> against this for ages, and made a lot of noise about that. Now he has
>>>> completely reverted his position, just to accommodate his
>>>> "simplification",
>>>> and to even make it possible at all.
>>>>
>>>> 19) If you need parameter objects to pass in multiple values, every
>>>> SPARQL
>>>> implementation of such a constraint type will first need to start with
>>>> a block
>>>> to retrieve all the real parameters that are nested in the object or
>>>> list.
>>>> Compare:
>>>>
>>>> WHERE {
>>>>     GRAPH $shapesGraph {
>>>>         $myParam ex:value1 ?value1 .
>>>>         OPTIONAL {
>>>>             $myParam ex:value2 ?value2 .
>>>>         }
>>>>     }
>>>>     $this $predicate ?object .
>>>>     FILTER (doSomething(?object, ?value1) || (bound(?value2) &&
>>>> doSomethingElse(?object, ?value2)))
>>>> }
>>>>
>>>> versus the current syntax:
>>>>
>>>> WHERE {
>>>>     $this $predicate ?object .
>>>>     FILTER (doSomething(?object, $value1) || (bound(?value2) &&
>>>> doSomethingElse(?object, $value2)))
>>>> }
>>>>
>>>> 20) Related to point 19) above, you will have a combinatorial
>>>>explosion
>>>> of
>>>> parameters if you have multiple OPTIONAL blocks. This will sometimes
>>>> require
>>>> nested SELECT DISTINCTs etc.
>>>>
>>>> 21) Proposal 4 separates the "shape" of a constraint type from its
>>>> actual
>>>> definition. This is verbose and harder to maintain. Proposal 3 handles
>>>> this
>>>> much more elegantly, where the constraint type itself doubles as a
>>>> shape, and
>>>> sh:parameter is basically a property constraint (pending the choice of
>>>> various
>>>> options). No need for separate shapes.
>>>>
>>>> 22) sh:ComponentTemplate in Proposal 4 mixes rdf:Property and
>>>>sh:Shape.
>>>> One of
>>>> the main points of criticism from Arthur (and others I believe) was
>>>> that my
>>>> proposal used metaclasses. Here something very similar happens again.
>>>>
>>>> 23) Show stopper: Proposal 4 also limits Functions to just a single
>>>> parameter,
>>>> and claims that parameter objects can be passed into the function
>>>> instead.
>>>> This is not working, because it is not practically possible to
>>>> manipulate the
>>>> shapes graph prior to every function invocation. For example
>>>> ex:myFunction(2,
>>>> 3) would become ex:myFunction(ex:args) where [ ex:args sh:arg1 2 ;
>>>> sh:arg2 3
>>>> ]. This cannot work for cases such as ex:myFunction(2, ?value). Fixing
>>>> this
>>>> would cause an inconsistency in the way that functions vs other
>>>> parameterizables are defined. Proposal 3 handles all these
>>>>consistently.
>>>>
>>>>
>>>> Miscellaneous
>>>>
>>>> 24) The new syntax is not more user friendly at all, e.g. the
>>>>proximity
>>>> of
>>>> sh:fillers vs sh:filter. What is a "filler" anyway? The existing
>>>>syntax
>>>> from
>>>> Proposal 3 is very similar to Resource Shapes and OWL (restrictions),
>>>> both
>>>> have user experience and there was no need to switch to something like
>>>> sh:fillers.
>>>>
>>>> 25) Show stopper: Using list positions to encode logic is a very bad
>>>> anti-pattern. The syntax
>>>>
>>>>     sh:fillers ( ex:myProperty [ sh:minCount 1 ] )
>>>>
>>>> may superficially look more compact, but it violates any established
>>>> design
>>>> pattern in either RDF or object-orientation. If something is a "path",
>>>> then
>>>> call it "path" in the data model. If something is a shape then call it
>>>> such,
>>>> even if the Turtle becomes a bit longer:
>>>>
>>>>     sh:fillers [ sh:path ex:myProperty ; sh:shape [ sh:minCount 1 ] ] .
>>>>
>>>> Just for the sake of it, following this "design pattern" someone could
>>>> model a
>>>> Person record as an rdf:List:
>>>>
>>>>     (   "John"
>>>>         "Doe"
>>>>         "1971-07-07"^^xsd:date
>>>>         ex:USA )
>>>>
>>>> Following your approach, if someone has multiple first names, make a
>>>> nested list
>>>>
>>>>     ( ("John" "Edward" )
>>>>         "Doe"
>>>>         "1971-07-07"^^xsd:date
>>>>         ex:USA )
>>>>
>>>> The "beauty" of your syntax fades quickly if you ever use this in
>>>>other
>>>> formats such as JSON-LD:
>>>>
>>>>     [ [ "John", "Edward" ],
>>>>         "Doe",
>>>>         { "@value" : "1971-07-07", "@type" :
>>>> "http://www.w3.org/2001/XMLSchema#date" },
>>>>         { "@id" : "ex:USA" } ]
>>>>
>>>> The problem here is that lists don't allow you to create @contexts. A
>>>> better
>>>> JSON-LD syntax, using normal named properties instead of lists, would be:
>>>>
>>>>     { "firstNames": [ "John", "Edward" ],
>>>>         "lastName" : "Doe",
>>>>         "dob" : "1971-07-07",
>>>>         "country": "ex:USA" }
>>>>
>>>> So, creating an RDF vocabulary just so that it looks good in Turtle is
>>>> a very
>>>> bad idea. While the Person example above is for illustration purposes,
>>>> the
>>>> same issue happens for every sh:fillers scenario and will happen with
>>>> custom
>>>> extensions too.
>>>>
>>>> Needless to say, such rdf:Lists are almost impossible to use in SPARQL
>>>> or any
>>>> query-based approach.
>>>>
>>>> 26) The claim that a simple sh:sparqlTemplate per componentTemplate is
>>>> sufficient is incorrect, because some templates need to operate on the
>>>> results
>>>> of path expressions (e.g. sh:class) while others need to look at the
>>>> full
>>>> focus node + path combination. There is no vocabulary to encode these
>>>> differences that could be used by an implementation. It would require
>>>>a
>>>> novel
>>>> text-insertion mechanism for things like "insert path here".
>>>>
>>>> 27) The SPARQL behind these templates cannot be reused in other SPARQL
>>>> queries, unlike sh:NodeValidationFunctions.
>>>>
>>>>
>>>
>> 
>>
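
The disagreement quoted above in points 12, 13 and 26 (whether something
like sh:minCount must know the direction of the property, or only needs the
set of fillers) can be made concrete with a small sketch. This is only an
illustration under stated assumptions, not how any SHACL implementation
works: the in-memory triple set, the ex: names, and the helper functions
fillers() and min_count() are all hypothetical. It shows the reading in
which the path (forward or inverse) is evaluated first and the count check
then operates on the resulting values only.

```python
# Hypothetical in-memory graph: a set of (subject, predicate, object) triples.
# The ex: names are made up for this illustration.
TRIPLES = {
    ("ex:alice", "ex:parent", "ex:bob"),
    ("ex:alice", "ex:parent", "ex:carol"),
    ("ex:dave", "ex:parent", "ex:alice"),
}


def fillers(focus, predicate, inverse=False):
    """Return the nodes reached from `focus` via `predicate`.

    With inverse=True the property is followed backwards, i.e. the
    subjects of triples whose object is the focus node are returned.
    """
    if inverse:
        return {s for (s, p, o) in TRIPLES if p == predicate and o == focus}
    return {o for (s, p, o) in TRIPLES if p == predicate and s == focus}


def min_count(values, n):
    """Count check that sees only the fillers, not the path that produced them."""
    return len(values) >= n


# Forward direction: ex:alice has two ex:parent values.
print(min_count(fillers("ex:alice", "ex:parent"), 1))
# Inverse direction: only ex:dave points at ex:alice via ex:parent.
print(min_count(fillers("ex:alice", "ex:parent", inverse=True), 2))
```

In this sketch the direction is handled entirely inside fillers(). Whether
that separation carries over to generated SPARQL, where subjects and objects
are counted in different positions of the triple pattern, is exactly the
open question a reference implementation would have to answer.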
Received on Friday, 11 March 2016 18:09:28 UTC