Re: minor syntax fixes

Multiple datatypes would never be satisfied, so the only way such a shape gets created is by mistake. Without the rule, such shapes would be syntactically correct, but semantically not what the user wanted to say. If they are generated by a program, something is wrong with the logic of the program, and it needs to be investigated and fixed.
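
As a hypothetical illustration (the ex: names are invented here, not taken from any real shapes graph), a shape like the following parses fine, yet no value of ex:age can pass it, because a literal has exactly one datatype:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/ns#> .

# Hypothetical shape: the author probably meant "integer or string".
ex:AgeShape
    a sh:PropertyShape ;
    sh:targetClass ex:Person ;
    sh:path ex:age ;
    # Both constraints apply to every value, so every value fails at least one.
    sh:datatype xsd:integer , xsd:string .
```

Someone who really wants "integer or string" would need sh:or over two shapes, each with a single sh:datatype.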

Multiple cardinalities and lengths in a shape are too easy for people to misunderstand, leading to different expectations among data stakeholders. The whole point of data validation is to ensure data interoperability and integration in a way that meets business needs. People are an important part of that process. Any rules about data get checked and double-checked by people, so the rules must not be confusing. If the people creating data expect different things, there is not going to be data interoperability that meets those needs. It may all seem to work, but not in the way expected or needed.

All these conditions can be detected by SHACL-SHACL. It is quite easy for any program that generates them to prune them out, e.g., remove the smaller maxCount and maxLength. However, depending on the situation, a program may need to bring such exceptions to people's attention, since they may indicate a problem (a disagreement) in the sources that were used to generate the shapes. In cases where a rule for what to prune can’t be defined (as with datatypes), there definitely needs to be a human in the loop to make the decision.
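
A rough sketch of the kind of check I mean. This is not the actual SHACL-SHACL graph: the name ex:SingleValueShape and the targeting are assumptions made for illustration. Validating a shapes graph against it would report any node that carries more than one value for these properties:

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ns#> .

ex:SingleValueShape
    a sh:NodeShape ;
    # Assumption: check every node that uses one of these properties.
    sh:targetSubjectsOf sh:datatype , sh:maxCount , sh:maxLength ;
    # Each property may appear at most once on a targeted node.
    sh:property [ sh:path sh:datatype ;  sh:maxCount 1 ] ;
    sh:property [ sh:path sh:maxCount ;  sh:maxCount 1 ] ;
    sh:property [ sh:path sh:maxLength ; sh:maxCount 1 ] .
```

The report only says that a node has too many values; which value to keep, if any, is still a decision for the generating program or for a person.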

> On Apr 23, 2017, at 6:53 PM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
> 
> I spoke too soon in saying that there are no problems later on.
> 
> Disallowing redundant things like multiple sh:datatypes causes two problems.
> It makes it harder to generate valid SHACL.  It makes it easier to trigger
> interoperability problems.
> 
> It is even the case that some of these newly disallowed shapes do not have
> redundant constraints.
> 
> peter
> 
> 
> On 03/22/2017 07:48 AM, Peter F. Patel-Schneider wrote:
>> The changes in response to everything except my first point look good.
>> 
>> For the first point:
>> 
>> The inability to have other properties on some, but only some, nodes in
>> property paths is going to cause many problems.  There can't be comments on
>> these nodes, there can't be pointers to definitions or sources, there can't
>> be display information, and so on, and so on, and so on.  Users will
>> regularly stumble over this quirk in the syntax; user guides will have to
>> discuss it; implementors will have to figure out what to do in non-strict
>> mode.  Fixing the problem now has a lower cost than dealing with it later
>> and a much lower cost than not dealing with it at all.
>> 
>> This part of the definition of SHACL property paths causes another problem.
>> It makes strange property paths legal, such as
>>  _:pp rdfs:comment "Well-formed Property Path!" ;
>>    sh:inversePath ex:p1 ;
>>    rdf:first ex:p2 ;
>>    rdf:rest ( ex:p3 ) .
>> 
>> So right now there are property paths that should be legal but are not and
>> property paths that should not be legal but are.  This is not a good
>> situation at all and needs to be fixed.
>> 
>> 
>> Peter F. Patel-Schneider
>> Nuance Communications
>> 
>> On 03/21/2017 06:11 PM, Holger Knublauch wrote:
>>> Hi Peter,
>>> 
>>> On 22/03/2017 8:57, Peter F. Patel-Schneider wrote:
>>>> There are several minor problems with the SHACL Core syntax that need to be
>>>> fixed.  Most of the problems make shapes graphs illegal that should be
>>>> legal.
>>> 
>>> "need to be fixed" is IMHO a bit strong, but I can live with "should be fixed".
>>> 
>>>> 
>>>> 
>>>> The syntax of paths is too restrictive as it disallows extra triples on many
>>>> (but not all) path nodes.  This forbids comments in useful places, like in
>>>> the SHACL property path below
>>>> 
>>>> ex:PathShape rdf:type sh:PropertyShape ;
>>>>  rdfs:comment "Inverse of p has to be C" ;
>>>>  sh:path [ rdfs:comment "Inverse Path" ; sh:inversePath ex:p ] ;
>>>>  sh:class ex:C .
>>>> 
>>>> The wording in the ASHACL Property Paths section of
>>>> https://arxiv.org/abs/1702.01795 permits extra triples on path nodes and
>>>> thus provides a better definition for SHACL property paths.
>>> 
>>> There are pros and cons of such a change. Yes, theoretically someone may want
>>> to attach extra triples, but is this really a strong requirement? I'd argue it
>>> should be sufficient to put such a comment into the surrounding node. A
>>> downside of allowing this is that it raises the costs for development tools:
>>> in our own products we allow users to enter paths in SPARQL 1.1 surface
>>> syntax, and if we allow other triples in there then people may expect those to
>>> be preserved which would be prohibitively expensive.
>>> 
>>> As I am not convinced of the benefits and due to the late stage in the process
>>> (and the risk of introducing regression bugs) I would prefer to keep the path
>>> syntax as-is.
>>> 
>>> 
>>> I do agree with most of the other rules below, as they would allow us to
>>> define more narrow meta-shapes than we would otherwise have to implement. Note
>>> all this is in the nice-to-have category, so if anyone sees issues here, I'd
>>> be happy to roll back.
>>> 
>>>> 
>>>> 
>>>> Some syntax checks go beyond checking syntax of information associated with
>>>> shapes.  They should only be performed on shapes.
>>>> 
>>>> This makes odd but harmless triples illegal, such as
>>>> 
>>>> ex:n3 sh:severity 7 .
>>>> 
>>>> ex:n5 sh:message 0 .
>>>> 
>>>> ex:n4 sh:deactivated 0 .
>>>> 
>>>> ex:n1 sh:path ex:p1 , ex:p2 .
>>>> 
>>>> ex:n2 sh:path [ rdfs:comment "Not a path" ] .
>>>> 
>>>> The following changes to the syntax rules will fix these problems.
>>>> 
>>>> severity-nodeKind     Each value for sh:severity in a shape is an IRI.
>>> 
>>> Done.
>>> 
>>>> 
>>>> message-datatype     Each value for sh:message in a shape is either
>>>>            a xsd:string literal or a literal with a language tag.
>>> 
>>> Done, but note that sh:message is also allowed for SPARQL-based constraints so
>>> I had to formalize a similar syntax rule there.
>>> 
>>>> 
>>>> deactivated-datatype     Each value for sh:deactivated in a shape is
>>>>            either true or false.
>>> 
>>> Done.
>>> 
>>>> 
>>>> path-maxCount     A shape has at most one value for sh:path.
>>> 
>>> Done.
>>> 
>>>> 
>>>> path-shape     Each value for sh:path in a shape is a well-formed
>>>>        SHACL property path.
>>> 
>>> Done, although I have kept "must be a ..." because some users may interpret it
>>> otherwise as "each value of sh:path is well-formed".
>>> 
>>>> 
>>>> 
>>>> One syntax rule is missing, allowing some misleading syntax that should be
>>>> disallowed, as in
>>>> 
>>>> ex:s1 sh:uniqueLang true, false .
>>>> 
>>>> The following addition to the syntax rules will fix this problem.
>>>> 
>>>> uniqueLang-maxCount     A shape has at most one value for sh:uniqueLang.
>>> 
>>> Done, specifically for property shapes (node shapes cannot have these values
>>> anyway). Note that there are several other constraint components that really
>>> only should have one value, e.g. sh:datatype. I would argue that there is just
>>> a small number of the core components where multiple values make sense
>>> (sh:class, sh:property and sh:node and the logical operators). So for now I
>>> have added similar maxCount=1 rules to
>>> 
>>> - sh:datatype
>>> - sh:nodeKind
>>> - sh:minCount
>>> - sh:maxCount
>>> - sh:min/max/in/exclusive
>>> - sh:minLength
>>> - sh:maxLength
>>> - sh:languageIn
>>> - sh:in
>>> 
>>> Background is that I expect many people to try to use multiple sh:datatype
>>> values to express "or" (already happened). By having those restrictions in
>>> place, tools can help users avoid these mistakes.
>>> 
>>> I will have this discussed in tonight's WG meeting too to double-check if
>>> anyone sees problems with this change. Meanwhile if anyone sees problems (real
>>> use cases that are no longer allowed) with the changes above, let me know ASAP.
>>> 
>>>> 
>>>> 
>>>> Many syntax rules state that they are for any node but it turns out that
>>>> they can be stated for shapes only without making any change in SHACL
>>>> syntax.  Changing these rules to be for shapes results in a more natural set
>>>> of rules.  The rules in question look like
>>>>   Each value of XXX is ...
>>>> or
>>>>   The values of XXX are ...
>>>> where XXX is a parameter or a property related to targeting.  They can be
>>>> changed to
>>>>   Each value for XXX in a shape is ...
>>> 
>>> All done (hopefully), see
>>> 
>>> https://github.com/w3c/data-shapes/commit/d92aaa7b6fd2e7a196e01fec8f19be7d385a5ae1
>>> 
>>> 
>>> Would appreciate double-checking as we are trying to move to CR soon.
>>> 
>>> Thanks,
>>> Holger
>>> 
>>> 
> 

Received on Monday, 24 April 2017 00:11:01 UTC