Re: ISSUE-92: Should repeated properties be interpreted as additive or conjunctive?

* Irene Polikoff <irene@topquadrant.com> [2017-01-12 00:14-0500]
> At the January 11th meeting we attempted to close this issue by removing sh:partition, but did not reach a conclusion because attendees wanted more details on the issue. Since we are likely to look at this issue again during the next meeting, I thought it would be useful to summarize the details.
> 
> A typical use case that motivated the issue is as follows:
> 
> Dublin Core experience suggests that users expect multiple constraints on the same property to be "additive". For example
> 
> ex:BibframeShape
>     sh:property
>         [ sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/" ] ,
>         [ sh:predicate bf:identifiedBy ; sh:pattern "^http://viaf.org/" ] .
> 
> would be interpreted as requiring one bf:identifiedBy value starting
> with "http://id.loc.gov/" and another starting with
> "http://viaf.org/" and no other values.
> 
> 
> The issue was first created in September 2015. At the time, there was a proposal on the table to make multiple constraints on the same predicate additive, as outlined above. It was decided that the basic combination of constraints would not work this way and would instead be conjunctive. Thus, no node with a bf:identifiedBy value could ever satisfy the shape as written above, since it is interpreted as requiring every value of bf:identifiedBy to start with both http://id.loc.gov/ and http://viaf.org/.
> 
> Since there are logical operators, one could instead say:
> 
> ex:BibframeShape
>     sh:property [
>         sh:predicate bf:identifiedBy ;
>         sh:minCount 2 ;
>         sh:maxCount 2 ;
>         sh:or (
>             [ sh:pattern "^http://id.loc.gov/" ]
>             [ sh:pattern "^http://viaf.org/" ]
>         )
>     ] .
> 
> This shape, however, doesn't require one of each value. It simply says that each value must match one of the two patterns. With this, if we had one value http://id.loc.gov/value1 and another http://id.loc.gov/value2, there would be no error.
> 
> To enforce one of each, we can use qualified value shapes as follows:
> 
> ex:BibframeShape
>     sh:property [
>         sh:predicate bf:identifiedBy ;
>         sh:minCount 2 ;
>         sh:maxCount 2 ;
>     ] ;
>     sh:property [
>         sh:predicate bf:identifiedBy ;
>         sh:qualifiedValueShape [ sh:pattern "^http://id.loc.gov/" ] ;
>         sh:qualifiedMinCount 1 ;
>         sh:qualifiedMaxCount 1 ;
>     ] ;
>     sh:property [
>         sh:predicate bf:identifiedBy ;
>         sh:qualifiedValueShape [ sh:pattern "^http://viaf.org/" ] ;
>         sh:qualifiedMinCount 1 ;
>         sh:qualifiedMaxCount 1 ;
>     ] .
> 
> This satisfies the use case, but it is verbose.
> 
> sh:partition offers a simpler syntax in support of this use case:
> 
> ex:BibframeShape
>     sh:property [
>         sh:predicate bf:identifiedBy ;
>         sh:partition (
>             [ sh:minCount 1 ; sh:maxCount 1 ; sh:pattern "^http://id.loc.gov/" ]
>             [ sh:minCount 1 ; sh:maxCount 1 ; sh:pattern "^http://viaf.org/" ]
>         )
>     ] .
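
To make the difference between the readings above concrete, here is a minimal Python sketch. It is not SHACL and not any validator's API; the function names are hypothetical and each function only mimics, under simplifying assumptions, how the corresponding shape treats the list of bf:identifiedBy values of one node. The patterns are copied from the shapes above and applied with search semantics.

```python
import re

LOC = "^http://id.loc.gov/"
VIAF = "^http://viaf.org/"

def conjunctive(values):
    """Repeated sh:property read conjunctively: every value must match both patterns."""
    return all(re.search(LOC, v) and re.search(VIAF, v) for v in values)

def or_with_counts(values):
    """sh:or with min/max count 2: exactly two values, each matching either pattern."""
    return len(values) == 2 and all(
        re.search(LOC, v) or re.search(VIAF, v) for v in values
    )

def qualified(values):
    """Qualified value shapes: exactly two values, exactly one per pattern."""
    n_loc = sum(1 for v in values if re.search(LOC, v))
    n_viaf = sum(1 for v in values if re.search(VIAF, v))
    return len(values) == 2 and n_loc == 1 and n_viaf == 1

# Hypothetical sample data for one focus node.
one_of_each = ["http://id.loc.gov/value1", "http://viaf.org/value2"]
both_loc = ["http://id.loc.gov/value1", "http://id.loc.gov/value2"]

print(conjunctive(one_of_each))   # the intended data fails the conjunctive reading
print(or_with_counts(both_loc))   # two id.loc.gov values slip through sh:or
print(qualified(both_loc))        # but are caught by the qualified counts
print(qualified(one_of_each))     # the intended data passes
```

Under these assumptions only the qualified form (and, by intent, sh:partition) both accepts the one-of-each data and rejects two values from the same source.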

Originally, ShEx (a surface syntax for ResourceShape plus OR) didn't have partitions. They were added to:

  1. address the frequent (and encouraged) reuse of generic properties. These are extremely common in clinical data; for instance, a BP observation has two components, one of which has a code for systolic and the other a code for diastolic.

  2. provide an OR that was closer to user expectations and reduced the need for defensive programming, e.g., a shape expecting either a name or a given and family name should reject a mixture like { <s> foaf:name "X"; foaf:givenName "Y" }.

Partitioning was never a goal; it was a means of satisfying these needs.
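
The second point can be sketched roughly as follows. This is a deliberately simplified Python model, not the ShEx semantics and not any ShEx implementation's API: it assumes each branch of the OR is just an exact count per predicate, and that only predicates mentioned somewhere in the shape are considered. The names BRANCHES, MENTIONED and matches_one_branch are hypothetical.

```python
from collections import Counter

FOAF = "http://xmlns.com/foaf/0.1/"
NAME, GIVEN, FAMILY = FOAF + "name", FOAF + "givenName", FOAF + "familyName"

# Each branch of the OR lists the predicates it expects, with exact counts.
BRANCHES = [
    Counter({NAME: 1}),
    Counter({GIVEN: 1, FAMILY: 1}),
]
# Predicates the shape talks about; anything else on the node is ignored here.
MENTIONED = {p for branch in BRANCHES for p in branch}

def matches_one_branch(triples):
    """True if the node uses the shape's predicates exactly as one branch describes."""
    used = Counter(p for p, _ in triples if p in MENTIONED)
    return any(used == branch for branch in BRANCHES)

# The mixture from the example above: a name plus a given name only.
mixture = [(NAME, "X"), (GIVEN, "Y")]
print(matches_one_branch(mixture))
print(matches_one_branch([(NAME, "X")]))
print(matches_one_branch([(GIVEN, "Y"), (FAMILY, "Z")]))
```

In this model the mixture is rejected because every triple using a mentioned predicate has to be accounted for by the single branch that is chosen, which is the behaviour the shape author does not have to program defensively for.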


> However, sh:partition uses lists, and the order of resources within the list is significant. In general, if the members of the list are reordered, then different value node sets will be matched and different violation results will be reported. It was also seen as an overly complex feature to implement and likely to perform poorly. Because of this, WG members had concerns about requiring implementers to support it in order to comply with SHACL CORE. When this proposal was voted on, no one but Arthur Ryman, who proposed it, cast a positive vote. In addition to the majority of negative votes, the proposal for sh:partition was also blocked by a -1 vote from Peter Patel-Schneider.
> 
> Since then, several people volunteered to work on improving the proposal, including Arthur himself and, more recently, Eric Prud'hommeaux. They have not produced results in the form of an alternative, improved proposal.
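
The order sensitivity described above can be illustrated with a small Python sketch. It assumes a greedy reading in which each value is claimed by the first list member whose pattern matches it; that reading is an assumption made for illustration and may not match the draft's exact definition of sh:partition. The member names and the overlapping "^http://" pattern are hypothetical, chosen so that the two members can compete for the same value.

```python
import re

def greedy_partition(values, members):
    """Assign each value to the first member (in list order) whose pattern matches."""
    claimed = {name: [] for name, _ in members}
    leftover = []
    for v in values:
        for name, pattern in members:
            if re.search(pattern, v):
                claimed[name].append(v)
                break
        else:
            # No member matched this value.
            leftover.append(v)
    return claimed, leftover

def violations(values, members):
    """Members that did not get exactly one value (minCount 1, maxCount 1)."""
    claimed, leftover = greedy_partition(values, members)
    bad = [name for name, _ in members if len(claimed[name]) != 1]
    return bad, leftover

values = ["http://id.loc.gov/value1", "http://viaf.org/value2"]
loc = ("loc", "^http://id.loc.gov/")
any_http = ("any_http", "^http://")  # overlaps with the loc pattern

print(violations(values, [loc, any_http]))   # specific member first
print(violations(values, [any_http, loc]))   # general member first
```

With the specific member listed first, each member claims one value and nothing is reported; with the general member first, it claims both values and both members are reported. The data is the same in both runs, only the list order differs.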

My proposal was in the abstract syntax doc <http://github.com/w3c/data-shapes/shacl-abstract-syntax/satisfies-PartitionConstraint> though that disappeared within the last week.


> sh:partition is included in the spec, but it has been marked as a feature at risk in all editor drafts. 
> 
> The current proposal is to close ISSUE-92 (https://www.w3.org/2014/data-shapes/wiki/index.php?title=Proposals#ISSUE-92:_additive_repeated_properties) by removing sh:partition. The use case is supported, although using a verbose syntax. It is not perfect, but the WG has no resources to solve the issues with sh:partition. So, unless there is someone in the WG or in the broader community who feels strongly about sh:partition and will volunteer to dedicate time to resolving the issues with it, I see no other option but to remove it.
> 
> Irene
> 

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.

Received on Thursday, 12 January 2017 11:00:18 UTC