Proposal for "Repeated Property" Requirement - sh:partition from Arthur Ryman on 2015-09-25 (public-data-shapes-wg@w3.org from September 2015)

From: Arthur Ryman <arthur.ryman@gmail.com>
Date: Fri, 25 Sep 2015 19:13:26 -0400
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <CAApBiO=2Bq3Fw6mjn3V-zB=FvtAW26Ui8raKcW0v=ASfd1UD5g@mail.gmail.com>

I've been following the discussion about repeated properties and
qualified cardinality constraints, and would like to propose a new
SHACL language element, sh:partition, that I believe will satisfy the
requirements.

I think the use cases suggest that SHACL needs a way to say that a set of
nodes must be partitioned into a certain number of disjoint subsets.
Each subset contains nodes that satisfy certain constraints. Each
subset must satisfy certain cardinality constraints.

In the case of repeated properties, we are looking at the set of all
values for a given property (or inverse property) of a given focus
node. Sets of nodes occur in other contexts and may need to be
similarly constrained.

It would be a very good thing if a SHACL processor could efficiently
determine if a given set of nodes could be partitioned according to a
given partition spec.

SHACL already has sh:minCount and sh:maxCount properties which apply
to sets of nodes.

SHACL also already has many other properties that define constraints
on a given node. These are tests or checks that apply to a node and
are either true or false. Holger listed many of them, e.g.
- sh:allowedValues
- sh:class
- sh:datatype
- sh:directType
- sh:minLength
- sh:maxLength
- sh:nodeKind
- sh:maxExclusive etc
- sh:pattern

I propose to define a new RDF type, sh:QCC, for things that specify
qualified cardinality constraints. However, sh:QCC will normally be
understood from the context and does not need to appear explicitly in
the shapes graph.

A sh:QCC may have:
- zero or one sh:minCount
- zero or one sh:maxCount
- zero or more node constraints, from the following list (and possibly
  others that make sense):
  - sh:shape
  - sh:allowedValues
  - sh:class
  - sh:datatype
  - sh:directType
  - sh:minLength
  - sh:maxLength
  - sh:nodeKind
  - sh:maxExclusive etc
  - sh:pattern

A partition is specified by an rdf:List of sh:QCC nodes. Define
sh:Partition to be this subclass of rdf:List. Again, sh:Partition need
not appear explicitly.
A constraint may have zero or more sh:partition properties whose
values are sh:Partition nodes. All must be satisfied.

The interpretation of a sh:Partition node as a constraint is as follows:

Let the given set of nodes be X.
Let the sh:Partition node be the list P = (qcc1, qcc2, ..., qccn).

For each qcc in P do the following:
   Let Y be the subset of X that satisfies the node constraints in qcc.
   If Y violates the cardinality constraints of qcc then report a
   violation and break.
   Otherwise remove Y from X and continue.
End for.
If X is not empty then report a violation.
Otherwise report that P is satisfied.

Note that this is a greedy algorithm. Each qcc in the list is matched
to the fullest extent. Nodes that match one qcc are removed from
further consideration. Also, the qcc's are checked in the order given
in the list so there is no combinatorial explosion.
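
The loop above can be sketched in Python. This is only an illustration
under stated assumptions, not a proposed implementation: the QCC class
and check_partition function are hypothetical names, nodes are modelled
as plain strings, and each node constraint is modelled as a predicate
function rather than as a SHACL property.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Set


@dataclass
class QCC:
    """Hypothetical stand-in for an sh:QCC node."""
    # Each node constraint is a test that is true or false for one node.
    node_constraints: List[Callable[[str], bool]] = field(default_factory=list)
    min_count: Optional[int] = None  # sh:minCount, None if absent
    max_count: Optional[int] = None  # sh:maxCount, None if absent


def check_partition(nodes: Iterable[str], partition: List[QCC]) -> List[str]:
    """Return violation messages; an empty list means P is satisfied."""
    remaining: Set[str] = set(nodes)  # X
    for index, qcc in enumerate(partition, start=1):
        # Y: the nodes still in X that pass every node constraint of qcc.
        matched = {n for n in remaining
                   if all(test(n) for test in qcc.node_constraints)}
        if qcc.min_count is not None and len(matched) < qcc.min_count:
            return [f"qcc{index}: {len(matched)} nodes, below minCount"]
        if qcc.max_count is not None and len(matched) > qcc.max_count:
            return [f"qcc{index}: {len(matched)} nodes, above maxCount"]
        remaining -= matched  # remove Y from X and continue
    if remaining:
        return [f"{len(remaining)} nodes matched no qcc"]
    return []


# Made-up partition: exactly one node starting with "a",
# then at least one node starting with "b".
starts_a = QCC([lambda n: n.startswith("a")], min_count=1, max_count=1)
starts_b = QCC([lambda n: n.startswith("b")], min_count=1)
example_partition = [starts_a, starts_b]

print(check_partition({"a1", "b1", "b2"}, example_partition))
print(check_partition({"a1", "b1", "c1"}, example_partition))
```

Each qcc is visited once and each remaining node is tested once per
qcc, so the work grows with the list length times the set size.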

Eric proposed the following example [1]:

<BFPersonInterface1> sh:property [
      sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/" ;
      sh:minCount 1 ; sh:maxCount 1
    ], [
      sh:predicate bf:identifiedBy ; sh:pattern "^http://viaf.org/" ;
      sh:minCount 1
    ] .

In my proposal, this becomes:

<BFPersonInterface1> sh:property [
      sh:predicate bf:identifiedBy ;
      sh:partition (
         [ sh:pattern "^http://id.loc.gov/" ; sh:minCount 1 ; sh:maxCount 1 ]
         [ sh:pattern "^http://viaf.org/" ; sh:minCount 1 ]
      )
    ] .
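
To make the effect of this partition concrete, here is a small Python
sketch that applies the greedy check to some bf:identifiedBy values.
It rests on assumptions: the identifier values are made up, values are
treated as plain strings, sh:pattern is approximated with Python's
re.search, and satisfies_partition is a hypothetical helper name.

```python
import re

# The two entries of the sh:partition list above, written as
# (pattern, minCount, maxCount); None means the bound is absent.
PARTITION = [
    (r"^http://id.loc.gov/", 1, 1),
    (r"^http://viaf.org/", 1, None),
]


def satisfies_partition(values, partition=PARTITION):
    """Greedy check: True if the values can be split as the list requires."""
    remaining = set(values)
    for pattern, min_count, max_count in partition:
        matched = {v for v in remaining if re.search(pattern, v)}
        if min_count is not None and len(matched) < min_count:
            return False
        if max_count is not None and len(matched) > max_count:
            return False
        remaining -= matched
    # Any value left over belongs to no subset, which is a violation.
    return not remaining


# Made-up identifier values, for illustration only.
loc = "http://id.loc.gov/authorities/names/n0000000"
viaf_1 = "http://viaf.org/viaf/0000001"
viaf_2 = "http://viaf.org/viaf/0000002"
other = "http://example.org/person/1"

print(satisfies_partition({loc, viaf_1, viaf_2}))  # one LoC id, two VIAF ids
print(satisfies_partition({viaf_1}))               # no LoC id
print(satisfies_partition({loc, viaf_1, other}))   # one unmatched value
```

The last call shows a consequence of the final step of the algorithm:
a value that matches neither pattern is left in X, so the partition
reports a violation for it.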

[1] https://lists.w3.org/Archives/Public/public-data-shapes-wg/2015Sep/0107.html

-- Arthur

Received on Friday, 25 September 2015 23:13:54 UTC