propose to make repeated-properties additive

I have been working with Karen Coyle and Tom Baker of Dublin Core on
the following critique of the drafted behavior of repeated properties
in SHACL. We discussed usability issues and lessons learned with
respect to Description Set Profile semantics.



In 2008, DCMI evaluated the Scholarly Works Application Profile
(SWAP DSP), a deliverable of a UK project led by UKOLN, for
conformance with DCMI's then-current model for DSPs. The analysis
tested whether the DSPs written by modelers matched their intended
semantics.

The DSPs failed to behave as modelers expected; modelers were using
generic properties like dc:type multiple times with the expectation
that each constraint would correspond to one triple in the graph. An
example of this is the reuse of dc:type within a description of an
expression of an Eprint (it's a library thing). The SWAP DSP for
this included two dc:type arcs with values of <Expression>¹ and
<JournalArticle>².

¹ http://www.ukoln.ac.uk/repositories/digirep/index/Scholarly_Works_Application_Profile#Entity_type_2
² http://www.ukoln.ac.uk/repositories/digirep/index/Scholarly_Works_Application_Profile#Type


The SWAP DSP was found not to conform to the guidelines, because the
guidelines specified a matching algorithm whereby each statement in
the data was assumed to match just one statement template with a
given property constraint. In SWAP, a single property (dc:type) was
used in two different templates for statements describing the same
resource, ¹ and ² above.
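
As a rough illustration of why SWAP tripped over that rule, here is a
small Python sketch. It is a toy model of the "one template per
property" assumption described above, not the DSP specification's
matching algorithm; the function name and the (property, value) tuple
layout are made up for this example.

```python
from collections import Counter

def repeated_properties(templates):
    """Return the properties used by more than one statement template.

    `templates` is a list of (property, value_constraint) pairs for one
    description. Under a "one template per property" rule, any property
    returned here makes the profile non-conforming.
    """
    counts = Counter(prop for prop, _ in templates)
    return sorted(prop for prop, n in counts.items() if n > 1)

# The two dc:type arcs from the SWAP description of an expression.
swap_expression = [
    ("dc:type", "<Expression>"),
    ("dc:type", "<JournalArticle>"),
]

print(repeated_properties(swap_expression))
```

A profile that uses each property in at most one template yields an
empty list; the SWAP case reports dc:type.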

At the time, DCMI saw this result as a problem more of the matching
algorithm than of the SWAP DSP itself. It is not all that uncommon
in the community of DC users for a property to be used with
different constraints. To assume otherwise, at any rate, seemed
needlessly restrictive.

Repeated properties are common in Cultural Heritage data. The
Cultural Heritage community is one of those served by DCMI, and it
is actively converting its models to make use of RDF. One of the
more promising models is coming from the Library of Congress, and is
called BIBFRAME. First, consider an example from BIBFRAME where each
bf:Person must have exactly one bf:identifiedBy and it must come
from id.loc.gov:

  Instance data example:
  <bf_Person1>
    bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961> .

  ShEx:
    <BFPersonInterface1> {
      bf:identifiedBy IRI PATTERN "^http://id.loc.gov/"
    }

  SHACL:
    <BFPersonInterface1> sh:property [
      sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/"
    ] .


Another option in BIBFRAME is to have one or more additional
bf:identifiedBys coming from another source list, such as viaf.org:

  Instance data example:
  <bf_Person1>
    bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
    bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .

  ShEx:
    <BFPersonInterface1> {
      bf:identifiedBy IRI PATTERN "^http://id.loc.gov/" ,
      bf:identifiedBy IRI PATTERN "^https://viaf.org/" +
    }

With SHACL, users have to remember to use "Qualified" counts, and
because SHACL defaults to an open graph, the constraints below do
not prohibit additional bf:identifiedBy properties:

  SHACL:
    <BFPersonInterface1> sh:property [
        sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/" ;
        sh:minQualifiedCount 1 ; sh:maxQualifiedCount 1
      ], [
        sh:predicate bf:identifiedBy ; sh:pattern "^https://viaf.org/" ;
        sh:minQualifiedCount 1
      ] .

This shape accepts the instance above with two bf:identifiedBy arcs,
but it also erroneously accepts:

<bf_Person1>
  bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
  bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> ;
  bf:identifiedBy "this is a mistake" . # should be an error
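
To make the open-graph behavior concrete, here is a minimal Python
sketch. It is a simplified model of qualified counts, not the SHACL
draft's validation algorithm: the helper names are invented for this
example, values are plain strings, and the viaf pattern is written with
https so that it matches the instance data.

```python
import re

def qualified_count_ok(values, pattern, min_count=0, max_count=None):
    """Count only the values that match `pattern`; ignore all others."""
    n = sum(1 for v in values if re.search(pattern, v))
    return n >= min_count and (max_count is None or n <= max_count)

def open_shape_ok(values):
    """Both qualified constraints must hold; unmatched values are not checked."""
    return (qualified_count_ok(values, "^http://id.loc.gov/", 1, 1)
            and qualified_count_ok(values, "^https://viaf.org/", 1))

good = [
    "http://id.loc.gov/authorities/names/n80103961#RWO",
    "https://viaf.org/viaf/268367832/#Knape,_Joachim",
]
bad = good + ["this is a mistake"]

print(open_shape_ok(good))  # accepted, as intended
print(open_shape_ok(bad))   # also accepted: the stray value is never examined
```

Each constraint only counts the values it recognizes, so nothing ever
looks at the third value.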

If we invent a new construct to handle this, say

      [
        sh:predicate bf:identifiedBy ; sh:patterns
          ("^http://id.loc.gov/" "^https://viaf.org/") ;
        sh:minQualifiedCount 1
      ]

then the cost of repeated properties becomes painfully high.

=PROPOSAL=

There are plenty of use cases for repeated properties. We propose that
the syntax for repeated property constraints be identical to that for
single property constraints, i.e. that

  <BFPersonInterface1> sh:property [
      sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/" ;
      sh:minCount 1 ; sh:maxCount 1
    ], [
      sh:predicate bf:identifiedBy ; sh:pattern "^https://viaf.org/" ;
      sh:minCount 1
    ] .

is matched by any node whose bf:identifiedBy arcs each satisfy one of
the property constraints, with the counts applied per constraint.

pass:
  <bf_Person1>
    bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
    bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .

  <bf_Person1>
    bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
    bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim,T> .

fail:
  <bf_Person1>   # missing id.loc.gov id
    bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .

  <bf_Person1>   # unrecognized identifiedBy property
    bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
    bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> ;
    bf:identifiedBy "this is a mistake" .

  <bf_Person1>   # too many id.loc.gov ids
    bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
    bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWOXT> ;
    bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
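
A minimal Python sketch of the proposed behavior, run against the five
instances above. This is one possible reading of the proposal, not a
normative algorithm: the function name and the (pattern, min, max)
tuple layout are invented here, values are plain strings, the viaf
pattern uses https to match the data, and the patterns are assumed to
be disjoint so each value can be assigned to the first constraint it
matches.

```python
import re

# (pattern, min_count, max_count); max_count None means unbounded.
CONSTRAINTS = [
    ("^http://id.loc.gov/", 1, 1),
    ("^https://viaf.org/", 1, None),
]

def additive_shape_ok(values, constraints=CONSTRAINTS):
    """Every value must fall under one constraint; counts are per constraint."""
    counts = [0] * len(constraints)
    for v in values:
        matched = [i for i, (pat, _, _) in enumerate(constraints)
                   if re.search(pat, v)]
        if not matched:
            return False          # value not covered by any constraint
        counts[matched[0]] += 1   # assumes disjoint patterns
    return all(lo <= n and (hi is None or n <= hi)
               for n, (_, lo, hi) in zip(counts, constraints))

LOC = "http://id.loc.gov/authorities/names/n80103961#RWO"
LOC2 = "http://id.loc.gov/authorities/names/n80103961#RWOXT"
VIAF = "https://viaf.org/viaf/268367832/#Knape,_Joachim"

print(additive_shape_ok([LOC, VIAF]))                       # pass
print(additive_shape_ok([VIAF]))                            # fail: no id.loc.gov id
print(additive_shape_ok([LOC, VIAF, "this is a mistake"]))  # fail: unrecognized value
print(additive_shape_ok([LOC, LOC2, VIAF]))                 # fail: too many id.loc.gov ids
```

If patterns could overlap, the first-match assignment would have to be
replaced by a search over possible assignments of values to
constraints.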

-- 
-ericP

(eric@w3.org)

Received on Friday, 18 September 2015 12:53:50 UTC