- From: Eric Prud'hommeaux <eric@w3.org>
- Date: Fri, 18 Sep 2015 08:53:44 -0400
- To: public-data-shapes-wg@w3.org
- Cc: Karen Coyle <kcoyle@kcoyle.net>, Thomas Baker <tom@tombaker.org>, kai eckert <kai.eckert@informatik.uni-mannheim.de>
I have been working with Karen Coyle and Tom Baker of Dublin Core on the following critique of the drafted behavior of repeated properties in SHACL. We discussed usability issues and lessons learned with respect to Description Set Profile (DSP) semantics.

In 2008, DCMI evaluated the Scholarly Works Application Profile (SWAP DSP), a deliverable of a UK project led by UKOLN, for conformance with DCMI's then-current model for DSPs. They tested whether the DSPs written by modelers matched their intended semantics. The DSPs failed to behave as the modelers expected: modelers were using generic properties like dc:type multiple times with the expectation that each constraint would correspond to one triple in the graph.

An example of this is the reuse of dc:type within a description of an expression of an Eprint (it's a library thing). The SWAP DSP for this included two dc:type arcs with values of <Expression>¹ and <JournalArticle>².

¹ http://www.ukoln.ac.uk/repositories/digirep/index/Scholarly_Works_Application_Profile#Entity_type_2
² http://www.ukoln.ac.uk/repositories/digirep/index/Scholarly_Works_Application_Profile#Type

The SWAP DSP was found not to conform to the guidelines, because the guidelines specified a matching algorithm whereby each statement in the data was assumed to match just one statement template with a given property constraint. In SWAP, a single property (dc:type) was used in two different templates for statements describing the same resource, ¹ and ² above.

At the time, DCMI saw this result as a problem more of the matching algorithm than of the SWAP DSP itself. It is not all that uncommon in the community of DC users for a property to be used with different constraints. To assume otherwise, at any rate, seemed needlessly restrictive.

Repeated properties are common in data from the Cultural Heritage sector, one of the communities served by DCMI, and these communities are actively converting their models to make use of RDF.
One of the more promising models comes from the Library of Congress and is called BIBFRAME. First, consider an example from BIBFRAME where each bf:Person must have only one bf:identifiedBy and it must come from id.loc.gov:

Instance data example:

    <bf_Person1> bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961> .

ShEx:

    <BFPersonInterface1> {
      bf:identifiedBy IRI PATTERN "^http://id.loc.gov/"
    }

SHACL:

    <BFPersonInterface1>
      sh:property [
        sh:predicate bf:identifiedBy ;
        sh:pattern "^http://id.loc.gov/"
      ] .

Another option in BIBFRAME is to have one or more additional bf:identifiedBys coming from another source list, such as viaf.org. (The viaf.org patterns below accept both http and https, since the sample data uses https IRIs.)

Instance data example:

    <bf_Person1>
      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .

ShEx:

    <BFPersonInterface1> {
      bf:identifiedBy IRI PATTERN "^http://id.loc.gov/" ,
      bf:identifiedBy IRI PATTERN "^https?://viaf.org/" +
    }

With SHACL, users have to remember to use "Qualified" counts, and because SHACL defaults to an open graph, the constraints below do not prohibit additional bf:identifiedBy properties:

SHACL:

    <BFPersonInterface1>
      sh:property [
        sh:predicate bf:identifiedBy ;
        sh:pattern "^http://id.loc.gov/" ;
        sh:minQualifiedCount 1 ; sh:maxQualifiedCount 1
      ], [
        sh:predicate bf:identifiedBy ;
        sh:pattern "^https?://viaf.org/" ;
        sh:minQualifiedCount 1
      ] .

... would return true for the instance with two bf:identifiedBy predicates, but would also erroneously match

    <bf_Person1>
      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> ;
      bf:identifiedBy "this is a mistake" . # should be an error

If we invent a new construct to handle this, say

    [ sh:predicate bf:identifiedBy ;
      sh:patterns ("^http://viaf.org/" "^http://viaf.org/");
      sh:minQualifiedCount 1
    ] ,

the cost of repeated properties is painfully high.

=PROPOSAL=

There are plenty of use cases for repeated properties.
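For readers who want to experiment, the two readings at stake can be sketched in a few lines of Python: the qualified-count reading described above, where unmatched bf:identifiedBy values are simply ignored, and a reading in which every bf:identifiedBy value must be claimed by exactly one constraint, which is the behavior this message goes on to propose. This is only an illustration under stated assumptions, not an implementation of SHACL or ShEx: the function names are hypothetical, values are treated as plain strings (so the IRI/literal distinction is ignored), and the viaf.org pattern accepts both http and https because the sample data uses https IRIs.

```python
import re
from typing import List, Optional, Tuple

# (regex pattern, minimum count, maximum count or None for unbounded)
Constraint = Tuple[str, int, Optional[int]]

# Assumption: the viaf.org pattern accepts https as well as http.
CONSTRAINTS: List[Constraint] = [
    (r"^http://id\.loc\.gov/", 1, 1),
    (r"^https?://viaf\.org/", 1, None),
]


def qualified_open(values: List[str], constraints: List[Constraint]) -> bool:
    """Open reading: count the values matching each constraint, ignore the rest."""
    for pattern, lo, hi in constraints:
        n = sum(1 for v in values if re.search(pattern, v))
        if n < lo or (hi is not None and n > hi):
            return False
    return True


def partition_closed(values: List[str], constraints: List[Constraint]) -> bool:
    """Closed reading: each value is assigned to exactly one matching constraint,
    and every constraint ends up within its min/max bounds."""
    counts = [0] * len(constraints)

    def assign(i: int) -> bool:
        if i == len(values):
            # All values placed; check the minimum counts.
            return all(c >= lo for c, (_, lo, _) in zip(counts, constraints))
        for k, (pattern, _, hi) in enumerate(constraints):
            if re.search(pattern, values[i]) and (hi is None or counts[k] < hi):
                counts[k] += 1
                if assign(i + 1):
                    return True
                counts[k] -= 1  # backtrack and try the next constraint
        return False  # no constraint can take this value

    return assign(0)


LOC = "http://id.loc.gov/authorities/names/n80103961#RWO"
LOC2 = "http://id.loc.gov/authorities/names/n80103961#RWOXT"
VIAF = "https://viaf.org/viaf/268367832/#Knape,_Joachim"
BAD = "this is a mistake"

if __name__ == "__main__":
    for vals in ([LOC, VIAF], [LOC, VIAF, BAD], [VIAF], [LOC, LOC2, VIAF]):
        print(len(vals), qualified_open(vals, CONSTRAINTS),
              partition_closed(vals, CONSTRAINTS))
```

The backtracking in partition_closed only matters when patterns overlap; with disjoint patterns such as the two above, each value has at most one candidate constraint and the search is linear.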
We propose that the syntax for repeated property constraints be identical to that for single property constraints, i.e. that

    <BFPersonInterface1>
      sh:property [
        sh:predicate bf:identifiedBy ;
        sh:pattern "^http://id.loc.gov/" ;
        sh:minCount 1 ; sh:maxCount 1
      ], [
        sh:predicate bf:identifiedBy ;
        sh:pattern "^https?://viaf.org/" ;
        sh:minCount 1
      ] .

is matched by any node with arcs that satisfy each of the property requirements. (The viaf.org pattern accepts both http and https, since the sample data uses https IRIs.)

pass:

    <bf_Person1>
      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .

    <bf_Person1>
      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim,T> .

fail:

    <bf_Person1> # missing id.loc.gov id
      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .

    <bf_Person1> # unrecognized identifiedBy property
      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> ;
      bf:identifiedBy "this is a mistake" .

    <bf_Person1> # too many id.loc.gov ids
      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWOXT> ;
      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .

> --
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet/+1-510-984-3600

--
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than email address distribution.

There are subtle nuances encoded in font variation and clever layout which can only be seen by printing this message on high-clay paper.
Received on Friday, 18 September 2015 12:53:50 UTC