- From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
- Date: Fri, 18 Sep 2015 08:37:55 -0700
- To: Eric Prud'hommeaux <eric@w3.org>, public-data-shapes-wg@w3.org
- Cc: Karen Coyle <kcoyle@kcoyle.net>, Thomas Baker <tom@tombaker.org>, kai eckert <kai.eckert@informatik.uni-mannheim.de>
Is there an issue to attach this proposal to?

peter


On 09/18/2015 05:53 AM, Eric Prud'hommeaux wrote:
> I have been working with Karen Coyle and Tom Baker of Dublin Core on
> the following critique of the drafted behavior of repeated properties
> in SHACL. We discussed usability issues and lessons learned with
> respect to Description Set Profile semantics.
>
>
> In 2008, DCMI did an analysis evaluating the Scholarly Works
> Application Profile (SWAP DSP), a deliverable of a UK project led by
> UKOLN, for conformance with DCMI's then-current model for DSPs. They
> tested whether the DSPs written by modelers matched their intended
> semantics.
>
> The DSPs failed to behave as modelers expected; modelers were using
> generic properties like dc:type multiple times with the expectation
> that each constraint would correspond to one triple in the graph. An
> example of this is the reuse of dc:type within a description of an
> expression of an Eprint (it's a library thing). The SWAP DSP for
> this included two dc:type arcs with values of <Expression>¹ and
> <JournalArticle>².
>
> ¹ http://www.ukoln.ac.uk/repositories/digirep/index/Scholarly_Works_Application_Profile#Entity_type_2
> ² http://www.ukoln.ac.uk/repositories/digirep/index/Scholarly_Works_Application_Profile#Type
>
>
> The SWAP DSP was found not to conform to the guidelines, because the
> guidelines specified a matching algorithm whereby each statement in
> the data was assumed to match just one statement template with a
> given property constraint. In SWAP, a single property (dc:type) was
> used in two different templates for statements describing the same
> resource, ¹ and ² above.
>
> At the time, DCMI saw this result as a problem more of the matching
> algorithm than of the SWAP DSP itself. It is not all that uncommon
> in the community of DC users for a property to be used with
> different constraints. To assume otherwise, at any rate, seemed
> needlessly restrictive.
>
> Repeated properties are common in Cultural Heritage data, one of the
> communities served by DCMI, and these communities are actively
> converting their models to make use of RDF. One of the more
> promising models is coming from the Library of Congress, and is
> called BIBFRAME. First, consider an example from BIBFRAME where each
> bf:Person must have only one bf:identifiedBy and it must come from
> id.loc.gov:
>
> Instance data example:
>   <bf_Person1>
>     bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961> .
>
> ShEx:
>   <BFPersonInterface1> {
>     bf:identifiedBy IRI PATTERN "^http://id.loc.gov/"
>   }
>
> SHACL:
>   <BFPersonInterface1> sh:property [
>     sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/"
>   ] .
>
>
> Another option in BIBFRAME is to have one or more additional
> bf:identifiedBys coming from another source list, such as viaf.org:
>
> Instance data example:
>   <bf_Person1>
>     bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>     bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
>
> ShEx:
>   <BFPersonInterface1> {
>     bf:identifiedBy IRI PATTERN "^http://id.loc.gov/" ,
>     bf:identifiedBy IRI PATTERN "^http://viaf.org/" +
>   }
>
> With SHACL, users have to remember to use "Qualified" counts, and
> because SHACL defaults to an open graph, the constraints below do
> not prohibit additional bf:identifiedBy properties:
>
> SHACL:
>   <BFPersonInterface1> sh:property [
>     sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/" ;
>     sh:minQualifiedCount 1 ; sh:maxQualifiedCount 1
>   ], [
>     sh:predicate bf:identifiedBy ; sh:pattern "^http://viaf.org/" ;
>     sh:minQualifiedCount 1
>   ] .
>
> ... would return true for the instance with two bf:identifiedBy
> predicates, but would also erroneously match
>
>   <bf_Person1>
>     bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>     bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> ;
>     bf:identifiedBy "this is a mistake" . # should be an error
>
> If we invent a new construct to handle this, say
>   [
>     sh:predicate bf:identifiedBy ; sh:patterns
>       ("^http://viaf.org/" "^http://viaf.org/");
>     sh:minQualifiedCount 1
>   ]
> , the cost of repeated properties is painfully high.
>
> =PROPOSAL=
>
> There are plenty of use cases for repeated properties. We propose that
> the syntax for repeated property constraints be identical to that for
> single property constraints, i.e. that
>
>   <BFPersonInterface1> sh:property [
>     sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/" ;
>     sh:minCount 1 ; sh:maxCount 1
>   ], [
>     sh:predicate bf:identifiedBy ; sh:pattern "^http://viaf.org/" ;
>     sh:minCount 1
>   ] .
>
> is matched by any node with arcs that satisfy each of the property
> requirements.
>
> pass:
>   <bf_Person1>
>     bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>     bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
>
>   <bf_Person1>
>     bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>     bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim,T> .
>
> fail:
>   <bf_Person1> # missing id.loc.gov id
>     bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
>
>   <bf_Person1> # unrecognized identifiedBy property
>     bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>     bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> ;
>     bf:identifiedBy "this is a mistake" .
>
>   <bf_Person1> # too many id.loc.gov ids
>     bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>     bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWOXT> ;
>     bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
>
>> --
>> Karen Coyle
>> kcoyle@kcoyle.net http://kcoyle.net
>> m: 1-510-435-8234
>> skype: kcoylenet/+1-510-984-3600
>
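
The matching behaviour in the quoted proposal can be illustrated with a rough Python sketch. It is not taken from the thread or from any SHACL or ShEx implementation; the names `PropertyConstraint`, `conforms` and `SHAPE` are hypothetical. It assumes that the patterns on a repeated predicate do not overlap (so each arc can simply be assigned to the first constraint it matches), that predicates not mentioned by any constraint are ignored, and that the viaf.org pattern is loosened to `^https?://` because the instance data quoted above uses https URLs while the quoted pattern uses http.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PropertyConstraint:
    """One property constraint: a predicate, a value pattern, and counts."""
    predicate: str
    pattern: str
    min_count: int = 1
    max_count: Optional[int] = 1  # None means no upper bound


def conforms(arcs: List[Tuple[str, str]],
             constraints: List[PropertyConstraint]) -> bool:
    """Check the (predicate, value) arcs of one node against the constraints.

    Each arc on a constrained predicate must match exactly one constraint;
    an arc that matches none of them makes the node fail.
    """
    constrained = {c.predicate for c in constraints}
    counts = [0] * len(constraints)
    for predicate, value in arcs:
        if predicate not in constrained:
            continue  # assumption: unconstrained predicates are allowed
        for i, c in enumerate(constraints):
            if c.predicate == predicate and re.search(c.pattern, value):
                counts[i] += 1  # assign the arc to the first matching constraint
                break
        else:
            return False  # unrecognized value for a constrained predicate
    # Every constraint must see a number of arcs within its min/max bounds.
    return all(
        n >= c.min_count and (c.max_count is None or n <= c.max_count)
        for n, c in zip(counts, constraints)
    )


# Hypothetical encoding of <BFPersonInterface1> from the proposal.
SHAPE = [
    PropertyConstraint("bf:identifiedBy", r"^http://id\.loc\.gov/", 1, 1),
    PropertyConstraint("bf:identifiedBy", r"^https?://viaf\.org/", 1, None),
]
```

With overlapping patterns, a first-match assignment is no longer enough, and an implementation would have to search over possible assignments of arcs to constraints.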
Received on Friday, 18 September 2015 15:38:28 UTC