Re: propose to make repeated-properties additive

On 9/19/15 1:26 AM, Holger Knublauch wrote:
> This sounds like a very fundamental change to the language. I believe
> this topic should have been raised half a year ago. Could you create a
> branch of the spec and work out how all this can be implemented?

Before we discuss how it changes (or does not change) the language of 
SHACL, I think we should talk about it as a use case and see if we can 
come to some understanding. I see two issues that affect usability:

1) default behavior (open or closed)
I don't find this described as such in the document, and it has to be 
intuited from the inclusion of closed shapes toward the end of the core 
section. It may be obvious to some who are familiar with the 
implementation or have extensive experience with SPARQL, but from the 
view of the first-time reader, this isn't visible, and it needs to be 
made clear early on what the behavior is. None of the examples show 
valid and invalid instances that would clarify this. (I have drafted a 
few proposed changes to the examples, which I have sent to Holger and 
Arthur, and could include valid and invalid instance data as another 
aspect of those.)

That said, either default makes sense to me as long as it is patently 
clear to users what to expect and easy to understand the results of 
choosing one or the other.
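
To make the difference concrete, here is a minimal sketch in Python. It
is a toy model, not the SHACL algorithm: the function name `conforms`,
the dict-based "shape", and the dc: data are all invented for
illustration, and only property counts are checked. It shows the same
instance passing under an open reading and failing under a closed one.

```python
def conforms(node, shape, closed=False):
    """Toy check of one node against one shape.

    node:  list of (predicate, value) pairs
    shape: dict mapping predicate -> (min_count, max_count or None)
    """
    counts = {}
    for predicate, _value in node:
        counts[predicate] = counts.get(predicate, 0) + 1

    # Every property the shape mentions must fall within its cardinality.
    for predicate, (lo, hi) in shape.items():
        n = counts.get(predicate, 0)
        if n < lo or (hi is not None and n > hi):
            return False

    # Closed reading: properties the shape does not mention are errors.
    if closed:
        return all(predicate in shape for predicate in counts)

    # Open reading: properties the shape does not mention are ignored.
    return True


shape = {"dc:title": (1, 1)}
node = [("dc:title", "A Book"), ("dc:creator", "Someone")]

print(conforms(node, shape))               # open reading
print(conforms(node, shape, closed=True))  # closed reading
```

Under the open reading the extra dc:creator is ignored and the node
conforms; under the closed reading the same node is rejected. A pair of
instances like this next to each example would make the default visible.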

2) repeated properties
This is a real and not uncommon example:

   <bf_Person1>
   bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
     #IRI from id.loc.gov, min 1, max 1
   bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
     #IRI from viaf.org, min 1, max unlimited

Other examples could easily be developed using SKOS (which, as I pointed 
out earlier, is one of the three most used vocabularies in LoD) and 
Dublin Core (#1). If the analysis provided by Eric is not correct then 
it would be important to hear that. If it is, then we need to talk about 
how to handle repeated properties with different value constraints.
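
As a way to pin down the two readings, here is a minimal sketch in
Python. It is an illustration under stated assumptions, not an
implementation of SHACL or ShEx: the function names, the CONSTRAINTS
table, and the regular expressions are mine, and the viaf.org pattern
accepts both http and https only because the example IRI above uses
https. `matches_independently` follows the behavior described in Eric's
analysis, where each constraint counts its own matching values and
ignores the rest; `matches_additively` follows the proposal, where every
value must be claimed by exactly one constraint.

```python
import re

# Hypothetical constraint table for bf:identifiedBy:
# (pattern on the value, min count, max count or None for unlimited)
CONSTRAINTS = [
    (r"^http://id\.loc\.gov/", 1, 1),
    (r"^https?://viaf\.org/", 1, None),
]


def matches_independently(values, constraints=CONSTRAINTS):
    """Each constraint counts its own matches; other values are ignored."""
    for pattern, lo, hi in constraints:
        n = sum(1 for v in values if re.search(pattern, v))
        if n < lo or (hi is not None and n > hi):
            return False
    return True


def matches_additively(values, constraints=CONSTRAINTS):
    """Every value must be assigned to exactly one constraint, and each
    constraint must end up within its min/max count."""
    def assign(i, counts):
        if i == len(values):
            return all(c >= lo for c, (_p, lo, _hi) in zip(counts, constraints))
        for k, (pattern, _lo, hi) in enumerate(constraints):
            if re.search(pattern, values[i]) and (hi is None or counts[k] < hi):
                counts[k] += 1
                if assign(i + 1, counts):
                    return True
                counts[k] -= 1  # backtrack and try the next constraint
        return False  # no constraint can claim this value

    return assign(0, [0] * len(constraints))


LOC = "http://id.loc.gov/authorities/names/n80103961#RWO"
VIAF = "https://viaf.org/viaf/268367832/#Knape,_Joachim"
BAD = "this is a mistake"

print(matches_additively([LOC, VIAF]))          # both constraints satisfied
print(matches_additively([LOC, VIAF, BAD]))     # stray value is rejected
print(matches_independently([LOC, VIAF, BAD]))  # stray value is ignored
```

The backtracking only matters when a value could satisfy more than one
constraint; with disjoint patterns like these, each value has at most
one possible home.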

As for bringing this up late, I have tried to provide use cases where I 
thought they were needed, but had no way to know that this case, which I 
admit I take for granted, would be needed. My fear is that there are 
other such assumptions in the code that supports SHACL that are not 
obvious from the reading, and that these will appear when we begin 
looking at test cases (which will be even later). I can provide any 
amount of instance data for testing, but I'm not at all sure that I'll 
be able to find someone in my community who can write SHACL requirements 
for those tests. I'll do what I can, however. (My other fear is that 
we'll test what SHACL *does* rather than what we want it to do.)

kc




>
> How would it work with rdfs:subClassOf, if a shape attached to a
> subclass narrows down the sh:valueClass of a property defined for the
> superclass?
>
> What would be the meaning of sh:hasValue combined with minCount/maxCount?
>
> How would this work with constraint properties added via the extension
> mechanism (sub-templates of sh:PropertyConstraint such as
> sh:uniqueLanguage=true)?
>
> Also note that the proposed changes would be very surprising for people
> with an OWL background, where multiple occurrence (of owl:Restrictions)
> behaves in the same way as in SHACL. I personally find the
> interpretation that you propose very unintuitive and complicating.
>
> Holger
>
>
> On 9/18/15 10:53 PM, Eric Prud'hommeaux wrote:
>> I have been working with Karen Coyle and Tom Baker of Dublin Core on
>> the following critique of the drafted behavior of repeated properties
>> in SHACL. We discussed usability issues and lessons learned with
>> respect to Description Set Profile semantics.
>>
>>
>>
>> In 2008, DCMI did an analysis evaluating the Scholarly Works
>> Application Profile (SWAP DSP), a deliverable of a UK project led by
>> UKOLN, for conformance with DCMI's then-current model for DSPs. They
>> tested whether the DSPs written by modelers matched their intended
>> semantics.
>>
>> The DSPs failed to behave as modelers expected; modelers were using
>> generic properties like dc:type multiple times with the expectation
>> that each constraint would correspond to one triple in the graph. An
>> example of this is the reuse of dc:type within a description of an
>> expression of an Eprint (it's a library thing). The SWAP DSP for
>> this included two dc:type arcs with values of <Expression>¹ and
>> <JournalArticle>².
>>
>> ¹
>> http://www.ukoln.ac.uk/repositories/digirep/index/Scholarly_Works_Application_Profile#Entity_type_2
>>
>> ²
>> http://www.ukoln.ac.uk/repositories/digirep/index/Scholarly_Works_Application_Profile#Type
>>
>>
>>
>> The SWAP DSP was found not to conform to the guidelines, because the
>> guidelines specified a matching algorithm whereby each statement in
>> the data was assumed to match just one statement template with a
>> given property constraint. In SWAP, a single property (dc:type) was
>> used in two different templates for statements describing the same
>> resource, ¹ and ² above.
>>
>> At the time, DCMI saw this result as a problem more of the matching
>> algorithm than of the SWAP DSP itself. It is not all that uncommon
>> in the community of DC users for a property to be used with
>> different constraints. To assume otherwise, at any rate, seemed
>> needlessly restrictive.
>>
>> Repeated properties are common in Cultural Heritage data, one of the
>> communities served by DCMI, and these communities are actively
>> converting their models to make use of RDF. One of the more
>> promising models is coming from the Library of Congress, and is
>> called BIBFRAME. First, consider an example from Bibframe where each
>> bf:Person must have only one bf:identifiedBy and it must come from
>> id.loc.gov:
>>
>>    Instance data example:
>>    <bf_Person1>
>>      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961> .
>>
>>    ShEx:
>>      <BFPersonInterface1> {
>>        bf:identifiedBy IRI PATTERN "^http://id.loc.gov/"
>>      }
>>
>>    SHACL:
>>      <BFPersonInterface1> sh:property [
>>        sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/"
>>      ] .
>>
>>
>> Another option in BIBFRAME is to have one or more additional
>> bf:identifiedBys coming from another source list, such as viaf.org:
>>
>>    Instance data example:
>>    <bf_Person1>
>>    bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>>    bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
>>
>>    ShEx:
>>      <BFPersonInterface1> {
>>        bf:identifiedBy IRI PATTERN "^http://id.loc.gov/" ,
>>        bf:identifiedBy IRI PATTERN "^http://viaf.org/"  +
>>      }
>>
>> With SHACL, users have to remember to use "Qualified" counts and
>> because SHACL defaults to an open graph, the constraints below do
>> not prohibit additional bf:identifiedBy properties:
>>
>>    SHACL:
>>      <BFPersonInterface1> sh:property [
>>          sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/" ;
>>          sh:minQualifiedCount 1 ; sh:maxQualifiedCount 1
>>        ], [
>>          sh:predicate bf:identifiedBy ; sh:pattern "^http://viaf.org/" ;
>>          sh:minQualifiedCount 1
>>        ] .
>>
>> ... would return true for the instance with two bf:identifiedBy
>> predicates, but would also erroneously match
>>
>> <bf_Person1>
>>    bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>>    bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> ;
>>    bf:identifiedBy "this is a mistake" . # should be an error
>>
>> If we invent a new construct to handle this, say
>>        [
>>          sh:predicate bf:identifiedBy ; sh:patterns
>>            ("^http://id.loc.gov/" "^http://viaf.org/");
>>          sh:minQualifiedCount 1
>>        ]
>> , the cost of repeated properties is painfully high.
>>
>> =PROPOSAL=
>>
>> There are plenty of use cases for repeated properties. We propose that
>> the syntax for repeated property constraints be identical to that for
>> single property constraints, i.e. that
>>
>>    <BFPersonInterface1> sh:property [
>>        sh:predicate bf:identifiedBy ; sh:pattern "^http://id.loc.gov/" ;
>>        sh:minCount 1 ; sh:maxCount 1
>>      ], [
>>        sh:predicate bf:identifiedBy ; sh:pattern "^http://viaf.org/" ;
>>        sh:minCount 1
>>      ] .
>>
>> is matched by any node with arcs that satisfy each of the property
>> requirements.
>>
>> pass:
>>    <bf_Person1>
>>      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>>      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
>>
>>    <bf_Person1>
>>      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>>      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim,T> .
>>
>> fail:
>>    <bf_Person1>   # missing id.loc.gov id
>>      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
>>
>>    <bf_Person1>   # unrecognized identifiedBy property
>>      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>>      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> ;
>>      bf:identifiedBy "this is a mistake" .
>>
>>    <bf_Person1>   # too many id.loc.gov ids
>>      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>>      bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWOXT> ;
>>      bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
>>
>
>
>

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet/+1-510-984-3600

Received on Saturday, 19 September 2015 07:03:12 UTC