Re: propose to make repeated-properties additive from Karen Coyle on 2015-09-22 (public-data-shapes-wg@w3.org from September 2015)

From: Karen Coyle <kcoyle@kcoyle.net>
Date: Tue, 22 Sep 2015 07:03:11 +0200
To: public-data-shapes-wg@w3.org
Message-ID: <5600E10F.6010305@kcoyle.net>
On 9/21/15 2:50 AM, Holger Knublauch wrote:
> On 9/21/2015 6:19, Karen Coyle wrote:
>>
>>
>> On 9/20/15 1:39 AM, Holger Knublauch wrote:
>>> On 9/19/15 5:02 PM, Karen Coyle wrote:
>>>> 2) repeated properties
>>>> This is a real and not uncommon example:
>>>>
>>>>   <bf_Person1>
>>>>   bf:identifiedBy <http://id.loc.gov/authorities/names/n80103961#RWO> ;
>>>>     #IRI from id.loc.gov, min 1, max 1
>>>>   bf:identifiedBy <https://viaf.org/viaf/268367832/#Knape,_Joachim> .
>>>>     #IRI from viaf.org, min 1, max unlimited
>>>
>>> I agree it makes sense to talk about the requirements before requesting
>>> a change to the language. If I understand the intention correctly, then
>>> the above could be expressed with an incremental addition to the core
>>> library:
>>>
>>> ex:MyShape
>>>      a sh:Shape ;
>>>      sh:property [
>>>          sh:predicate bf:identifiedBy ;
>>>          sh:qualifiedMinCount 1 ;
>>>          sh:qualifiedMaxCount 1 ;
>>>          sh:qualifiedValueShape [
>>>              sh:constraint [
>>>                  a sh:URIPatternConstraint ;
>>>                  sh:uriPattern "^http://id.loc.gov" ;
>>>              ] ;
>>>          ] ;
>>>      ] ;
>>>      sh:property [
>>>          sh:predicate bf:identifiedBy ;
>>>          sh:qualifiedMinCount 1 ;
>>>          sh:qualifiedValueShape [
>>>              sh:constraint [
>>>                  a sh:URIPatternConstraint ;
>>>                  sh:uriPattern "^https?://viaf.org" ;
>>>              ] ;
>>>          ] ;
>>>      ] .
>>>
>>> The new feature that would be needed would be sh:URIPatternConstraint -
>>> the current sh:pattern only applies to property values "one hop away"
>>> while here we would need something that talks about the IRI of the focus
>>> node itself. We had a similar topic recently with regards to
>>> sh:allowedValues. It may make sense to generalize the validation
>>> function mechanism so that the same infrastructure can be reused, but in
>>> the end this is about syntactic sugar only.
>>
>> In fact, the question was less about the URI pattern than about the
>> ability to have different values for the same property, and to
>> constrain them separately. So we should pick some other value
>> constraints to use that SHACL already handles -- perhaps that the
>> first instance is an IRI and the second is a literal.
>
> Ok, this just highlights that we need more test cases. A property that
> can (meaningfully) take either literals or IRIs as values is not a use
> case that I have seen yet. I am aware that schema.org allows that, but
> schema.org can afford that liberty because they have quite a lot of
> post-processing machinery, e.g. to turn a country code string into a
> Country instance.

Holger, I don't think it matters whether they can "afford it" or not.
The fact is that schema.org is very widely used; I'm told that it now
appears on about 1/3 of all web sites. The library world's main
database, http://worldcat.org, has schema.org encoding for a major
percentage of its over 300 million items, and that's just one database.
And, as I say repeatedly, Dublin Core 1.1 is one of the most used
vocabularies in LOD; it does not define ranges for any of its
properties, and usage (literal or IRI) varies.

I'm concerned that it will be difficult to explain the need to use the 
"sh:qualified*", and what the implications are of its use. I'm also 
concerned that readers of the spec will miss the distinction. My goal, 
therefore, is to clarify the use of "qualified" for users.
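
One way to explain "qualified" might be a small worked example. The
following is only a rough Python sketch, not SHACL and not anything
taken from the spec: the function names (count_in_range,
unqualified_ok, qualified_ok) are made up for illustration, and the
value shape is reduced to a regular expression on the IRI. The viaf
pattern accepts both http and https, since the example value uses
https. The two values are the bf:identifiedBy values from the example
earlier in this thread.

```python
import re

# Values of bf:identifiedBy on one focus node (from the example in this thread).
values = [
    "http://id.loc.gov/authorities/names/n80103961#RWO",
    "https://viaf.org/viaf/268367832/#Knape,_Joachim",
]


def count_in_range(count, min_count=0, max_count=None):
    """True if min_count <= count <= max_count (max_count=None means unlimited)."""
    return count >= min_count and (max_count is None or count <= max_count)


def unqualified_ok(values, min_count=0, max_count=None):
    # A plain min/max count looks at every value of the property.
    return count_in_range(len(values), min_count, max_count)


def qualified_ok(values, pattern, min_count=0, max_count=None):
    # A qualified count looks only at the values that match the value shape,
    # which this sketch reduces to a regular expression on the IRI.
    matching = [v for v in values if re.search(pattern, v)]
    return count_in_range(len(matching), min_count, max_count)


# "min 1, max 1" applied to all values versus only the id.loc.gov values.
loc_unqualified = unqualified_ok(values, min_count=1, max_count=1)
loc_qualified = qualified_ok(values, r"^http://id\.loc\.gov", min_count=1, max_count=1)
# "min 1, max unlimited" applied only to the viaf.org values.
viaf_qualified = qualified_ok(values, r"^https?://viaf\.org", min_count=1)

print(loc_unqualified, loc_qualified, viaf_qualified)
```

With these inputs the plain "max 1" check fails, because the property
has two values in total, while both qualified checks pass, because
each one counts only the values that match its own pattern.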

>
> Anyway, let's look at what it would take to express such things. I
> believe we need to continue to keep general property constraints and
> qualified property constraints separate, because the majority of
> constraints is not qualified but applies to all values.

I'm not at all sure that this is true (it's a gut feeling unless someone 
can produce some actual data), and I don't see this as a good reason to 
keep them separate. There may well be other reasons, but to me the key 
is that it be easy to understand -- for those people who aren't already 
fully immersed in OWL.

> In order to
> express your use case above, we would need to extend the
> sh:qualifiedValueShape mechanism so that it can also work with literals.
> Currently my implicit assumption was that the focus node cannot be a
> literal, but this is probably an unnecessary restriction. I have just
> opened a ticket for that micro decision.

I don't know what I said that would indicate that a shape should have a
literal as its subject -- is that what you mean here? That sounds like
something that violates the basis of RDF, so I don't see why it would be
needed. Did I misunderstand what you meant?

kc

>
> Then, we could more cleanly separate the concepts of property value
> restrictions, and restrictions on the focus node itself. Taking
> sh:nodeKind as an example, we could then describe your scenario using
>
> ex:MyShape
>      a sh:Shape ;
>      rdfs:comment "someProperty must have one IRI and one or more Literals" ;
>      sh:property [
>          sh:predicate ex:someProperty ;
>          sh:qualifiedMinCount 1 ;
>          sh:qualifiedMaxCount 1 ;
>          sh:qualifiedValueShape [
>              sh:constraint [
>                  a sh:NodeConstraint ;
>                  sh:nodeKind sh:IRI ;
>              ]
>          ]
>      ] ;
>      sh:property [
>          sh:predicate ex:someProperty ;
>          sh:qualifiedMinCount 1 ;
>          sh:qualifiedValueShape [
>              sh:constraint [
>                  a sh:NodeConstraint ;
>                  sh:nodeKind sh:Literal ;
>                  sh:datatype xsd:string ;
>              ]
>          ]
>      ] .
>
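
To make the scenario in the shape above concrete, here is a rough
Python sketch of the check it describes: exactly one IRI value and one
or more xsd:string literal values for the same property. This is not
SHACL and not a proposed implementation. The classes IRI and Literal
and the function check_some_property are hypothetical names used only
for illustration. In this sketch the test is applied to each value of
the property in turn; no literal is ever used as the subject of a
triple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = "xsd:string"


def is_iri(node):
    # Node-kind test on the value itself.
    return isinstance(node, IRI)


def is_string_literal(node):
    # Node-kind plus datatype test on the value itself.
    return isinstance(node, Literal) and node.datatype == "xsd:string"


def check_some_property(values):
    """Exactly one IRI value and at least one xsd:string literal value.

    Values that are neither (for example an xsd:integer literal) are not
    counted by either test, so they do not cause a failure in this sketch.
    """
    iri_count = sum(1 for v in values if is_iri(v))
    literal_count = sum(1 for v in values if is_string_literal(v))
    return iri_count == 1 and literal_count >= 1


one_of_each = check_some_property([IRI("http://example.org/a"), Literal("A label")])
two_iris = check_some_property(
    [IRI("http://example.org/a"), IRI("http://example.org/b"), Literal("A label")]
)
no_literal = check_some_property([IRI("http://example.org/a")])

print(one_of_each, two_iris, no_literal)
```

The first set of values passes, the second fails because it has two
IRIs, and the third fails because it has no string literal.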



> In the design above I am suggesting a template sh:NodeConstraint
> which combines the various node-related constraint types:
>
> - sh:allowedValues
> - sh:class
> - sh:datatype
> - sh:directType
> - sh:minLength
> - sh:maxLength
> - sh:nodeKind
> - sh:maxExclusive etc
> - sh:pattern
>
> All of these are backed by the same sh:ValidationFunctions as the
> property constraints, so it's just another syntax for the same thing.
> (But we would need to reopen the discussion on the naming of
> sh:valueClass vs. sh:class).
>
> Does this sound like the right direction?
>
>> Looking at the above, I think both would fail when the "other" value
>> is evaluated.
>>
>> Wouldn't this be a case that could use sh:filterShape? That is, there
>> would be a separate sh:filterShape for the two different cases. By
>> using the filter, only one value type would be included in the graph
>> being evaluated.
>
> I don't see how a filterShape would help here.
>
> Holger
>
>
>

-- 
Karen Coyle
kcoyle@kcoyle.net http://kcoyle.net
m: 1-510-435-8234
skype: kcoylenet/+1-510-984-3600
Received on Tuesday, 22 September 2015 05:03:45 UTC