Re: scopeNode and scopeClass from Irene Polikoff on 2016-06-13 (public-data-shapes-wg@w3.org from June 2016)

From: Irene Polikoff <irene@topquadrant.com>
Date: Mon, 13 Jun 2016 07:14:34 -0400
To: kcoyle@kcoyle.net
Cc: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>, public-data-shapes-wg <public-data-shapes-wg@w3.org>
Message-Id: <8560D1B3-7A28-4A14-86FE-8A2B1F0A3F2E@topquadrant.com>
Firstly, maxCount today can be used with either a node scope or a class scope and it behaves consistently in both cases. Further, its meaning is exactly the same as in all other data modeling systems - be it XML Schema, UML, RDBMS or even OWL (in situations when the OWA doesn't change the outcome). So, it should be quite intuitive for the users. I would be confused if the meaning of maxCount differed depending on its context of use.

Secondly, no matter what syntax is used, users will have to understand the semantics of shapes, including what the expected results are and which shapes are irrelevant (can never be satisfied), in order to use shapes effectively to solve their problems.

Personally, I strongly believe users would benefit from, and prefer, being able to check their shapes at design time and be told when a shape has no practical application, that is, when they constructed it by mistake and need something else instead.
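
To make the first point concrete, here is a minimal Python sketch (not SHACL itself, and not taken from the spec) of how maxCount is evaluated per focus node regardless of how the scope selected those nodes. It reuses the example graph quoted further down in this thread. The function names are hypothetical, triples are plain tuples, and the class scope is simplified to direct rdf:type triples only, with no rdfs:subClassOf closure.

```python
# Triples are modeled as plain (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
RDFS_LABEL = "rdfs:label"

GRAPH = {
    ("ex:Person", RDFS_LABEL, "Person"),
    ("ex:Person", RDFS_LABEL, "Human Being"),
    ("ex:Alice", RDF_TYPE, "ex:Person"),
    ("ex:Alice", RDFS_LABEL, "Alice"),
    ("ex:Alice", RDFS_LABEL, "Alice Jones"),
    ("ex:Bob", RDF_TYPE, "ex:Person"),
    ("ex:Joe", RDF_TYPE, "ex:Person"),
    ("ex:Joe", RDFS_LABEL, "Joe"),
}


def scope_node(node):
    """Node scope: the only focus node is the named node itself."""
    return {node}


def scope_class(graph, cls):
    """Class scope, simplified (assumption): direct rdf:type only."""
    return {s for (s, p, o) in graph if p == RDF_TYPE and o == cls}


def max_count_violations(graph, focus_nodes, predicate, max_count, inverse=False):
    """Return the focus nodes that have more than max_count values.

    The count is taken separately for each focus node. With inverse=True
    the focus node is matched in object position, in the manner of
    sh:inverseProperty.
    """
    violations = set()
    for focus in focus_nodes:
        if inverse:
            values = {s for (s, p, o) in graph if p == predicate and o == focus}
        else:
            values = {o for (s, p, o) in graph if p == predicate and s == focus}
        if len(values) > max_count:
            violations.add(focus)
    return violations


# Node scope on ex:Person, at most one rdfs:label.
shape1 = max_count_violations(GRAPH, scope_node("ex:Person"), RDFS_LABEL, 1)
# Class scope on ex:Person, at most one rdfs:label.
shape2 = max_count_violations(GRAPH, scope_class(GRAPH, "ex:Person"), RDFS_LABEL, 1)
# Node scope on ex:Person, at most two inverse rdf:type triples.
my_shape = max_count_violations(
    GRAPH, scope_node("ex:Person"), RDF_TYPE, 2, inverse=True
)
print(shape1, shape2, my_shape)
```

In all three calls the counting rule is the same; only the set of focus nodes and the direction of the predicate change.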

Irene 

Sent from my iPhone

> On Jun 12, 2016, at 6:01 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
> 
> I definitely think that this approach will be much easier for users, rather than having different types of scopes and constraints with complex relationships between which constraints can be applied to which scopes.
> 
> kc
> 
>> On 6/12/16 2:36 PM, Peter F. Patel-Schneider wrote:
>> A way to limit cardinalities on classes is as in my refactored SHACL,
>> described at https://www.w3.org/2014/data-shapes/wiki/Refactor
>> 
>> In this refactoring of SHACL there is no need for node constraints, property
>> constraints, inverse property constraints, path constraints, or even
>> constraints at all.  There are only shapes and shapes all work on a set of
>> nodes - which usually play the role of value nodes but sometimes act more like
>> focus nodes.
>> 
>> Many constructs, e.g., sh:class, sh:datatype, sh:minLength, sh:minExclusive,
>> and sh:shape, just look at each node in this set independently and act just
>> the same as in the current setup, albeit with a change in terminology.  Other
>> constructs consider the set as a whole - sh:maxCount limits the cardinality of
>> the set, sh:hasValue checks whether a particular node is in the set,
>> sh:uniqueLang checks for duplicate language tags in the set.
>> 
>> One benefit of this refactor is that the analogue of constraint components are
>> not responsible for determining what to do with the sh:predicate argument -
>> there is no sh:predicate argument.  Instead, this determination is confined to
>> a construct that takes the place of sh:property and sh:inverseProperty.
>> 
>> Another benefit is that for shapes with scopes the initial set of nodes is the
>> in-scope nodes so constructs like sh:maxCount can be used on this set.
>> Requiring that the cardinality of the SHACL instances of ex:Person is at most
>> two is done by
>> 
>> ex:ps a sh:Shape ;
>> sh:scopeClass ex:Person ;
>> sh:maxCount 2 .
>> 
>> peter
>> 
>> 
>> 
>>> On 06/12/2016 01:00 PM, Dimitris Kontokostas wrote:
>>> Hi Karen,
>>> 
>>> Irene is correct but here's a little more explanation.
>>> Currently we have 3+1 types of
>>> constraints: http://w3c.github.io/data-shapes/shacl/#dfn-constraint
>>> property, inverse property, node constraints + path constraints as a new addition
>>> 
>>> Node constraints define constraints about the focus node directly, taking
>>> only one focus node into account at a time; they do not apply to the whole set
>>> of focus nodes defined by e.g. a scopeClass
>>> 
>>> What you need to define would require a new type of constraint, something like
>>> a scopeConstraint that applies constraints on the whole set of focus nodes.
>>> This of course wouldn't be limited to only scopeClass but all the ways we can
>>> define a scope for a shape.
>>> 
>>> Peter's proposal is orthogonal to this. Peter suggests that each constraint
>>> must be defined for all contexts so, if we were to define a scopeConstraint,
>>> then we would have to define all core constraints for this context as well,
>>> that is sh:minCount, maxCount, nodeKind, class, ...
>>> 
>>> On Sun, Jun 12, 2016 at 8:01 PM, Irene Polikoff <irene@topquadrant.com> wrote:
>>> 
>>>    maxCount works with a specified predicate. It counts the number of distinct
>>>    triples with the focus node as the subject and the predicate provided in the
>>>    constraint. This is its semantics.
>>> 
>>>    maxCount can be used with the class scope, but this wouldn't give the
>>>    desired effect because if we use rdf:type as the predicate (and what else
>>>    would one use?), it would count all statements that have the focus node as
>>>    subject and rdf:type as predicate, irrespective of what the object is. And
>>>    since a resource can be of multiple types, that is a problem.
>>> 
>>>    Peter's proposals make no difference here, as far as I can tell.
>>> 
>>>    Holger's solution is entirely consistent, but as Dimitris mentioned, it
>>>    doesn't do transitive closure of subclasses.
>>> 
>>>    A constraint that did this would have different semantics from
>>>    maxCount and, if we wanted to include it in the core, it should have a
>>>    different name.
>>> 
>>> 
>>> 
>>>    Sent from my iPhone
>>> 
>>>    > On Jun 12, 2016, at 12:30 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>>>    >
>>>    > Dimitris, thanks, but I wasn't suggesting to use node scope or node
>>>    > constraint. The example with node constraint is what Holger supplied. My
>>>    > use case is related to classes. classScope is defined as: "A class scope
>>>    > for class $scopeClass is defined as the set of all SHACL instances of
>>>    > $scopeClass in the data graph." Key here is that it is *all SHACL
>>>    > instances of $scopeClass*. However, the current spec does not allow one to
>>>    > use minCount or maxCount with scopeClass. That is what I was questioning.
>>>    > With Peter's proposal of having all constraints be usable with all scope
>>>    > types, I believe that one should be able to use counts with classes in
>>>    > scope. In any case, I think that the use case "n instances of classA" is a
>>>    > legitimate use case, and should logically be coded as a classScope since a
>>>    > user will think of it as relating to a class.
>>>    >
>>>    > Honestly, I think the spec is getting more obscure in its meaning, or
>>>    > maybe it is just that the more I read it the less sense it makes.
>>>    >
>>>    > kc
>>>    >
>>>    >> On 6/12/16 6:45 AM, Dimitris Kontokostas wrote:
>>>    >> Hi Karen,
>>>    >>
>>>    >> I think there is some confusion here on the role of scope and node
>>>    >> constraints
>>>    >>
>>>    >> when we put the constraint in the node (nodeConstraints) the node
>>>    >> constraint does not apply on all the focus nodes together but on the
>>>    >> focus nodes one by one, as it does on the property constraints.
>>>    >>
>>>    >> let's take the example you had in mind
>>>    >> ex:MyShape
>>>    >>    a sh:Shape ;
>>>    >>    sh:scopeNode foaf:Person ;
>>>    >>    sh:constraint [
>>>    >>        sh:maxCount 2 ;
>>>    >>    ] .
>>>    >>
>>>    >> with data
>>>    >> ex:Bob a foaf:Person
>>>    >> ex:Alice a foaf:Person
>>>    >> ex:Carol a foaf:Person
>>>    >>
>>>    >> in this case the sh:maxCount argument is applied separately for ex:Bob,
>>>    >> ex:Alice and ex:Carol and it is always valid.
>>>    >> The reason this shape is always valid for any graph is that each focus
>>>    >> node always has count = 1 when it is evaluated
>>>    >> (this goes with the discussion that some constraints do not make sense
>>>    >> in certain contexts)
>>>    >>
>>>    >> if you want to limit the number of Persons inside a data graph, Holger's
>>>    >> example would do the trick, but it doesn't take SHACL instances into
>>>    >> account, only direct types
>>>    >>
>>>    >> Dimitris
>>>    >>
>>>    >> On Sun, Jun 12, 2016 at 1:13 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>>>    >>
>>>    >>    Sorry my last note was abrupt - was in an airport and my flight was
>>>    >>    called. Here's my summary of this issue:
>>>    >>
>>>    >>    - min/maxCount are defined only for predicate-based constraints
>>>    >>    - maxCount "n" instances of classA cannot be a class-based constraint
>>>    >>    - the validation requirement "n instances of classA" must be done as:
>>>    >>
>>>    >>     ex:MyShape
>>>    >>         a sh:Shape ;
>>>    >>         sh:scopeNode foaf:Person ;
>>>    >>         sh:inverseProperty [
>>>    >>             sh:predicate rdf:type ;
>>>    >>             sh:maxCount 2 ;
>>>    >>         ] .
>>>    >>
>>>    >>    My response is that
>>>    >>
>>>    >>    1) this is inconsistent. If there is a need to count instances of a
>>>    >>    class it should be done with scopeClass as with all other
>>>    >>    class-based validation requirements
>>>    >>    2) this is not clear in the SHACL document. (I would be interested
>>>    >>    to hear from others on the list whether this was obvious to them.)
>>>    >>
>>>    >>    I don't think we should accept this kind of inconsistency in the
>>>    >>    standard.
>>>    >>
>>>    >>    kc
>>>    >>
>>>    >>
>>>    >>    On 6/9/16 10:10 PM, Holger Knublauch wrote:
>>>    >>
>>>    >>        Karen,
>>>    >>
>>>    >>        I am a bit lost about what you are referring to. This discussion
>>>    >>        started about using scopeNode to represent number of instances in a
>>>    >>        graph, and then we added scopeClass into the mix. So I assume you are
>>>    >>        probably referring to the examples in sections 2.1.1 and 2.1.2. The
>>>    >>        scopeNode example in 2.1.1 has nothing to do with rdf:type. For
>>>    >>        class-based scopes, 2.1.2 already has two different data graphs - the
>>>    >>        first one with Alice and Bob is showing "direct" instances, while the
>>>    >>        ex:Who a ex:Doctor example already demonstrates how rdfs:subClassOf
>>>    >>        triples are used.
>>>    >>
>>>    >>        Or are you looking for a mechanism to count instances of a given
>>>    >>        class within the data graph, also taking subclasses into consideration?
>>>    >>
>>>    >>        Could you clarify?
>>>    >>
>>>    >>        Holger
>>>    >>
>>>    >>        PS: I have changed the subject line of this email to reflect the
>>>    >>        drift in topic
>>>    >>
>>>    >>
>>>    >>        On 10/06/2016 14:08, Karen Coyle wrote:
>>>    >>
>>>    >>            It isn't a question of switching them. I find that the examples
>>>    >>            do not show the difference between class-defined nodes and
>>>    >>            predicate-defined nodes that use rdf:type as the predicate.
>>>    >>            Since all rdf:type/s must be explicitly defined, these are
>>>    >>            either the same, or they are different, and if they are
>>>    >>            different, that needs to be made clear.
>>>    >>
>>>    >>            kc
>>>    >>
>>>    >>            On 6/9/16 9:26 AM, Irene Polikoff wrote:
>>>    >>
>>>    >>                Examples in the spec would not have the same result if
>>>    >>                scopeNode and scopeClass were switched. They look pretty
>>>    >>                clear to me as they all identify what focus nodes would be
>>>    >>                selected.
>>>    >>
>>>    >>                Maybe the following will help:
>>>    >>
>>>    >>                Let's say there is a graph like so:
>>>    >>
>>>    >>                ex:Person rdfs:label 'Person'.
>>>    >>                ex:Person rdfs:label 'Human Being'.
>>>    >>                ex:Alice rdf:type ex:Person.
>>>    >>                ex:Alice rdfs:label 'Alice'.
>>>    >>                ex:Alice rdfs:label 'Alice Jones'.
>>>    >>                ex:Bob rdf:type ex:Person.
>>>    >>                ex:Joe rdf:type ex:Person.
>>>    >>                ex:Joe rdfs:label 'Joe'.
>>>    >>
>>>    >>                And a shape
>>>    >>
>>>    >>                ex:Shape1
>>>    >>                     a sh:Shape ;
>>>    >>                     sh:scopeNode ex:Person ;
>>>    >>                     sh:property [
>>>    >>                         sh:predicate rdfs:label ;
>>>    >>                         sh:maxCount 1 ;
>>>    >>                     ] .
>>>    >>
>>>    >>
>>>    >>                The node in focus is ex:Person and there will be a
>>>    >>                violation because it has two labels. No other nodes are in
>>>    >>                focus, no other violations.
>>>    >>
>>>    >>                If there was a different shape
>>>    >>
>>>    >>                ex:Shape2
>>>    >>                a sh:Shape ;
>>>    >>                sh:scopeClass ex:Person ;
>>>    >>                sh:property [
>>>    >>                sh:predicate rdfs:label ;
>>>    >>                sh:maxCount 1 ;
>>>    >>                ] .
>>>    >>
>>>    >>
>>>    >>                Then, three nodes are in scope - ex:Alice, ex:Bob and
>>>    >>                ex:Joe. There will be one violation for ex:Alice.
>>>    >>
>>>    >>                Let's look at the shape Holger has below. This shape
>>>    >>                demonstrates, among other things, that the validation can
>>>    >>                look at triples with the focus nodes as objects by using
>>>    >>                sh:inverseProperty
>>>    >>
>>>    >>                ex:MyShape
>>>    >>                     a sh:Shape ;
>>>    >>                     sh:scopeNode ex:Person ;
>>>    >>                     sh:inverseProperty [
>>>    >>                         sh:predicate rdf:type ;
>>>    >>                         sh:maxCount 2 ;
>>>    >>                     ] .
>>>    >>
>>>    >>                The focus node is ex:Person and there will be a
>>>    >>                violation since there are three triples that follow the
>>>    >>                {?x rdf:type ex:Person} pattern.
>>>    >>
>>>    >>
>>>    >>                Irene
>>>    >>
>>>    >>
>>>    >>
>>>    >>                On 6/9/16, 1:33 AM, "Karen Coyle" <kcoyle@kcoyle.net> wrote:
>>>    >>
>>>    >>                    Holger, that still doesn't explain what the
>>>    >>                    difference is. What is the quality of a SHACL class
>>>    >>                    that is different to a triple with a predicate of
>>>    >>                    rdf:type? Are you saying that scopeClass
>>>    >>                    implies/allows subclass relationships to be included?
>>>    >>                    If so, that must be said in the specification, and it
>>>    >>                    should be illustrated in the examples. As it is, the
>>>    >>                    examples given would have the same result using
>>>    >>                    either predicate.
>>>    >>
>>>    >>                    Also, the section that introduces scopeNode does not
>>>    >>                    say that it applies only to the subject of a triple.
>>>    >>                    If that is the case, then it needs to specify that.
>>>    >>
>>>    >>                    kc
>>>    >>
>>>    >>                    On 6/8/16 9:44 PM, Holger Knublauch wrote:
>>>    >>
>>>    >>
>>>    >>                        On 9/06/2016 14:40, Karen Coyle wrote:
>>>    >>
>>>    >>                            sh:scopeClass <foaf:Person> and sh:scopeNode
>>>    >>                            <foaf:Person> appear to identify the same focus
>>>    >>                            node(s) in the data graph.
>>>    >>
>>>    >>
>>>    >>                        sh:scopeNode means "the (class) node itself".
>>>    >>                        sh:scopeClass means "all SHACL instances of the
>>>    >>                        class".
>>>    >>
>>>    >>                        So they do not identify the same focus nodes.
>>>    >>
>>>    >>                        Holger
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>                            ***shape1***
>>>    >>                            ex:MyShape
>>>    >>                                a sh:Shape ;
>>>    >>                                sh:scopeNode foaf:Person ;
>>>    >>                                sh:inverseProperty [
>>>    >>                                    sh:predicate rdf:type ;
>>>    >>                                    sh:maxCount 2 ;
>>>    >>                                ] .
>>>    >>
>>>    >>                            ***shape2***
>>>    >>                            ex:PersonShape
>>>    >>                                a sh:Shape ;
>>>    >>                                sh:scopeClass ex:Person .
>>>    >>
>>>    >>                            ***data graph***
>>>    >>
>>>    >>                            ex:Alice a ex:Person .
>>>    >>                            ex:Bob a ex:Person .
>>>    >>                            ex:NewYork a ex:Place .
>>>    >>
>>>    >>                            Where does the spec address the reason for this?
>>>    >>
>>>    >>                            kc
>>>    >>
>>>    >>                            On 6/7/16 10:09 PM, Holger Knublauch wrote:
>>>    >>
>>>    >>
>>>    >>
>>>    >>                                On 8/06/2016 14:57, Karen Coyle wrote:
>>>    >>
>>>    >>
>>>    >>
>>>    >>                                    On 6/7/16 7:38 PM, Holger Knublauch
>>>    >>                                    wrote:
>>>    >>
>>>    >>                                        Yes and SHACL should implement
>>>    >>                                        the same policy, because
>>>    >>                                        sh:maxCount also only makes sense
>>>    >>                                        for predicate-based constraints
>>>    >>                                        and not node constraints.
>>>    >>
>>>    >>
>>>    >>                                    Does this then rule out a constraint
>>>    >>                                    like "n things of type x"? For
>>>    >>                                    example, if you want to limit the
>>>    >>                                    number of foaf:Person nodes?
>>>    >>
>>>    >>
>>>    >>                                No. To express "A graph must have at
>>>    >>                                most 2 instances of foaf:Person"
>>>    >>                                you would write
>>>    >>
>>>    >>                                ex:MyShape
>>>    >>                                    a sh:Shape ;
>>>    >>                                    sh:scopeNode foaf:Person ;
>>>    >>                                    sh:inverseProperty [
>>>    >>                                        sh:predicate rdf:type ;
>>>    >>                                        sh:maxCount 2 ;
>>>    >>                                    ] .
>>>    >>
>>>    >>                                In other words "there must be at most 2
>>>    >>                                triples that have foaf:Person as
>>>    >>                                object and rdf:type as predicate".
>>>    >>
>>>    >>                                Peter's suggested use of sh:maxCount at
>>>    >>                                node constraints would mean
>>>    >>
>>>    >>                                "Verify that the set of value nodes is
>>>    >>                                not larger than two. Oh, and
>>>    >>                                regardless of the actual data, I already
>>>    >>                                know that this set of value
>>>    >>                                nodes has size 1, because it always
>>>    >>                                consists of the focus node only. So
>>>    >>                                actually I only need to test whether the
>>>    >>                                value of sh:maxCount > 0."
>>>    >>
>>>    >>                                which is a rather useless construct. You
>>>    >>                                have just confirmed that
>>>    >>                                misusing sh:maxCount as node constraints
>>>    >>                                will likely confuse users.
>>>    >>
>>>    >>                                Is this difference clearer now, or what
>>>    >>                                else could I clarify?
>>>    >>
>>>    >>                                Thanks,
>>>    >>                                Holger
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>                    --
>>>    >>                    Karen Coyle
>>>    >>                    kcoyle@kcoyle.net
>>>    >>                    http://kcoyle.net
>>>    >>                    m: 1-510-435-8234
>>>    >>                    skype: kcoylenet/+1-510-984-3600
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >>    --
>>>    >>    Karen Coyle
>>>    >>    kcoyle@kcoyle.net http://kcoyle.net
>>>    >>    m: 1-510-435-8234
>>>    >>    skype: kcoylenet/+1-510-984-3600
>>>    >>
>>>    >>
>>>    >>
>>>    >>
>>>    >> --
>>>    >> Dimitris Kontokostas
>>>    >> Department of Computer Science, University of Leipzig & DBpedia Association
>>>    >> Projects: http://dbpedia.org, http://rdfunit.aksw.org,
>>>    >> http://aligned-project.eu
>>>    >> Homepage: http://aksw.org/DimitrisKontokostas
>>>    >> Research Group: AKSW/KILT http://aksw.org/Groups/KILT
>>>    >>
>>>    >
>>>    > --
>>>    > Karen Coyle
>>>    > kcoyle@kcoyle.net http://kcoyle.net
>>>    > m: 1-510-435-8234
>>>    > skype: kcoylenet/+1-510-984-3600
>>>    >
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Dimitris Kontokostas
>>> Department of Computer Science, University of Leipzig & DBpedia Association
>>> Projects: http://dbpedia.org, http://rdfunit.aksw.org, http://aligned-project.eu
>>> Homepage: http://aksw.org/DimitrisKontokostas
>>> Research Group: AKSW/KILT http://aksw.org/Groups/KILT
> 
> -- 
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> m: 1-510-435-8234
> skype: kcoylenet/+1-510-984-3600
Received on Monday, 13 June 2016 11:15:09 UTC