Re: Can Shapes always be Classes? from Dimitris Kontokostas on 2014-11-19 (public-data-shapes-wg@w3.org from November 2014)

From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
Date: Wed, 19 Nov 2014 11:05:19 +0200
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: Holger Knublauch <holger@topquadrant.com>, public-data-shapes-wg <public-data-shapes-wg@w3.org>
Message-ID: <CA+u4+a2x2TThRv0bxM=WYaUMHVGi2ui3buc+eiyo8Oh5Qkdp3w@mail.gmail.com>
On Wed, Nov 19, 2014 at 1:47 AM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * Holger Knublauch <holger@topquadrant.com> [2014-11-06 09:38+1000]
> > I think it's encouraging to read suggestions on how we could merge
> > ideas from the various proposals, e.g. extend SPIN to make the
> > scenario below easier to represent. This is always a possibility.
> >
> > Thanks for providing a specific example, which makes our discussion
> > more focused. I do believe that the example below can be expressed
> > with the existing SPIN spec via something like
> >
> > :Issue
> >     spin:constraint [
> >         a sp:Ask ;
> >         sp:text """
> >             # The assignee must have an mbox
> >             ASK {
> >                 ?this :assignedTo ?assignee .
> >                 FILTER NOT EXISTS { ?assignee foaf:mbox ?any }
>
> multiplied out for cardinality over :submittedBy:{given,family},
> status=unassigned | (status=assigned &&
> assignedTo/{givenName,familyName,mbox}), etc. gave me the 107 lines of
> SPARQL at the bottom of this message.
>
>
I'd argue against that. Since we define multiple Shapes, why do we have to
generate a single huge SPARQL query?
In RDFUnit we take a different approach: instead of generating a single SPARQL
query, we decompose the RS constraints into multiple ones. We call each of
them a TestCase. Unlike a SPIN constraint, a TestCase is not a valid SPARQL
query by definition, but it translates to one on execution. See below for
details.


>
> >             }
> >             """
> >     ]
> >
> > Basically this associates the constraint to the starting point and
> > uses a path to walk into the Person.
> >
> > The question then becomes how acceptable is that solution compared
> > to having to introduce a special mechanism and change the whole
> > execution engine, introduce the notion of starting nodes etc. I
> > would say the case you describe is quite rare and therefore people
> > should be able to live with the little inconvenience.
>
> Any time you see a restriction in OWL you have an example of a
> contextual constraint. OWL literature pretty much indoctrinates for
> constraining general predicates for use in particular classes, e.g.
> the pizza tutorial's :hasTopping. I've used many nested property
> restrictions in the projects that I've worked on.
>
>
> >                                                       The benefit of
> > the SPIN solution above is consistency, and users just need to
> > understand the simple principle of "object-oriented attachment" vs
> > context-sensitive execution and regular expressions.
> >
> > Furthermore, if the above is a recurring pattern, then it could be
> > generalized into a SPIN template, producing a definition such as
> >
> > :Issue
> >     spin:constraint [
> >         a ex:RequiredPropertyInContext ;
> >         arg:contextProperty :assignedTo ;
> >         arg:requiredProperty foaf:mbox
> >     ]
>
> Note that Resource Shapes does that but keeps the nested constraints
> in a separate shape:
>
> [[
> <http://ex.example/x#NewIssueShape> a rs:ResourceShape ;
> ...
>     rs:property [
>         rs:name "submittedBy" ;
>         rs:propertyDefinition :submittedBy ;
>         rs:valueShape x:SubmitterShape ;
>         rs:occurs rs:Exactly-one ;
>     ] ;
> ...
>
> <http://ex.example/x#SubmitterShape> a rs:ResourceShape ;
>     rs:property [
>         rs:name "givenName" ;
>         rs:propertyDefinition foaf:givenName ;
>         rs:valueType shex:Literal ;
>         rs:occurs rs:Exactly-one ;
>     ] ;
>     rs:property [
>         rs:name "familyName" ;
>         rs:propertyDefinition foaf:familyName ;
>         rs:valueType shex:Literal ;
>         rs:occurs rs:Zero-or-one ;
>     ] ;
>  .
> ]]
>

At the moment RDFUnit supports only Shapes that define oslc:describes (typed
Shapes), but in that case our approach can follow both oslc:range
and oslc:valueType across different Shape definitions.

Assuming
<http://ex.example/x#NewIssueShape> oslc:describes x:NewIssue
<http://ex.example/x#SubmitterShape> oslc:describes x:Submitter
we will generate the following tests:
- :submittedBy must occur once in x:NewIssue
- :submittedBy range must be x:Submitter
- foaf:givenName must occur once in x:Submitter
- foaf:givenName must be a literal
- foaf:familyName must occur at most once in x:Submitter
- foaf:familyName must be a literal
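
As a rough sketch of what such a decomposition could look like (this is not
RDFUnit's actual code or pattern library): the template names, the query
shapes, and the ConstraintTest / build_test / example_tests helpers below are
all made up for illustration. The namespace IRIs are the ones from the PREFIX
lines quoted in this thread, and foaf:familyName is treated as optional
(rs:Zero-or-one), as in the quoted SubmitterShape. Each test is one small
query that selects the resources violating a single constraint.

```python
from dataclasses import dataclass
from string import Template

# Namespace IRIs as declared in the PREFIX lines quoted in this thread.
EX = "http://ex.example/#"
X = "http://ex.example/x#"
FOAF = "http://foaf.example/#"

# Hypothetical templates: each query selects the resources that VIOLATE
# one constraint. $cls, $prop and $target are filled in per test.
TEMPLATES = {
    "exactly-one": Template(
        "SELECT ?this WHERE { ?this a <$cls> . "
        "OPTIONAL { ?this <$prop> ?v } } "
        "GROUP BY ?this HAVING (COUNT(?v) != 1)"
    ),
    "at-most-one": Template(
        "SELECT ?this WHERE { ?this a <$cls> ; <$prop> ?v } "
        "GROUP BY ?this HAVING (COUNT(?v) > 1)"
    ),
    "literal": Template(
        "SELECT ?this WHERE { ?this a <$cls> ; <$prop> ?v . "
        "FILTER (!isLiteral(?v)) }"
    ),
    "range": Template(
        "SELECT ?this WHERE { ?this a <$cls> ; <$prop> ?v . "
        "FILTER NOT EXISTS { ?v a <$target> } }"
    ),
}


@dataclass(frozen=True)
class ConstraintTest:
    """One granular check: a description plus the query that finds violations."""
    description: str
    sparql: str


def build_test(kind, cls, prop, description, **extra):
    """Instantiate one template into a concrete, independently runnable query."""
    query = TEMPLATES[kind].substitute(cls=cls, prop=prop, **extra)
    return ConstraintTest(description, query)


def example_tests():
    """The six tests listed above, one small query each."""
    issue, submitter = X + "NewIssue", X + "Submitter"
    return [
        build_test("exactly-one", issue, EX + "submittedBy",
                   ":submittedBy must occur once in x:NewIssue"),
        build_test("range", issue, EX + "submittedBy",
                   ":submittedBy range must be x:Submitter", target=submitter),
        build_test("exactly-one", submitter, FOAF + "givenName",
                   "foaf:givenName must occur once in x:Submitter"),
        build_test("literal", submitter, FOAF + "givenName",
                   "foaf:givenName must be a literal"),
        build_test("at-most-one", submitter, FOAF + "familyName",
                   "foaf:familyName must occur at most once in x:Submitter"),
        build_test("literal", submitter, FOAF + "familyName",
                   "foaf:familyName must be a literal"),
    ]


if __name__ == "__main__":
    for t in example_tests():
        print(t.description)
        print("   ", t.sparql)
```

Each query can be sent to the store on its own, so a non-empty result points
at one specific constraint and the resources that break it.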

Decomposing the constraints this way has the advantage of giving more
granular results to the end user.
At the moment, this approach does not work with ShEx's OR ('|') semantics,
but it could if we nested the tests into a logical graph.
We all agree that Shapes must translate to SPARQL, but that alone is not
enough.
Even a few Shapes could easily produce several hundred lines of SPARQL.
Such a query can easily run on a small in-memory graph but would probably
never return on an endpoint with a few million triples.
Anchoring Shapes to classes is a means to easily decompose the constraints;
I am not convinced that untyped Shapes can achieve that, or that they can
validate a SPARQL endpoint directly.

This is not to say that I am against untyped Shapes but in the end we
should make clear the limitations of each choice.

Best,
Dimitris


>
> > Holger
> >
> >
> > On 11/6/2014 4:25, Eric Prud'hommeaux wrote:
> > >* Holger Knublauch <holger@topquadrant.com> [2014-11-05 15:35+1000]
> > >>On 11/5/2014 15:26, Irene Polikoff wrote:
> > >>>>From: Holger Knublauch [mailto:holger@topquadrant.com]
> > >>>>Sent: Wednesday, November 05, 2014 12:16 AM
> > >>>>To: public-data-shapes-wg@w3.org
> > >>>>Subject: Can Shapes always be Classes?
> > >>>>
> > >>>>I believe there is a fundamental difference in how the various
> proposals
> > >>>>treat the relationship between resources and their shapes:
> > >>>>
> > >>>>- In OWL and SPIN, constraints are attached to classes. rdf:type
> triples are
> > >>>>used to determine which constraints need to be evaluated for a given
> > >>>>instance.
> > >>>>
> > >>>>- In the original Resource Shapes and ShEx, Shapes are stand-alone
> entities
> > >>>>that may or may not be associated with a class. Other mechanisms than
> > >>>>rdf:type are used to point from instances to their shapes.
> > >>>>
> > >>>>I believe the main motivation for the latter design are the User
> Stories
> > >>>>S7 and S8: different shapes at different times, and properties can
> change as
> > >>>>they pass through the workflow. I would like to learn more about
> this and
> > >>>>have specific examples that we can evaluate.
> > >>>>
> > >>>>My current assumption is that these scenarios can be expressed via
> named
> > >>>>graphs, so that different class definitions are used in different
> contexts.
> > >>>>Which graph to use would be specified in some kind of header
> metadata or via
> > >>>>a special property (e.g. owl:imports). Alternatively, different
> classes
> > >>>>could be used, just like different shapes are used depending on the
> context.
> > >>>>I argue that using rdf:type and RDFS classes is a well-established
> mechanism
> > >>>>that we should try to build upon. What problems do the proponents of
> the
> > >>>>decoupling see with those ideas?
> > >I think the fundamental issue is whether these are effectively
> > >context-sensitive grammars. As currently proposed, SPIN depends on
> > >type annotations attached to the data. It would be possible to add a
> > >step which creates a premise when validating some node. I believe this
> > >would get around all of the issues stemming from requiring fully
> > >discriminating types.
> > >
> > >Use Case: context-sensitive-rooted-issue-interface
> > >
> > >An LDP service accepts new Issues. A posted issue is expected to have
> > >a :name, :status and a :submittedBy. If the status is :assigned, it
> > >must have an :assignedTo . It may also have references to other Issues
> > >which may or may not be in the system so they are referenced by :name.
> > >
> > >Sample Data:
> > >   _:IssueA a :Issue ;
> > >     :name        "funny smell and no lights" ;
> > >     :status      :assigned ;
> > >     :submittedBy _:Bob ;
> > >     :assignedTo  _:Bob ;
> > >     :related     [ a :Issue ; :name "smoke coming from unit" ],
> > >                  [ a :Issue ; :name "110V capacitor in French unit" ] .
> > >
> > >   _:Bob    a foaf:Person ;
> > >     foaf:givenName "Bob" ;
> > >     foaf:familyName "Smith" ;
> > >     foaf:mbox    <mailto:bob@example.com> .
> > >
> > >There are multiple nodes of type :Issue so the client can specify the
> > >start node as _:IssueA (e.g. in a header). This makes the posted data
> > >a "pointed graph".
> > >
> > >If the requirements for the :submittedBy and the :assignedTo are
> > >different, we need context-sensitivity.
> > >
> > >   x:NewIssueShape {
> > >     :name LITERAL,
> > >     :submittedBy @x:SubmitterShape,
> > >     (:status (:unassigned :unknown)
> > >      | :status (:assigned),
> > >        :assignedTo @x:AssigneeShape),
> > >     :related { :name LITERAL }*
> > >   }
> > >
> > >   x:SubmitterShape {
> > >     foaf:givenName LITERAL,
> > >     foaf:familyName LITERAL?
> > >   }
> > >
> > >   x:AssigneeShape {
> > >     foaf:givenName LITERAL,
> > >     foaf:familyName LITERAL,
> > >     foaf:mbox IRI
> > >   }
> > >
> > >If we have some OWL like (eliding cardinalities):
> > >
> > >   Class: x:NewIssueShape
> > >     SubClassOf:
> > >       :name some rdfs:Literal,
> > >       :submittedBy some x:SubmitterShape,
> > >       (:status value :unassigned
> > >        or
> > >        (:status value :assigned and :assignedTo some x:AssigneeShape))
> > >       :related (:name rdfs:Literal)
> > >
> > >   Class: x:SubmitterShape
> > >     SubClassOf:
> > >       foaf:givenName some rdfs:Literal,
> > >       foaf:familyName some rdfs:Literal
> > >
> > >   Class: x:AssigneeShape
> > >     SubClassOf:
> > >       foaf:givenName some rdfs:Literal,
> > >       foaf:familyName some rdfs:Literal,
> > >       foaf:mbox some rdfs:Resource
> > >
> > >, we can validate the data with the premise _:IssueA a x:NewIssueShape.
> > >The validation will recursively test the nested constraints. This of
> > >course hinges on being able to verify a premise.
> > >
> > >It seems reasonable to extend SPIN to test premises. It could use the
> > >same idea where instead of an rdfs:range to specify an expected object
> > >type, one could use Resource Shapes' rs:valueShape. This would assert
> > >the premise that e.g. _:Bob a x:SubmitterShape and then another
> > >_:Bob a x:AssigneeShape .
> > >
> > >
> > >>>>I think this is a major design decision that we need to clarify
> early.
> > >>>>Instead of excluding those scenarios, I would like to accommodate
> them
> > >>>>without having to introduce completely new mechanisms.
> > >>>>
> > >>>Holger,
> > >>>
> > >>>I believe you are saying that there could be two (or more) named
> graphs each
> > >>>containing different sets of constraints for a particular class (or
> > >>>classes). For example:
> > >>>
> > >>>Graph A: contains rdf:type statements for a set of classes and
> properties.
> > >>>Can also contain other RDFS or OWL axioms
> > >>>
> > >>>Graph B: contains some constraints for classes declared in Graph A
> > >>>
> > >>>Graph C: contains a different set of constraints for classes declared
> in
> > >>>Graph A
> > >>>
> > >>>And so on
> > >>>
> > >>>A given application can then choose what set of constraints it will be
> using
> > >>>- Graph B or Graph C.
> > >>>
> > >>>Is this correct?
> > >>Yes sorry I was brief. Let's take an extreme use case, where the
> > >>same ex:Instance must fulfill different constraints in scenario A
> > >>and B.
> > >>
> > >>     ex:Instance
> > >>         a ex:Person ;
> > >>         foaf:firstName "John" .
> > >>
> > >>Scenario A: Each Person can have any number of first names.
> > >>
> > >>Scenario B: Each Person must have exactly one first name.
> > >>
> > >>In scenario A, it would have
> > >>
> > >>     <instance graph> owl:imports <schema A>
> > >>
> > >>where <schema A> is simply the unconstrained class definition.
> > >>
> > >>     ex:Person a rdfs:Class .
> > >>
> > >>In scenario B, it would owl:import <schema B> which is
> > >>
> > >>     ex:Person a rdfs:Class ;
> > >>         constraint foaf:firstName exactly 1 .
> > >>
> > >>I hope this explains the named graph work-around.
> > >>
> > >>Holger
> >
> >
>
> [[
> PREFIX :<http://ex.example/#>
> PREFIX foaf:<http://foaf.example/#>
> PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX x:<http://ex.example/x#>
> PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
> ASK {
>     { SELECT ?http://ex.example/x#NewIssueShape {
>       ?http://ex.example/x#NewIssueShape :name ?o .
>     } GROUP BY ?http://ex.example/x#NewIssueShape HAVING (COUNT(*)=1)}
>     { SELECT ?http://ex.example/x#NewIssueShape {
>       ?http://ex.example/x#NewIssueShape :name ?o . FILTER (isLiteral(?o))
>     } GROUP BY ?http://ex.example/x#NewIssueShape HAVING (COUNT(*)=1)}
>     { SELECT ?http://ex.example/x#NewIssueShape (COUNT(*) AS ?
> http://ex.example/x#NewIssueShape_c0) {
>       ?http://ex.example/x#NewIssueShape :submittedBy ?o .
>     } GROUP BY ?http://ex.example/x#NewIssueShape HAVING (COUNT(*)=1)}
>     { SELECT ?http://ex.example/x#NewIssueShape {
>       ?http://ex.example/x#NewIssueShape :submittedBy ?o . FILTER
> ((isIRI(?o) || isBlank(?o)))
>     } GROUP BY ?http://ex.example/x#NewIssueShape HAVING (COUNT(*)=1)}
>     { SELECT ?http://ex.example/x#NewIssueShape (COUNT(*) AS ?
> http://ex.example/x#NewIssueShape_c1) {
>         { SELECT ?http://ex.example/x#NewIssueShape ?
> http://ex.example/x#SubmitterShape {
>           ?http://ex.example/x#NewIssueShape :submittedBy ?
> http://ex.example/x#SubmitterShape . FILTER (true && (isIRI(?
> http://ex.example/x#SubmitterShape) || isBlank(?
> http://ex.example/x#SubmitterShape)))
>         } }
>         { SELECT ?http://ex.example/x#SubmitterShape {
>           ?http://ex.example/x#SubmitterShape foaf:givenName ?o .
>         } GROUP BY ?http://ex.example/x#SubmitterShape HAVING
> (COUNT(*)=1)}
>         { SELECT ?http://ex.example/x#SubmitterShape {
>           ?http://ex.example/x#SubmitterShape foaf:givenName ?o . FILTER
> (isLiteral(?o))
>         } GROUP BY ?http://ex.example/x#SubmitterShape HAVING
> (COUNT(*)=1)}
>         { SELECT ?http://ex.example/x#SubmitterShape (COUNT(*) AS ?
> http://ex.example/x#SubmitterShape_c0) {
>           ?http://ex.example/x#SubmitterShape foaf:familyName ?o .
>         } GROUP BY ?http://ex.example/x#SubmitterShape HAVING
> (COUNT(*)<=1)}
>         { SELECT ?http://ex.example/x#SubmitterShape (COUNT(*) AS ?
> http://ex.example/x#SubmitterShape_c1) {
>           ?http://ex.example/x#SubmitterShape foaf:familyName ?o . FILTER
> (isLiteral(?o))
>         } GROUP BY ?http://ex.example/x#SubmitterShape HAVING
> (COUNT(*)<=1)}
>         FILTER (?http://ex.example/x#SubmitterShape_c0 = ?
> http://ex.example/x#SubmitterShape_c1)
>     } GROUP BY ?http://ex.example/x#NewIssueShape }
>     FILTER (?http://ex.example/x#NewIssueShape_c0 = ?
> http://ex.example/x#NewIssueShape_c1)
>     OPTIONAL { ?http://ex.example/x#NewIssueShape :submittedBy ?
> http://ex.example/x#NewIssueShape_http://ex.example/x#SubmitterShape_ref0
> . FILTER (true && (isIRI(?
> http://ex.example/x#NewIssueShape_http://ex.example/x#SubmitterShape_ref0)
> || isBlank(?
> http://ex.example/x#NewIssueShape_http://ex.example/x#SubmitterShape_ref0)))
> }
>     { SELECT ?http://ex.example/x#NewIssueShape WHERE {
>         {
>             { SELECT ?http://ex.example/x#NewIssueShape {
>               ?http://ex.example/x#NewIssueShape :status ?o .
>             } GROUP BY ?http://ex.example/x#NewIssueShape HAVING
> (COUNT(*)=1)}
>             { SELECT ?http://ex.example/x#NewIssueShape {
>               ?http://ex.example/x#NewIssueShape :status ?o . FILTER ((?o
> = :unassigned || ?o = :unknown))
>             } GROUP BY ?http://ex.example/x#NewIssueShape HAVING
> (COUNT(*)=1)}
>         } UNION {
>             { SELECT ?http://ex.example/x#NewIssueShape {
>               ?http://ex.example/x#NewIssueShape :status ?o .
>             } GROUP BY ?http://ex.example/x#NewIssueShape HAVING
> (COUNT(*)=1)}
>             { SELECT ?http://ex.example/x#NewIssueShape {
>               ?http://ex.example/x#NewIssueShape :status ?o . FILTER ((?o
> = :assigned))
>             } GROUP BY ?http://ex.example/x#NewIssueShape HAVING
> (COUNT(*)=1)}
>             { SELECT ?http://ex.example/x#NewIssueShape (COUNT(*) AS ?
> http://ex.example/x#NewIssueShape_c2) {
>               ?http://ex.example/x#NewIssueShape :assignedTo ?o .
>             } GROUP BY ?http://ex.example/x#NewIssueShape HAVING
> (COUNT(*)=1)}
>             { SELECT ?http://ex.example/x#NewIssueShape {
>               ?http://ex.example/x#NewIssueShape :assignedTo ?o . FILTER
> ((isIRI(?o) || isBlank(?o)))
>             } GROUP BY ?http://ex.example/x#NewIssueShape HAVING
> (COUNT(*)=1)}
>             { SELECT ?http://ex.example/x#NewIssueShape (COUNT(*) AS ?
> http://ex.example/x#NewIssueShape_c3) {
>                 { SELECT ?http://ex.example/x#NewIssueShape ?
> http://ex.example/x#AssigneeShape {
>                   ?http://ex.example/x#NewIssueShape :assignedTo ?
> http://ex.example/x#AssigneeShape . FILTER (true && (isIRI(?
> http://ex.example/x#AssigneeShape) || isBlank(?
> http://ex.example/x#AssigneeShape)))
>                 } }
>                 { SELECT ?http://ex.example/x#AssigneeShape {
>                   ?http://ex.example/x#AssigneeShape foaf:givenName ?o .
>                 } GROUP BY ?http://ex.example/x#AssigneeShape HAVING
> (COUNT(*)=1)}
>                 { SELECT ?http://ex.example/x#AssigneeShape {
>                   ?http://ex.example/x#AssigneeShape foaf:givenName ?o .
> FILTER (isLiteral(?o))
>                 } GROUP BY ?http://ex.example/x#AssigneeShape HAVING
> (COUNT(*)=1)}
>                 { SELECT ?http://ex.example/x#AssigneeShape {
>                   ?http://ex.example/x#AssigneeShape foaf:familyName ?o .
>                 } GROUP BY ?http://ex.example/x#AssigneeShape HAVING
> (COUNT(*)=1)}
>                 { SELECT ?http://ex.example/x#AssigneeShape {
>                   ?http://ex.example/x#AssigneeShape foaf:familyName ?o .
> FILTER (isLiteral(?o))
>                 } GROUP BY ?http://ex.example/x#AssigneeShape HAVING
> (COUNT(*)=1)}
>                 { SELECT ?http://ex.example/x#AssigneeShape {
>                   ?http://ex.example/x#AssigneeShape foaf:mbox ?o .
>                 } GROUP BY ?http://ex.example/x#AssigneeShape HAVING
> (COUNT(*)=1)}
>                 { SELECT ?http://ex.example/x#AssigneeShape {
>                   ?http://ex.example/x#AssigneeShape foaf:mbox ?o .
> FILTER (isIRI(?o))
>                 } GROUP BY ?http://ex.example/x#AssigneeShape HAVING
> (COUNT(*)=1)}
>             } GROUP BY ?http://ex.example/x#NewIssueShape }
>             FILTER (?http://ex.example/x#NewIssueShape_c2 = ?
> http://ex.example/x#NewIssueShape_c3)
>             OPTIONAL { ?http://ex.example/x#NewIssueShape :assignedTo ?
> http://ex.example/x#NewIssueShape_http://ex.example/x#AssigneeShape_ref0
> . FILTER (true && (isIRI(?
> http://ex.example/x#NewIssueShape_http://ex.example/x#AssigneeShape_ref0)
> || isBlank(?
> http://ex.example/x#NewIssueShape_http://ex.example/x#AssigneeShape_ref0)))
> }
>         }
>     } GROUP BY ?http://ex.example/x#NewIssueShape HAVING (COUNT(*) = 1)}
>     { SELECT ?http://ex.example/x#NewIssueShape (COUNT(*) AS ?
> http://ex.example/x#NewIssueShape_c4) {
>       ?http://ex.example/x#NewIssueShape :related ?o .
>     } GROUP BY ?http://ex.example/x#NewIssueShape}
>     { SELECT ?http://ex.example/x#NewIssueShape (COUNT(*) AS ?
> http://ex.example/x#NewIssueShape_c5) {
>       ?http://ex.example/x#NewIssueShape :related ?o . FILTER ((isIRI(?o)
> || isBlank(?o)))
>     } GROUP BY ?http://ex.example/x#NewIssueShape}
>     FILTER (?http://ex.example/x#NewIssueShape_c4 = ?
> http://ex.example/x#NewIssueShape_c5)
>     { SELECT ?http://ex.example/x#NewIssueShape (COUNT(*) AS ?
> http://ex.example/x#NewIssueShape_c6) {
>         { SELECT ?http://ex.example/x#NewIssueShape ?4 {
>           ?http://ex.example/x#NewIssueShape :related ?4 . FILTER (true
> && (isIRI(?4) || isBlank(?4)))
>         } }
>         { SELECT ?4 {
>           ?4 :name ?o .
>         } GROUP BY ?4 HAVING (COUNT(*)=1)}
>         { SELECT ?4 {
>           ?4 :name ?o . FILTER (isLiteral(?o))
>         } GROUP BY ?4 HAVING (COUNT(*)=1)}
>     } GROUP BY ?http://ex.example/x#NewIssueShape }
>     FILTER (?http://ex.example/x#NewIssueShape_c4 = ?
> http://ex.example/x#NewIssueShape_c6)
>     OPTIONAL { ?http://ex.example/x#NewIssueShape :related ?
> http://ex.example/x#NewIssueShape_4_ref0 . FILTER (true && (isIRI(?
> http://ex.example/x#NewIssueShape_4_ref0) || isBlank(?
> http://ex.example/x#NewIssueShape_4_ref0))) }
> }
> ]]
>
> --
> -ericP
>
> office: +1.617.599.3509
> mobile: +33.6.80.80.35.59
>
> (eric@w3.org)
> Feel free to forward this message to any list for any purpose other than
> email address distribution.
>
> There are subtle nuances encoded in font variation and clever layout
> which can only be seen by printing this message on high-clay paper.
>
>


-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas
Received on Wednesday, 19 November 2014 09:06:18 UTC