Re: inheritance & reuse from Johnston, Patrick - Hoboken on 2016-01-17 (public-data-shapes-wg@w3.org from January 2016)

From: Johnston, Patrick - Hoboken <pjohnston@wiley.com>
Date: Sun, 17 Jan 2016 06:41:35 +0000
To: Holger Knublauch <holger@topquadrant.com>, "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Message-ID: <9A68BFCD-0266-4F16-A53E-F794EC879F0B@wiley.com>
Thx for taking the time to respond, Holger. I accept your government health warning… I am still listed as a participant in the working group, though I haven’t been able to attend since pretty early on (day job). Please feel free to ignore my questions if they become a distraction.

PJ> inline



From: Holger Knublauch <holger@topquadrant.com>
Date: Friday, January 15, 2016 at 10:55 PM
To: "public-data-shapes-wg@w3.org" <public-data-shapes-wg@w3.org>
Subject: Re: inheritance & reuse
Resent-From: <public-data-shapes-wg@w3.org>
Resent-Date: Friday, January 15, 2016 at 10:56 PM

Thanks for your feedback on SHACL. As usual with emails from people outside of the WG, let me clarify that I am not speaking on behalf of the group. Some quick feedback below...

On 16/01/2016 7:08 AM, Johnston, Patrick - Hoboken wrote:
Howdy.

I have started looking at the current draft spec from the point of view of a shape designer (‘shapist’, ‘shacler’?).

I am using the spec to think through a current project, which is essentially an enterprise level information model abstraction using OWL. Adding shapes into the mix brings in many of the issues described in https://www.w3.org/2014/data-shapes/track/issues/44, among others. This is allowing me to understand what you have done, and test some of the ideas in an applied setting. I am using Holger’s test harness (https://github.com/TopQuadrant/shacl) as a way to explore this.

Anyone wanting to implement shapes will find managing them a major challenge.  Much more so than ontologies, the number of different sets of constraints a reasonably sized enterprise might need to implement will lead to a large number of shapes applicable in different scenarios, and at different points in enterprise workflows. Start exposing these alongside open data, and it will get worse. As a result, there will be lots of shape graphs with some degree of variation between them, but also with much in common. Even within a given shapes graph, there will be many shapes that share the same characteristics. In the same way that we build reusable code libraries, we will want to build reusable shapes and shape parts. One could even foresee some of these served up through open data shape repositories.

For the moment, I am looking at what I can do within a given shapes graph. I am limited in terms of what I can test with owl:imports, as this would really require the test harness to support something like TriG (or JSON-LD) so that I could define multiple graphs and explore the ramifications of merging triple sets between them (I think Jena supports this, though WGTest seems hard-coded to Turtle). However, I can explore most of the implications from within a single shapes graph.

I have written a series of test cases to validate my interpretations of what you have written. I am hopeful that this may actually serve to reduce some of the ambiguities in the spec, in particular how SHACL relates to RDFS and OWL from a practical standpoint, which is far from clear (to me at least). If this is useful to you, I am happy to create a pull request to add the tests to the test bank – it doesn’t look like I am overlapping with the current set. I am also happy to work on other similar scenarios. If this is not useful to you, it’s still a good learning experience for me. Happy to discuss as well.

  1.  Reusing the same constraint definitions (reuse-001, OK)
  2.  Having a shape use another shape definition:
     *   By explicitly referencing it* (reuse-002, the test that fails is a possible enhancement to the spec)

FWIW you have a ex:scopeClass there - should be sh:scopeClass I guess. We had discussed how to best represent "inheritance" between shapes, and concluded that sh:and is sufficient. Other presentation languages such as a compact syntax may provide sugar to display these differently.

PJ> Yes on the typo, thanks. I get that sh:and (or sh:or) is sufficient, but it is a somewhat awkward way of expressing a simple, common idea: reuse. sh:property, for example, gets pride of place when it is just syntactic sugar for sh:constraint [sh:predicate …]. I get that someone might eventually implement JSON-LD style extensions for this sort of thing (cf. @list), but why make the syntax more obscure than it needs to be? Since my intention wasn’t to critique the specification, but rather to understand it, I’ll leave this one alone.


     *   Through shape class inheritance (inheritance-001, this test fails – not sure if it’s a bug in the test or this is what is supposed to happen)

I think this is the expected behavior. Using rdfs:subClassOf between shapes does not have any effect unless you use them as classes. But your example has instances of ex:Class1. A solution based on subClassOf would require you to make Shape1 a superclass of Class1 and attach the constraints of Shape2 to Class1.

PJ> OK, not the answer I was hoping for, but so be it. In terms of the test, its purpose was to allow a class-agnostic shape, Shape1, to be reused by more than one class-scoped shape (here, Shape2). If I were to make Shape1 a superclass of Class1, I would lose that. I admit I find the way section 1.1 is worded really confusing, and the more I read it the less clear it becomes, in particular around the scope of rdfs:subClassOf. I like Peter’s definition approach in one of the earlier threads. Maybe it would be better to come right out and say that RDFS actually plays no part in the construction of shapes and the shapes graph, but that shapes are able to follow rdfs:subClassOf relationships declared against instances in the data graph. The question is then whether these declarations will be processed if they are made only in the data graph, only in the shapes graph, or in both. What is really unclear is what part OWL can play here, if any: for example, instances of owl:Class, whose definitions might themselves be imported into the shapes or data graphs using owl:imports. I don’t want to reignite what seems to have been a painful debate, but not mentioning OWL is doing your readers a disservice. I see a place for both OWL and SHACL in the work I am doing currently (hence the questions); they achieve different ends.
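
The question of which graph supplies the rdfs:subClassOf triples can be made concrete with a minimal sketch in plain Python. This is not the TopQuadrant harness and not a SHACL engine: the tuple encoding of triples, the function names subclasses_of and instances_in_scope, and the resource ex:SubClass1 are all assumptions made up for illustration. It only shows that the set of in-scope instances changes depending on where the subclass declarations are read from.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def subclasses_of(cls, graph):
    """Return cls plus every class that reaches it via rdfs:subClassOf in graph."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in graph:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def instances_in_scope(scope_class, data_graph, subclass_graph):
    """Instances in data_graph typed with scope_class or one of its subclasses.

    subclass_graph is the graph consulted for rdfs:subClassOf triples
    (hypothetical parameter; which graph is used is the open question).
    """
    classes = subclasses_of(scope_class, subclass_graph)
    return {s for s, p, o in data_graph if p == RDF_TYPE and o in classes}

# Hypothetical data graph: the subclass relationship is declared here.
data_graph = {
    ("ex:SubClass1", SUBCLASS_OF, "ex:Class1"),
    ("ex:a", RDF_TYPE, "ex:Class1"),
    ("ex:b", RDF_TYPE, "ex:SubClass1"),
}
# Hypothetical shapes graph: subClassOf only links two shapes.
shapes_graph = {
    ("ex:Shape2", SUBCLASS_OF, "ex:Shape1"),
}

in_scope_from_data = instances_in_scope("ex:Class1", data_graph, data_graph)
in_scope_from_shapes = instances_in_scope("ex:Class1", data_graph, shapes_graph)
```

Reading the subclass triples from the data graph puts both ex:a and ex:b in scope; reading them from the shapes graph alone leaves only ex:a, and the ex:Shape2 rdfs:subClassOf ex:Shape1 triple has no effect on scope at all.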


  3.  Having a shape apply to data graph subclasses of a class in its scope (inheritance-003, OK).
  4.  The ramifications of shape merging through reuse, inheritance, and owl:imports.
     *   The same shape with overlapping constraints (inheritance-002, fails for the same reason inheritance-001 fails)
     *   Different shapes with the same scope and overlapping constraints (inheritance-004, OK)
     *   Duplicated triples in data graphs (e.g. if there are instances of shape classes in the owl:imports) (duplication-001, OK)

RDF graphs are sets of triples, so it is not possible to have duplicate triples in a single Turtle file. The Jena parser would already remove the duplicates.

PJ> Agreed that this shouldn't happen, but Jena is just one implementation: other implementations may just leave duplicates in (maybe saying the same thing twice means something to somebody), especially if they originate from different graphs in a quad store, say. Using owl:imports isn’t exactly common behavior in regular linked-data-land, so I think it is worth calling out explicitly.
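
To illustrate the difference, here is a small sketch in plain Python (no RDF library). The graph names ex:graphA and ex:graphB, the sample triples, and the helper count_values are hypothetical. A merged graph treated as a set holds a repeated triple once, whereas a naive count over quads from a store sees it once per named graph.

```python
# Two hypothetical graphs that both contain the same triple.
graph_a = {
    ("ex:i1", "ex:property1", '"fred"'),
}
graph_b = {
    ("ex:i1", "ex:property1", '"fred"'),
    ("ex:i1", "rdf:type", "ex:Class1"),
}

# Merging as sets (what an import into a single graph amounts to):
# the shared triple appears once.
merged = graph_a | graph_b

# A quad-store view: the same triple is kept once per named graph.
quads = [
    (name, *triple)
    for name, triples in (("ex:graphA", graph_a), ("ex:graphB", graph_b))
    for triple in triples
]

def count_values(statements, subject, predicate):
    """Count statements for subject/predicate; works for triples and quads."""
    return sum(1 for st in statements if st[-3] == subject and st[-2] == predicate)

values_in_merged_graph = count_values(merged, "ex:i1", "ex:property1")
values_counted_over_quads = count_values(quads, "ex:i1", "ex:property1")
```

A cardinality constraint evaluated against the merged set sees one value for ex:property1, while a processor that counted quads without collapsing them would see two.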


  5.  The effect of uniqueLang when language is not specified (uniqueLang-001, fails)

I cannot comment on how common the case you describe will be in practice - having a fallback language and interpreting plain xsd:string literals as having a default language. Maybe you are right, but it will certainly complicate the logic here (e.g. we would need to decide what to do with rdf:HTML triples not just xsd:string). In any case there is the fallback to define your variation of the unique language pattern in SPARQL.

PJ> If the intended scope of sh:uniqueLang is values of type rdf:langString, then the RDF 1.1 spec seems to indicate that this also encompasses plain strings (see https://www.w3.org/TR/rdf11-concepts/#dfn-language-tagged-string). In other words, “fred” will validate both as an rdf:langString and as xsd:string. Even if this were not the case, I would still expect this refinement to fail, and it doesn’t:

ex:Shape2
    a sh:Shape ;
    sh:scopeClass ex:Class2 ;
    sh:constraint [
       sh:or (
        [
            sh:property [
                a sh:PropertyConstraint ;
                sh:uniqueLang true ;
                sh:predicate ex:property1 ;
                sh:datatype rdf:langString
            ]
        ]
        [
            sh:property [
                a sh:PropertyConstraint ;
                sh:maxOccurs 1 ;
                sh:predicate ex:property1 ;
                sh:datatype xsd:string
            ]
        ]
       )
    ] .

ex:Class2 a rdfs:Class .

ex:InvalidShape12
   a ex:Class2 ;
   ex:property1 "fred" ;
   ex:property1 "barney" .
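
One possible reading of the behaviour above, sketched in plain Python under a stated assumption: that the unique-language check only compares values carrying a non-empty language tag, so untagged literals never collide. That assumption is consistent with the result reported here, but it is not confirmed by the draft text quoted in this thread. The second function, with its default_lang parameter, is a hypothetical variant corresponding to the "fallback language" idea discussed above; neither function is part of any SHACL implementation.

```python
from collections import Counter

def violates_unique_lang(values):
    """values: list of (lexical_form, language_tag_or_None) pairs.

    Assumed semantics: only literals with a language tag are compared.
    """
    tags = Counter(lang.lower() for _, lang in values if lang)
    return any(count > 1 for count in tags.values())

def violates_unique_lang_with_default(values, default_lang):
    """Hypothetical variant: untagged literals count as default_lang."""
    tags = Counter((lang or default_lang).lower() for _, lang in values)
    return any(count > 1 for count in tags.values())

# The two untagged values from the example instance above.
plain_values = [("fred", None), ("barney", None)]
# The same values with explicit, identical language tags.
tagged_values = [("fred", "en"), ("barney", "en")]

plain_result = violates_unique_lang(plain_values)
tagged_result = violates_unique_lang(tagged_values)
fallback_result = violates_unique_lang_with_default(plain_values, "en")
```

Under the assumed semantics the untagged pair passes and the tagged pair fails; only the variant with a default language reports the untagged pair as a violation.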


*I think this is something missing from the spec. I should be able to use sh:shape from one shape to another: ex:Shape1 sh:shape ex:Shape2. Compare ex:Shape2 with ex:Shape3 in the enclosed reuse-002.ttl. Of course, similar effects could be achieved with rdfs:subClassOf on shapes, though this doesn’t appear to work currently.

(A minor request on the test harness: it would be useful if it spat out a positive message with the filename on a successful test result. It’s sometimes hard to work out what you’re testing. It’s not a big deal, though.)

Feel free to send me a snippet (and where in the code to insert it)

Thanks
Holger
Received on Sunday, 17 January 2016 06:42:34 UTC