Re: New Terminology Section from Holger Knublauch on 2016-05-10 (public-data-shapes-wg@w3.org from May 2016)

From: Holger Knublauch <holger@topquadrant.com>
Date: Tue, 10 May 2016 14:52:00 +1000
To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <82c75657-cec4-8d22-203b-eecd5ec93ae7@topquadrant.com>
On 10/05/2016 14:31, Tom Johnson wrote:
>
>
> On Mon, May 9, 2016 at 8:18 PM, Holger Knublauch 
> <holger@topquadrant.com <mailto:holger@topquadrant.com>> wrote:
>
>     On 10/05/2016 12:30, Tom Johnson wrote:
>>
>>
>>     On Mon, May 9, 2016 at 5:29 PM, Holger Knublauch
>>     <holger@topquadrant.com <mailto:holger@topquadrant.com>> wrote:
>>
>>
>>
>>         On 10/05/2016 10:11, Tom Johnson wrote:
>>>         Irene, you say:
>>>
>>>         >"Doing more" doesn't create a problem, but, on the other
>>>         hand, it is not required.
>>>
>>>         I'm really uncertain about this. Couldn't inferring further
>>>         class relations (e.g., by using the entailment mechanism
>>>         included in the spec) cause different results for basically
>>>         every operation in SHACL?
>>
>>         Can you think of a specific example? sh:entailment would
>>         potentially produce additional triples. But this is the
>>         user's choice, and then the user may expect to see additional
>>         validation results...
>>
>>
>>     We seem to be in agreement that inferring additional triples will
>>     change results. Examples seem obvious; adding a `subClassOf`
>>     statement whose subject is any class referenced in a shape will
>>     do the trick, but that's far from the only example.
>>
>>     This seems like a problem to me because I don't see that it's
>>     clear where triples like `subClassOf` must appear (data graph?
>>     shapes graph? any graph?) for a resource to count as a shape, or
>>     to match various constraint components.
>
>     To have an effect on sh:scopeClass and sh:class, the subClassOf
>     triples must be in the data graph.
>
> Is this stated somewhere in the current spec? I haven't been able to 
> find it, if so.

For sh:scopeClass, Section 2.1.2:

"Note that, according to the SHACL instance definition, all
the rdfs:subClassOf declarations must exist in the data graph."

For sh:class the same rules apply as for every other constraint 
component - it looks for triples in the data graph. We could 
theoretically repeat this everywhere, e.g. for sh:minCount, but at some 
stage this should be clear. However, given that multiple people have run 
into this question recently, I have just added a clarification to sh:class:

https://github.com/w3c/data-shapes/commit/4c0b8f1cbc8faa09624d1a35fc0a8ef564af09b7 
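A minimal sketch of the rule quoted above. It assumes graphs are modelled as Python sets of (subject, predicate, object) tuples using prefixed-name strings. The helper names superclasses and is_shacl_instance are hypothetical and are not part of any SHACL API. The rdfs:subClassOf triples are read from the data graph only, so a subclass declaration that exists only in the shapes graph has no effect unless a caller merges the two graphs beforehand.

```python
# Hypothetical sketch: graphs are sets of (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def superclasses(graph, cls):
    """Return cls plus every class reachable from it via rdfs:subClassOf* in graph."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

def is_shacl_instance(data_graph, node, cls):
    """True if node has an rdf:type C in data_graph with C rdfs:subClassOf* cls.
    The subClassOf triples are looked up in data_graph only."""
    return any(
        s == node and p == RDF_TYPE and cls in superclasses(data_graph, o)
        for s, p, o in data_graph
    )

data = {
    ("ex:alice", RDF_TYPE, "ex:Student"),
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Teacher"),
}
# A subClassOf triple that exists only in the shapes graph.
shapes = {("ex:Teacher", SUBCLASS_OF, "ex:Person")}

print(is_shacl_instance(data, "ex:alice", "ex:Person"))         # True
print(is_shacl_instance(data, "ex:bob", "ex:Person"))           # False: shapes graph not consulted
print(is_shacl_instance(data | shapes, "ex:bob", "ex:Person"))  # True only if the graphs are merged
```

The last call shows the alternative Irene describes later in the thread (subclass triples taken from either graph); under the wording quoted above, only the data-graph-only behaviour applies.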


>
> Also, the question applies equally to cases where the intent is 
> presumably that (only?) the data graph counts. For instance: which 
> resources count as sh:Shapes?

This would have to be in Section 4, but that section is currently under
revision and may be merged with Section 2 shortly, so I won't touch it right
now. But the intent is that any Shape definition triples such as 
ex:MyShape rdf:type sh:Shape are only relevant if they are in the shapes 
graph.
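To make the intent concrete, here is a minimal sketch, again assuming graphs are sets of (subject, predicate, object) tuples. find_shapes is a hypothetical helper: shape discovery looks at the shapes graph only, so an rdf:type sh:Shape triple that appears in the data graph does not turn anything into a shape.

```python
# Hypothetical sketch: shape discovery reads the shapes graph only.
def find_shapes(shapes_graph):
    """Return the nodes declared as sh:Shape in the given shapes graph."""
    return {s for s, p, o in shapes_graph if p == "rdf:type" and o == "sh:Shape"}

shapes_graph = {("ex:MyShape", "rdf:type", "sh:Shape")}
# This typing triple lives in the data graph and is never passed to find_shapes.
data_graph = {("ex:OtherShape", "rdf:type", "sh:Shape")}

print(find_shapes(shapes_graph))  # {'ex:MyShape'}
```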

>>     Note that adding a `subClassOf` triple to a shapes graph to
>>     affect validation could be considered a feature; I'm unsure
>>     whether that feature is supported.
>
>     Currently the spec only looks at the data graph.
>
>>
>>     Additionally, `sh:entailment` seems generally under/un-defined.
>>     Can inference affect data graphs only, or also shapes graphs?
>>     Which triples can be considered by a reasoner and how are
>>     inferred triples used by the SHACL semantics?
>
>     I have just added a clarification to the sh:entailment section:
>
>     https://github.com/w3c/data-shapes/commit/71a9eeaff0317de0cdca6b36500412dabc922f78
>
>     I am unsure how many people will actually use sh:entailment, so
>     any feedback or requirements may help us add missing details. It is
>     very brief right now, indeed.
>
>
> I think a clear definition is called for; otherwise, I would simply
> remove the feature. Is there a functional difference between
> entailment (in this case) and providing a mechanism for the
> user/engine to add arbitrary triples to the data or shapes graph
> during pre-processing? This could be a simpler way to think of the
> problem.

Regardless of whether sh:entailment exists, any implementer or engine
already has full freedom to modify the graphs before sending them to
the SHACL engine. This is outside of the SHACL language. The rest needs
to be decided by the WG, for which I cannot speak here.
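As an illustration of the kind of out-of-band pre-processing mentioned here (not a description of how sh:entailment works), the following minimal sketch materializes subclass-based rdf:type triples before a SHACL engine would be called. It assumes graphs are sets of (subject, predicate, object) tuples; materialize_types is a hypothetical name. It applies only the RDFS rules rdfs9 and rdfs11, and it returns a new graph, so the input graph stays unmodified, in line with the immutability Karen asks for.

```python
# Hypothetical pre-processing step, run before (and outside of) a SHACL engine.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def materialize_types(graph):
    """Return a copy of graph extended with the triples implied by RDFS rules
    rdfs9 (type propagation) and rdfs11 (subClassOf transitivity) only."""
    result = set(graph)  # copy: the caller's graph is never mutated
    changed = True
    while changed:  # repeat until no new triples appear (fixed point)
        changed = False
        subclass = [(s, o) for s, p, o in result if p == SUBCLASS_OF]
        types = [(s, o) for s, p, o in result if p == RDF_TYPE]
        new = set()
        for sub, sup in subclass:
            # rdfs11: A subClassOf B, B subClassOf C  =>  A subClassOf C
            for sub2, sup2 in subclass:
                if sup == sub2:
                    new.add((sub, SUBCLASS_OF, sup2))
            # rdfs9: x type A, A subClassOf B  =>  x type B
            for node, cls in types:
                if cls == sub:
                    new.add((node, RDF_TYPE, sup))
        if not new <= result:
            result |= new
            changed = True
    return result

data = {
    ("ex:alice", RDF_TYPE, "ex:Student"),
    ("ex:Student", SUBCLASS_OF, "ex:Person"),
    ("ex:Person", SUBCLASS_OF, "ex:Agent"),
}
expanded = materialize_types(data)
print(("ex:alice", RDF_TYPE, "ex:Agent") in expanded)  # True
print(len(data))  # 3: the original graph is unchanged
```

The expanded graph, rather than the original, would then be handed to the engine; from the engine's point of view the extra triples are simply part of the data graph.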

Holger


>
> - Tom
>
>
>     Holger
>
>
>>
>>     Some of my other concerns about the specifics of `class` and
>>     `instance` definitions seem to be in the process of being fixed
>>     up; from a quick reading of the latest editor's draft, this is
>>     looking promising.
>>
>>     - Tom
>>
>>         Thanks,
>>         Holger
>>
>>
>>
>>>
>>>         In lieu of a repeat of previous conversations, I'll just
>>>         say: For me, as an implementer in waiting, this is a huge
>>>         problem. On last reading, very little seemed unambiguously
>>>         defined.
>>>
>>>         - Tom
>>>
>>>         On Mon, May 9, 2016 at 12:14 PM, Irene Polikoff
>>>         <irene@topquadrant.com <mailto:irene@topquadrant.com>> wrote:
>>>
>>>             Karen,
>>>
>>>             As I understand it, RDFS inferencing is one way to
>>>             address this. However,
>>>             RDFS inferencing would do more than what is specified
>>>             here. "Doing more"
>>>             doesn't create a problem, but, on the other hand, it is
>>>             not required.
>>>
>>>             Another way to address this is to run a query as follows:
>>>
>>>             SELECT ?resource
>>>             WHERE {
>>>
>>>             ?class rdfs:subClassOf* example:Class1 .
>>>             ?resource a ?class .
>>>
>>>             }
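For readers less familiar with SPARQL property paths, the following minimal sketch computes the same set of resources as the query above without adding anything to the graph. It assumes the graph is a set of (subject, predicate, object) tuples; subclasses_star and instances_of are hypothetical names. Because rdfs:subClassOf* also matches a path of length zero, example:Class1 itself is always among the candidate classes, which is why the result degrades to plain rdf:type matching when no subclass triples exist. Unlike a SPARQL SELECT, a Python set drops duplicate rows.

```python
# Hypothetical, non-materializing equivalent of the SPARQL query above.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

def subclasses_star(graph, cls):
    """All ?class such that ?class rdfs:subClassOf* cls (zero or more steps)."""
    found = {cls}  # the zero-length path binds ?class to cls itself
    frontier = [cls]
    while frontier:
        target = frontier.pop()
        for s, p, o in graph:
            if p == SUBCLASS_OF and o == target and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def instances_of(graph, cls):
    """All ?resource with ?resource a ?class, for ?class in subclasses_star(cls)."""
    classes = subclasses_star(graph, cls)
    return {s for s, p, o in graph if p == RDF_TYPE and o in classes}

graph = {
    ("example:Class2", SUBCLASS_OF, "example:Class1"),
    ("example:a", RDF_TYPE, "example:Class1"),
    ("example:b", RDF_TYPE, "example:Class2"),
    ("example:c", RDF_TYPE, "example:Other"),
}
print(sorted(instances_of(graph, "example:Class1")))  # ['example:a', 'example:b']
```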
>>>
>>>             Running this query would not change any graphs. As an
>>>             aside, RDFS
>>>             inferencing is also often done without modifying any
>>>             graphs. Inferences
>>>             are calculated on the fly when users/systems query data
>>>             without any
>>>             materialization of inferred triples. At least, this is
>>>             how triple stores
>>>             that support RDFS inferencing typically work.
>>>
>>>             Does your concern have to do with where the
>>>             rdfs:subClassOf triples come
>>>             from - would they exist in the data graph, would they
>>>             exist in the shapes
>>>             graph? They could be in either. If no subclass triples
>>>             are there, then the
>>>             first triple match simply binds ?class to example:Class1
>>>             and the query
>>>             result is the same as if we were only looking for nodes
>>>             that are connected
>>>             to example:Class1 via rdf:type link.
>>>
>>>             It doesn't seem to be the role of SHACL to mandate where
>>>             these triples
>>>             should be located. If they are available in either of
>>>             the graphs, a SHACL
>>>             engine should take them into account. If they are not
>>>             available, then it
>>>             doesn't take them into account.
>>>
>>>             In our experience, users typically put the subclass
>>>             triples into the
>>>             shapes graph. At the same time, they need flexibility to
>>>             do whatever fits
>>>             their architecture and processes.
>>>
>>>
>>>             Irene Polikoff
>>>
>>>
>>>             On 5/9/16, 1:47 PM, "Karen Coyle" <kcoyle@kcoyle.net
>>>             <mailto:kcoyle@kcoyle.net>> wrote:
>>>
>>>             >Type
>>>             >The types of a node are its values of rdf:type as well
>>>             as the
>>>             >superclasses of these values.
>>>             >
>>>             >This conflates two different relationships: the
>>>             relationship of a
>>>             >subject to a class (as defined in RDF/RDFS), defining
>>>             the subject as an
>>>             >instance of the class; and the sub-/super-class
>>>             relationships between
>>>             >classes. I don't see how this can be achieved without
>>>             inferencing.
>>>             >
>>>             >If we assume some pre-processing of the data graph to
>>>             include the
>>>             >superclasses, then type is precisely as it is defined
>>>             in RDF - there are
>>>             >just more type statements in the graph.
>>>             >
>>>             >As stated, this is quite an expansion of the meaning of
>>>             type. In
>>>             >addition, it appears to require modifications to the
>>>             data graph to
>>>             >include the super classes of each class (presumably up
>>>             to and including
>>>             >rdfs:Resource).
>>>             >
>>>             >I think it would be best if SHACL defined the shapes and
>>>             data graphs as
>>>             >immutable, thus expecting that all operations read but
>>>             do not modify the
>>>             >graphs. I thought we had come to that conclusion.
>>>             >
>>>             >kc
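The "Type" definition Karen quotes can be evaluated by reading the graph alone. The minimal sketch below assumes a graph modelled as a set of (subject, predicate, object) tuples and uses a hypothetical helper node_types; wrapping the graph in a frozenset makes any modification impossible. Note that, in this sketch, only superclasses actually declared with rdfs:subClassOf are returned, so rdfs:Resource is not added implicitly.

```python
# Hypothetical read-only evaluation of "the types of a node".
def node_types(graph, node):
    """rdf:type values of node plus their transitive rdfs:subClassOf superclasses."""
    result = {o for s, p, o in graph if s == node and p == "rdf:type"}
    frontier = list(result)
    while frontier:
        cls = frontier.pop()
        for s, p, o in graph:
            if s == cls and p == "rdfs:subClassOf" and o not in result:
                result.add(o)
                frontier.append(o)
    return result

# frozenset has no add/remove methods, so the graph cannot be modified here.
graph = frozenset({
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
})
print(sorted(node_types(graph, "ex:alice")))  # ['ex:Person', 'ex:Student']
```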
>>>
>>>
>>>
>>>
>>>
>>>
>>>         -- 
>>>         -Tom Johnson
>>
>>
>>
>>
>>     -- 
>>     -Tom Johnson
>
>
>
>
> -- 
> -Tom Johnson
Received on Tuesday, 10 May 2016 06:42:45 UTC