Re: Shapes are Classes, even if you don't use rdf:type from Holger Knublauch on 2015-01-25 (public-data-shapes-wg@w3.org from January 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 26 Jan 2015 09:38:33 +1000
To: public-data-shapes-wg@w3.org
Message-ID: <54C57E79.60407@topquadrant.com>

Hi Jerven,

many thanks for speaking out. You have some really useful ideas.

On 1/25/2015 22:25, Jerven Bolleman wrote:
> Dear Working Group,
>
> I have tried to keep to the sidelines in this discussion,
> but as a very interested user of this kind of tech I feel
> I need to speak out.
>
> Shapes are Classes, in all practical and theoretical terms [1].

The terms are indeed extremely similar, to the point where the 
difference becomes almost philosophical. A class is a group of instances 
with similar characteristics. A shape is a group of nodes with similar 
characteristics. The difference is that class membership can also be 
asserted (via rdf:type). Yet OWL and LDOM also use class definitions as 
templates that can be used to "classify" instances. In OWL this is for 
example done by owl:equivalentClass definitions, in LDOM the same can be 
achieved via the built-in ldom:violatesConstraints function, which is 
basically a way to check whether a given node *could* be a valid 
instance of a class. You can rewrite that last sentence to use "Shape" 
without much difference.
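
A minimal sketch of this "could this node be a valid instance?" check, in
Python. It is not the ldom:violatesConstraints function itself: the
graph model (a set of triples), the cardinality-only definition format,
and the ex:width property are all assumptions made for illustration.

```python
# Hypothetical, simplified model (not LDOM itself):
# a graph is a set of (subject, predicate, object) triples, and a
# class/shape definition maps predicate -> (min_count, max_count).

def violations(graph, node, definition):
    """Return human-readable violations of `definition` by `node`."""
    problems = []
    for predicate, (min_count, max_count) in definition.items():
        count = sum(1 for s, p, _ in graph if s == node and p == predicate)
        if count < min_count:
            problems.append(f"{predicate}: expected at least {min_count}, found {count}")
        if max_count is not None and count > max_count:
            problems.append(f"{predicate}: expected at most {max_count}, found {count}")
    return problems


def could_be_instance(graph, node, definition):
    """True if `node` passes the definition, whether or not rdf:type is asserted."""
    return not violations(graph, node, definition)


graph = {
    ("ex:r1", "ex:height", 3),
    ("ex:r1", "ex:width", 4),   # ex:width is an assumed extra property
    ("ex:r2", "ex:height", 5),
}
rectangle = {"ex:height": (1, 1), "ex:width": (1, 1)}

print(could_be_instance(graph, "ex:r1", rectangle))  # True
print(could_be_instance(graph, "ex:r2", rectangle))  # False
```

The same function works unchanged whether `rectangle` is read as a class
definition or as a shape, which is the point made above.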

But the difference is that classes are already established. rdf:type is 
already established. rdfs:subClassOf is already established. The 
question is whether we need equivalent terms such as oslc:instanceShape 
and an equivalent of the subclass relationship for shapes, when we could 
just hook into the existing infrastructure and the existing ontologies, 
and be compatible with mainstream OO principles instead of coming up 
with some parallel universe.
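
As a rough illustration of what "hooking into the existing
infrastructure" could mean, the Python sketch below finds every class
whose constraints would apply to a node by following rdf:type and then
rdfs:subClassOf. The triple-set graph model and the ex:Square class are
assumptions for the example, not part of any proposal.

```python
# Hypothetical sketch: which class definitions apply to a node, using
# only the established rdf:type and rdfs:subClassOf vocabulary.

def superclasses(graph, cls):
    """All classes reachable from `cls` via rdfs:subClassOf, including `cls`."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the class hierarchy
        seen.add(current)
        stack.extend(o for s, p, o in graph
                     if s == current and p == "rdfs:subClassOf")
    return seen


def applicable_classes(graph, node):
    """Classes whose constraints apply to `node` through its asserted types."""
    result = set()
    for s, p, o in graph:
        if s == node and p == "rdf:type":
            result |= superclasses(graph, o)
    return result


graph = {
    ("ex:Square", "rdfs:subClassOf", "ex:Rectangle"),  # ex:Square is assumed
    ("ex:Rectangle", "rdfs:subClassOf", "rdfs:Resource"),
    ("ex:s1", "rdf:type", "ex:Square"),
}

print(sorted(applicable_classes(graph, "ex:s1")))
# ['ex:Rectangle', 'ex:Square', 'rdfs:Resource']
```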

> ShEX shapes are just another way to infer class membership
> (Closed World but otherwise basically OWL all over again)
>
> Instead of inferring example:A is a member of an owl:Class you now
> infer that example:A is a member of things that have shape Y.
> Using the word shape instead of Class is good to avoid confusing
> between OWL and this standard, but they are the same thing just
> relabelled.
>
>
> The fact that shapes tries to avoid rdf:type at all cost is
> going to be a real problem in even trivial real world cases.
> e.g.
>
> example:office example:telNo "+41 41 41 41" .
>
> example:person example:name "example person" ;
>                example:telNo "+32 32 32 32" .
>
> <officeShape> {
> 	example:telNo xsd:string
> }
>
> <personShape> {
> 	example:telNo xsd:string
> 	example:name xsd:string
> }
>
> Is example:office a member of the <personShape> just without a phone number?
> Yes or No. If it is not clear in this trivial example, how can we end users,
> reason about it and build stable software?
>
> LDOM, SPIN and OCLS all solve this by depending on the rdf:type.
> It's simple and clear cut.
>
> Now sometimes a direct rdf:type use is not enough or can be confusing,
> because what is lacking in all proposals is a way to associate a shape/constraint
> with the context in which this constraint should apply.
> I suggest introducing a new predicate _ldom:context_, which links a constraint to a
> resource describing when the constraint should be used.
>
> e.g.
> ex:Rectangle
> 	a rdfs:Class ;
> 	rdfs:subClassOf rdfs:Resource ;
> 	rdfs:label "Rectangle" ;
> 	ldom:property [
> 		a ldom:PropertyConstraint ;     # This type declaration is optional
> 		ldom:predicate ex:height ;
> 		ldom:minCount 1 ;
> 		ldom:maxCount 1 ;
> 		ldom:valueType xsd:integer ;
> 		rdfs:label "height" ;
> 		rdfs:comment "The height of the Rectangle." ;
> 		ldom:context ex:Normal_Geometry ;  # Here we say where we intend the constraint to apply
> 	] .
> ex:Normal_Geometry rdfs:label "Euclidean geometry in 2 dimensions" .
>
> If we give each ldom:property an explicit way to state in which context they apply
> we can actually deal with different people using foaf:person in multiple manners.
> e.g. the constraints on foaf:person data being submitted to a restaurant reservation
> site is different to the constraints on foaf:person data being submitted to a car rental
> site.
>
> The LDOM processor can then choose which contexts apply to its users' needs.
> The default would sensibly be all, while allowing users to whitelist or blacklist
> contexts as they want.
>
> This is a much cleaner solution than the shapes one. In shapes we attempt to separate the ontologies and
> their constraints to avoid constraint collisions, but we just hope that we don’t import them anyway.
> With this context suggestion, constraint collisions become something we can deal with.
>
> The advantage of attaching a context to constraints is that you can then say something like a
> post request with RDF data to book the rental of a car requires 1 driver, 1 driver license and 1 payment method.
> Currently in shapes and ldom, an empty message validates as well :( Plus it allows users
> to communicate when constraints should hold and when not, e.g. describing the steps in a wizard,
> where step 1 places fewer constraints on the submitted data than step 2.

This sounds like a brilliant suggestion to me. It is a strong 
alternative to what we proposed using named graphs in the past, only 
that it makes the graph name explicit, which means that when named 
graphs get merged into a single flat graph, they can still be distinguished.

I believe this approach can solve many use cases in which there were 
application-specific or portal-specific extra constraints. In order to 
make it a proposal to the group, I have therefore turned it into a 
corresponding requirement:

https://www.w3.org/2014/data-shapes/wiki/Requirements#Grouping_Constraints_into_Contexts
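
To make the idea concrete, here is a minimal Python sketch of how a
processor might select constraints by context before validating. All
names in it (PropertyConstraint, active_constraints, the ex:CarRental
and ex:RestaurantReservation contexts, the ex: predicates) are
hypothetical, and treating a context-free constraint as "always
active" is an assumption of the sketch, not something the proposal
specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PropertyConstraint:
    predicate: str
    min_count: int
    context: Optional[str] = None  # None: assumed to apply in every context


def active_constraints(constraints, include=None, exclude=frozenset()):
    """Select constraints by context.

    include=None means "all contexts" (the default suggested in the
    thread); otherwise only the listed contexts are kept. Contexts in
    `exclude` are always dropped.
    """
    selected = []
    for c in constraints:
        if c.context is None:
            selected.append(c)
        elif c.context in exclude:
            continue
        elif include is None or c.context in include:
            selected.append(c)
    return selected


def validate(message, constraints):
    """`message` maps predicate -> list of values; returns violated predicates."""
    return [c.predicate for c in constraints
            if len(message.get(c.predicate, [])) < c.min_count]


constraints = [
    PropertyConstraint("ex:driver", 1, "ex:CarRental"),
    PropertyConstraint("ex:driverLicense", 1, "ex:CarRental"),
    PropertyConstraint("ex:paymentMethod", 1, "ex:CarRental"),
    PropertyConstraint("ex:partySize", 1, "ex:RestaurantReservation"),
]

rental = active_constraints(constraints, include={"ex:CarRental"})
print(validate({}, rental))
# ['ex:driver', 'ex:driverLicense', 'ex:paymentMethod']
```

Because the context tag lives on each constraint rather than in a graph
name, the selection still works after all constraints have been merged
into one flat collection, and an empty car-rental message no longer
validates.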

> Secondly, I do think that ldom should be able to work from predicates as well.
>
> ex:widthIn_cm a rdf:Property ;
>          rdfs:label "width in centimetre" ;
>          ldom:property [ ldom:valueType xsd:positiveInteger ;
>                          ldom:context ex:realSpace ] .
>
> Allowing this kind of construct should help the dc:terms case where rdf:types
> are not specified.
>
> While modelling from a predicate is not everyone’s cup of tea I find that it meshes
> nicely with the Smalltalk message based OO paradigm, in comparison to the conventional
> ADT type OO paradigm of Java&C++. Which is why I believe it should have a place in this
> standard.

Again, a very interesting idea that looks implementable. I am undecided 
whether I would propose this as a requirement yet, because there might 
be a more general solution which is to use a combination of 
ldom:GlobalConstraint and templates. Your example above could be turned 
into a reusable LDOM Template with an argument ldom:valueType and an 
argument ldom:predicate. We (or anyone really) could create a template 
superclass for all the property-related global constraint templates. 
These would even have an explicit reference to the predicate that tools 
could use for display purposes etc. So the following would already work 
without requiring changes to the execution engine:

ex:WidthInCmConstraints
     a ldom:GlobalPropertyConstraint ;
     ldom:predicate ex:widthIn_cm ;
     ldom:valueType xsd:positiveInteger ;
.

and the ldom:sparql behind this would look for all occurrences of 
ex:widthIn_cm in the middle of a triple, regardless of the type of the 
subject.
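
The behaviour described here can be sketched in Python as follows. This
is only an illustration of the matching logic, not the ldom:sparql
query itself; the list-of-triples graph, the VALUE_CHECKS table, and
the ex:heightIn_cm and ex:Furniture names are assumptions.

```python
XSD_POSITIVE_INTEGER = "xsd:positiveInteger"


def is_positive_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


# Hypothetical lookup from a value type name to a Python check.
VALUE_CHECKS = {XSD_POSITIVE_INTEGER: is_positive_integer}


def check_global_property_constraint(graph, predicate, value_type):
    """Return every triple using `predicate` whose object fails `value_type`.

    The subject's rdf:type is never consulted: the constraint is keyed
    on the predicate alone.
    """
    check = VALUE_CHECKS[value_type]
    return [(s, p, o) for (s, p, o) in graph
            if p == predicate and not check(o)]


graph = [
    ("ex:table", "rdf:type", "ex:Furniture"),
    ("ex:table", "ex:widthIn_cm", 120),
    ("ex:untyped", "ex:widthIn_cm", -5),   # no rdf:type, still checked
    ("ex:untyped", "ex:heightIn_cm", -1),  # different predicate, ignored
]

bad = check_global_property_constraint(graph, "ex:widthIn_cm", XSD_POSITIVE_INTEGER)
print(bad)  # [('ex:untyped', 'ex:widthIn_cm', -5)]
```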

While I would indeed prefer to align with UML-style OO as much as 
possible (and have these properties attached to rdfs:Resource), you have 
a good point that sometimes this may be inconvenient.

I welcome further discussion of this idea.

>
> Sometimes data does not have types associated with them. In this relatively rare
> case I humbly suggest that the user use an existing W3C standard to infer a type:
> namely OWL. And if OWL doesn’t float their boat then use a SPARQL update statement.
> Totally typeless data is rare and should not be the primary use case for this WG.

+1. There is a danger of designing a language that is optimized for a 
small set of corner cases, while the majority of cases become 
unnecessarily complicated (e.g. by having parallel definitions such as 
ex:Person and ex:PersonShape).

Holger


>
> e.g.
>
> <officeShape> {
> 	example:telNo xsd:string
> }
>
> is practically equivalent to
>
> :officeShape a owl:Class ;
>          rdfs:subClassOf [ a owl:Restriction ;
>                            owl:onProperty example:telNo ;
>                            owl:minCardinality 1
>                          ] .
>
> In both cases some kind of reasoning has to take place to determine if
> the following triple
>
>   example:office example:telNo "+41 41 41 41" .
>
> means that triples about example:office meet the criteria of <officeShape>.
>
> Now get back to work and standardise something fantastic !
>
> Sincere regards,
> Jerven Bolleman
>
> [1] If it quacks like a duck and does not carry a shotgun then for all
> practical purposes it is a duck. Although for our favourite instance
> example Dick Cheney it's “If it quacks like a duck then it's a target” ;)
> even if what quacks wears a bright fluorescent jacket and practices law.
>
> -------------------------------------------------------------------
> Jerven Bolleman                        Jerven.Bolleman@isb-sib.ch
> SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland     www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------
>
>
Received on Sunday, 25 January 2015 23:42:05 UTC