RE: Shapes vs Classes (in LDOM) from Irene Polikoff on 2015-01-24 (public-data-shapes-wg@w3.org from January 2015)

From: Irene Polikoff <irene@topquadrant.com>
Date: Sat, 24 Jan 2015 01:37:51 -0500
To: "'Jose Emilio Labra Gayo'" <jelabra@gmail.com>
Cc: "'Holger Knublauch'" <holger@topquadrant.com>, "'RDF Data Shapes Working Group'" <public-data-shapes-wg@w3.org>
Message-ID: <12ea01d037a0$47924e10$d6b6ea30$@topquadrant.com>
Jose,

 

< The problem is that we are talking about different things and they are different concepts. […]If a system automatically combines those observations because they have the same type, then it is mixing oranges and apples. >

 

I’d argue that if these are completely different concepts and it would be like mixing oranges and apples, then they should not be members of the same class. Being members of the same class means that they are conceptually similar.

 

I am not sure what system you are talking about, but if triples describing observations from portal A and from portal B ever come together, they will certainly be combined. This will happen because they have the same object in the triples with the “type” predicate. It is plain RDF merging that has nothing to do with constraints; it would happen even if not a single constraint were defined anywhere.
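
The merging point can be sketched as follows. This is only an illustration: it assumes a graph modelled as a plain Python set of (subject, predicate, object) tuples rather than any real RDF library, and the portalA:/portalB: node and property names are hypothetical.

```python
# Illustrative only: an RDF graph modelled as a set of (subject, predicate, object) triples.
RDF_TYPE = "rdf:type"

# Hypothetical data from two portals; node and property names are made up.
portal_a = {
    ("portalA:obs1", RDF_TYPE, "qb:Observation"),
    ("portalA:obs1", "portalA:value", "42"),
}
portal_b = {
    ("portalB:obs7", RDF_TYPE, "qb:Observation"),
    ("portalB:obs7", "portalB:indicator", "population"),
}


def merge(*graphs):
    """Plain merge (blank nodes ignored for simplicity): the union of the triple sets."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def instances_of(graph, cls):
    """All subjects that have an rdf:type triple pointing at cls, sorted."""
    return sorted(s for (s, p, o) in graph if p == RDF_TYPE and o == cls)


merged = merge(portal_a, portal_b)
# No constraint is defined anywhere, yet both portals' observations are returned.
print(instances_of(merged, "qb:Observation"))
```

The lookup by type returns nodes from both portals even though no constraint of any kind is involved.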

 

< Sometimes when you are modeling linked data portals that extract data from relational databases or Excel sheets, you extract values from tables and link properties to them. You could assign those generated nodes an rdf:type, but it should not be mandatory. And this is or will be a very common use case for linked data applications.>

 

People use various data management practices. Some of these are not what one would call best practices. By definition, a database table specifies a type for the information it contains - be it a table with people or invoices or products, etc. Same with a spreadsheet. Pretty much all proven data management and software development techniques, for a good reason, use types in describing data. Types are important.

 

With RDF, it is possible to not specify the type of a thing you are describing. Sometimes, it is because you may not know what it is and the purpose of the application is to “discover”/infer this info, or because you expect it to be added later. In my experience, this happens in a small percentage of cases. In most cases, the type is known. Sometimes, it is because people who convert data are going for a “fast and dirty” approach and decide to ignore some of the available info – this seems to be the case you are describing. There could be other reasons as well. Whatever the reason, there is already support for this in LDOM – constraints don’t have to be associated with classes. They could be for a specific node. They could be global/static. So these situations are covered.

 

Irene

 

 

 

From: Jose Emilio Labra Gayo [mailto:jelabra@gmail.com] 
Sent: Saturday, January 24, 2015 1:04 AM
To: Irene Polikoff
Cc: Holger Knublauch; RDF Data Shapes Working Group
Subject: Re: Shapes vs Classes (in LDOM)

 

On Sat, Jan 24, 2015 at 6:31 AM, Irene Polikoff <irene@topquadrant.com> wrote:

Why would this be a problem?  You can have one set of constraints for one portal (constraint definition graph 1) and another set of constraints for another portal (constraint definition graph 2). The fact that they both say that they are constraints for the same class doesn’t seem to matter – one application would use the first one and another application would use the second one. There is no conflict. Further, if these were two applications in the same enterprise, for example, there is also value in capturing the fact that these are two different constraints for the same class.

 

The problem is that we are talking about different things and they are different concepts. For example, if I have Observations with a shape in PortalA which are of type qb:Observation, I want to talk about the shape of those observations. For example, I want to say that those observations have some properties which are specific to that data portal. While if I want to define the shape of the observations in PortalB, which are also of type qb:Observation but have other properties, I am talking about a completely different concept. If a system automatically combines those observations because they have the same type, then it is mixing oranges and apples. 
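
A minimal sketch of the distinction, assuming a shape is reduced to nothing more than a set of required properties. This is neither ShEx nor LDOM syntax, and every portalA:/portalB: name and value below is hypothetical.

```python
# Illustrative only: a "shape" here is just the set of properties a node must have.
# All shape, node and property names are hypothetical.
SHAPES = {
    "PortalA": {"rdf:type", "portalA:value", "portalA:year"},
    "PortalB": {"rdf:type", "portalB:indicator", "portalB:region"},
}

# Both nodes have the same rdf:type but carry different properties.
NODES = {
    "portalA:obs1": {
        "rdf:type": "qb:Observation",
        "portalA:value": "42",
        "portalA:year": "2014",
    },
    "portalB:obs7": {
        "rdf:type": "qb:Observation",
        "portalB:indicator": "population",
        "portalB:region": "Asturias",
    },
}


def missing_properties(node, shape):
    """Return the properties required by the named shape that the node lacks, sorted."""
    return sorted(SHAPES[shape] - set(NODES[node]))


# The same node conforms to one shape and fails the other, although its type is unchanged.
print(missing_properties("portalA:obs1", "PortalA"))
print(missing_properties("portalA:obs1", "PortalB"))
```

Which shape applies is decided by the portal being described, not by the shared rdf:type of the nodes.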

 

Also, although it was not the case in my example, there could be other examples where you don't even define the type of the nodes. Sometimes when you are modeling linked data portals that extract data from relational databases or Excel sheets, you extract values from tables and link properties to them. You could assign those generated nodes an rdf:type, but it should not be mandatory. And this is or will be a very common use case for linked data applications.

 

Best regards, Jose Labra

 

From: Jose Emilio Labra Gayo [mailto:jelabra@gmail.com] 
Sent: Friday, January 23, 2015 11:48 PM
To: Holger Knublauch
Cc: RDF Data Shapes Working Group
Subject: Re: Shapes vs Classes (in LDOM)

 

On Fri, Jan 23, 2015 at 11:42 AM, Holger Knublauch <holger@topquadrant.com> wrote:

I think that separation of classes and types (and of course global constraints) is fine - our differences are largely syntactical. I will experiment with adding the class ldom:Shape and a property ldom:shape that links a class with its (additional) ldom:Shapes and publish an update, hopefully early next week. I think this will provide the freedom of separating things (that is advocated by Resource Shapes/ShEx), while at the same time supporting the pattern of attaching constraints to classes (that is working well for SPIN users). Users will be able to mix those types of declarations.

 

I think that is a good step forward and I encourage LDOM to go more in that direction. After taking a look at LDOM, I think one of the main differences between it and ShEx is precisely the impossibility of separating shapes (or sets of constraints) from classes. 

 

In my opinion, it is not practical when one is trying to describe the contents of linked data portals and one is reusing concepts/properties from different vocabularies.

 

As a practical example, I would recommend the following paper [1] where we used the concept qb:Observation in two different linked data portals. The observations had different shapes in the two portals, with different properties, but all the observations had the same type: qb:Observation. I think that situation will happen a lot in real-life linked data portals.

 

Yesterday, I proposed to add a user story inspired by that example.

 

Best regards, Jose Labra

 

[1] Validating and Describing Linked Data Portals using RDF Shape Expressions, Jose Emilio Labra Gayo, Eric Prud'hommeaux, Harold Solbrig, 1st Workshop on Linked Data Quality, Sept. 2014, Leipzig, Germany

PDF: http://labra.github.io/ShExcala/papers/ldq2014.pdf

Slides: http://www.slideshare.net/jelabra/linked-dataquality-2014

 

 



Holger

 

On 1/23/15, 8:05 PM, Dimitris Kontokostas wrote:

I am in no way saying that your proposal is wrong, I am just suggesting my idea for separating distinct validation types (class, global, shape). 

(only one comment inline)

 

On Fri, Jan 23, 2015 at 11:35 AM, Holger Knublauch <holger@topquadrant.com> wrote:

 

On 1/23/15, 7:03 PM, Dimitris Kontokostas wrote:

First of all, great work initiating this Holger!!!

 

Maybe I am missing something in the semantics of the class declarations, but I would suggest a simplification of the constraint definitions. Examples: 

 

# class example

ex:constraintA
  a ldom:ClassConstraint ;
  ldom:class ex:ClassA, ex:ClassB, ex:ClassC ; #  (oslc:describes)
  ldom:sparql """ ..?this ... """ ;
  ldom:property [
    ldom:predicate ex:propA ;
    ldom:minCount 1 ;
  ] ;

 

In this case, all three classes (A, B and C) have a min cardinality 1 restriction on ex:propA, which is not possible if the constraint is bound to a single class.


Hi Dimitris,

to me this looks like the wrong direction. It is much more natural to write

ex:ClassA
    ldom:property [
        ...
    ]

Sharing the same property across multiple classes is also not a scenario that I have come across yet. 

 

I saw that in an OSLC example document and liked the idea.

 

And why the extra burden of creating a URI for the constraint? I guess most people will be perfectly happy with blank nodes. Likewise, why should they have to explicitly declare the type ldom:ClassConstraint if it is implicit from the context?



We also decouple the schema declaration from the constraint declaration (*)


I don't think this decoupling is often desirable. When someone defines a class, then of course the properties should be defined together with it (just like owl:Restrictions did). What else would a class definition be good for?

In case someone really has to define shapes independently from classes, then we can easily add a property such as the inverse of the ldom:class that you have above, e.g. ldom:shape as in

ex:ClassA
    ldom:shape ex:ShapeB ;

This would offer the same flexibility but have it in a more natural direction to cover the most common use cases.

 

# global constraint example, the rdfs:Resource / owl:Thing declaration is redundant

ex:constraintB
  a ldom:GlobalConstraint ;
  ldom:sparql """ ... """ ;

 

# ShExC / RS shapes in a similar way these are currently defined

ex:constraintC
  a ldom:ShapeConstraint ;
  ldom:sparql """ ... """ ;
  ldom:property [
    ldom:predicate ex:propA ;
    ldom:minCount 1 ;
  ] ;

 

For the ShapeConstraints we can define how validation can be performed, e.g. starting from a node, or inferring the types of the nodes based on the shape definition and then validating in a similar way to the ClassConstraint.

Would something like this solve the class/shape problem?


Why would the solution that I proposed not work?

Thanks,
Holger 






 

 

(*) Another reason for not defining constraints as classes is that automated agents try to profile datasets for the classes / properties used, which might confuse them and give false statistics.

 

Best,

Dimitris

 

 

On Fri, Jan 23, 2015 at 5:57 AM, Holger Knublauch <holger@topquadrant.com> wrote:

May I suggest we try to resolve the long-standing issue of Shapes versus Classes in the specific context of LDOM. Maybe we can make progress if we have a specific metamodel in front of us.

In the current draft, class definitions are containers of constraints, i.e.

    rdfs:Class
        a rdfs:Class ;
        rdfs:subClassOf rdfs:Resource ;
        ldom:property [
            ldom:predicate ldom:constraint ;
            ldom:valueType ldom:Constraint ;
        ] ;
        ldom:property [
            ldom:predicate ldom:property ;
            ldom:valueType ldom:PropertyConstraint ;
        ] ;

which means that you can define a class such as

    ex:Rectangle
        ldom:property [
            ldom:predicate ex:height ;
            ...
        ] ...

This could (easily) be generalized by moving the properties into a new class

    ldom:Shape
        a rdfs:Class ;
        rdfs:subClassOf rdfs:Resource ;
        ldom:property [
            ldom:predicate ldom:constraint ;
            ldom:valueType ldom:Constraint ;
        ] ;
        ldom:property [
            ldom:predicate ldom:property ;
            ldom:valueType ldom:PropertyConstraint ;
        ] ;

which serves as a superclass of rdfs:Class

    rdfs:Class
        a rdfs:Class ;
        rdfs:subClassOf ldom:Shape ;

This would mean that users could define stand-alone shapes

    ex:MyShape
        a ldom:Shape ;
        ldom:property [
            ...
        ] ...

And this shape could be reused such as in

    ex:MyClass
        a rdfs:Class ;
        ldom:constraint [
            a ldom:ShapeConstraint ;
            ldom:all ex:MyShape ;
        ] ...

or as an entry point to the validation:

    FILTER ldom:violatesConstraints(?resource, ex:MyShape)

(maybe renaming the function above to ldom:hasShape).

Since rdfs:Class is a subclass of ldom:Shape, class definitions become special kinds of shape definitions. The main differences between classes and shapes would be:

- Classes can be instantiated, i.e. you can have ex:MyRectangle a ex:Rectangle
- Class-based constraints get inherited (Shapes cannot have rdfs:subClassOf)
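
The inheritance difference can be sketched roughly as follows. This is an illustration only, not LDOM syntax: constraints are reduced to plain text labels, and ex:Square, its constraint and the ex:label constraint are hypothetical names introduced for the example.

```python
# Illustrative only: hypothetical classes and constraint labels, not LDOM syntax.
SUBCLASS_OF = {"ex:Square": "ex:Rectangle"}  # rdfs:subClassOf links between classes

CONSTRAINTS = {
    "ex:Rectangle": ["ex:height minCount 1"],
    "ex:Square": ["ex:width equals ex:height"],
    "ex:MyShape": ["ex:label minCount 1"],  # stand-alone shape: no superclass link
}


def effective_constraints(name):
    """Collect constraints declared on name and on every superclass reachable from it."""
    collected = []
    seen = set()
    while name is not None and name not in seen:
        seen.add(name)
        collected.extend(CONSTRAINTS.get(name, []))
        name = SUBCLASS_OF.get(name)  # None once the top of the chain is reached
    return collected


# A class picks up its superclass constraints; a stand-alone shape has only its own.
print(effective_constraints("ex:Square"))
print(effective_constraints("ex:MyShape"))
```

Under these assumptions the class ex:Square ends up with two constraints, while the shape ex:MyShape keeps exactly the one declared on it.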

I don't see practical problems with such a design, and in fact it may be a cleaner separation of concerns. The reason why these two concepts are currently merged into one is that the differences are fairly small, and people could simply define an anonymous (even typeless) class as a collection of constraints, as in Example 9

    http://spinrdf.org/ldomprimer.html#template-constraints

Thoughts?

Cheers,
Holger





 

-- 

Dimitris Kontokostas
Department of Computer Science, University of Leipzig
Research Group: http://aksw.org
Homepage:http://aksw.org/DimitrisKontokostas

 





 


 





 

-- 

Saludos, Labra





 

Received on Saturday, 24 January 2015 06:38:37 UTC