Re: Using OWL for data modeling, was Re: Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase

The problem with using OWL for #2 (i.e. the data model) is the open
world assumption. Cardinality axioms in OWL are even trickier than
domain and range, for

  :me a x:HumanRecord ;
      x:father :my_father .
  # no explicit mother

would not be inconsistent with a (= 1 x:mother) cardinality "constraint"
on x:HumanRecord.
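
Just to make that concrete, here is roughly how I would write it in
Turtle / OWL (the namespaces are made up for the example, and the
restriction below is simply the usual way of spelling a (= 1 x:mother)
cardinality on x:HumanRecord). Under the open world assumption a
reasoner will merely infer that :me has some unstated mother, rather
than report a violation:

  @prefix owl:  <http://www.w3.org/2002/07/owl#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
  @prefix x:    <http://example.org/x#> .      # made-up namespace
  @prefix :    <http://example.org/data#> .    # made-up namespace

  # Intended "constraint": every x:HumanRecord has exactly one x:mother.
  x:HumanRecord rdfs:subClassOf [
      a owl:Restriction ;
      owl:onProperty x:mother ;
      owl:cardinality "1"^^xsd:nonNegativeInteger
  ] .

  # Data with no explicit mother: perfectly consistent under OWL semantics.
  :me a x:HumanRecord ;
      x:father :my_father .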

How would you suggest using OWL to check integrity constraints over the
*explicit* triples only? I know of a paper by Motik et al. [1] on that
topic, but it is not standard OWL...
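
(To illustrate the kind of check I am after -- closed-world, over the
explicit triples only -- the closest I can write today is plain SPARQL,
in the spirit of the Schemarama work Dan mentions below; prefixes and
names are the made-up ones from the example above:

  # Find every x:HumanRecord lacking an explicit x:mother triple.
  PREFIX x: <http://example.org/x#>
  SELECT ?record
  WHERE {
    ?record a x:HumanRecord .
    OPTIONAL { ?record x:mother ?m }
    FILTER ( !bound(?m) )
  }

But this is SPARQL, not OWL, hence my question.)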

  pa

[1]
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-258/paper11.pdf

Sandro Hawke wrote:
>> Pierre-Antoine Champin wrote:
>>> Dan Brickley wrote:
>>>> I do recommend against using RDFS/OWL to express application/dataset
>>>> constraints, while recognising that there's a real need for recording
>>>> them in machine-friendly form. In the Dublin Core world, this topic is
>>>> often discussed in terms of "application profiles", meaning that we
>>>> want to say things about likely and expected data patterns, rather
>>>> than doing what RDFS/OWL does and merely offering machine dictionary
>>>> definitions of terms.
>>>
>>> Why would you recommend against it?
>>>
>>> Would not a good practice be to simply separate in two RDF graphs
>>> - "intensional" axioms, those representing the meaning of the terms
>>>   and that should be assumed by people reusing the vocabulary
>>> - "extensional" axioms, those representing properties/constraints of
>>>   the dataset, that should be used to check its
>>>   consistency/completeness.
>>>
>>> Depending on their needs, people would only import the first graph, or
>>> both of them...
>> I guess primarily because it is clearer for everyone if 'domain' and
>> 'range' have their conventional meaning, rather than sometimes meaning
>> what the W3C groups intended, and sometimes meaning something quite
>> different. Since RDF is designed to mix and to flow, keeping the
>> dataset-oriented usages separate is likely to be quite hard.
>>
>> Also I expect dataset-checking applications to have different
>> requirements (eg. around optionals, co-occurrence constraints, datatype
>> values) that simply don't map tidily into RDFS/OWL constructs. Building
>> on SPARQL there has some promise I think - eg see
>> http://isegserv.itd.rl.ac.uk/schemarama/
>> http://swordfish.rdfweb.org/discovery/2001/01/schemarama/
>>
>> On the dataset-characterisation front, there are also efforts like
>> http://semwiq.faw.uni-linz.ac.at/node/9 that are worth exploring, also
>> http://esw.w3.org/topic/SparqlEndpointDescription2 ... which are
>> connected with scenarios around distributed SPARQL query. Again, I don't
>> see RDFS/OWL's property-description constructs as being particularly
>> attuned to this problem.
> 
> I think it's possible to use OWL to do both these things:
>    
>     1.   To describe real world stuff.  A human is born with exactly one
>          biological mother (another human) and exactly one biological
>          father (another human).   This isn't a perfect description;
>          it's an ontology, a particular written-down conceptualization
>          about some real world stuff.   We could have different
>          ontologies about human births because we think about them
>          differently and make different generalizations about them.
> 
>     2.   To describe the data model required at some computer interface.
>          Each data-record about a human birth includes zero or one
>          identifiers for the biological mother (another data-record
>          about a human) and zero or one identifiers for the biological
>          father, etc.  This probably can be a perfect description; it's
>          using the ontology language to describe something that's
>          already abstract.
> 
> The essential difference is in selecting the domain of discourse.  What
> are the things you're talking about?  Are they flesh-and-blood, or
> computer abstractions?  This is surprisingly hard to do, because those
> computer abstractions are intended to represent the flesh-and-blood
> entities.  System designers have learned to (sometimes) use one in place
> of the other, in their reasoning.
> 
> I use this example -- with the cardinality of "mother" -- because it's a
> pretty crisp test about which world you're in.  In the real world (give
> or take origin-of-life issues) every human has exactly one biological
> mother.  In a data model definition, if you say every person record must
> have a valid pointer to another person record, representing the mother,
> you're going to have a real problem.   You'll never be able to construct a
> valid data set, database, document, whatever.
> 
> If you're going to use OWL for both #1 and #2, I think it's essential to
> keep this distinction clear.  Formally, you should have different
> classes, something like my:Human and my:HumanRecord.  Or, more
> practically, x:Human and y:Human, where ontology x is clearly about
> people and ontology y is a system interface definition, a data model.
> 
> Once we're clear about this distinction, we can get a better sense of
> whether or not OWL is a good language for #2, or if people should stick
> to XML Schema, UML, database schema systems, etc. 
> 
> (The reason we're drawn to using OWL for expressing data models, of
> course, is that we're exchanging data in RDF, so the fit is pretty darn
> good, and none of the other tools work well.  The problem may just be
> this confusion about whether we're modeling the world or the data
> structures, or it may be that OWL really isn't suitable for this.  Once
> we're past the confusion, maybe we can find out.)
> 
>     -- Sandro
> 
>        (disclaimer: I am W3C staff contact for the OWL Working Group,
>        but in no way speaking on behalf of that group -- I don't recall
>        the group ever talking about this issue, and I'm certainly not
>        representing them in this thread.)

Received on Thursday, 20 November 2008 12:48:51 UTC