Using OWL for data modeling, was Re: Domain and range are useful Re: DBpedia 3.2 release, including DBpedia Ontology and RDF links to Freebase from Sandro Hawke on 2008-11-19 (semantic-web@w3.org from November 2008)

From: Sandro Hawke <sandro@w3.org>
Date: Wed, 19 Nov 2008 12:37:46 -0500
To: Dan Brickley <danbri@danbri.org>
cc: Pierre-Antoine Champin <swlists-040405@champin.net>, Semantic Web <semantic-web@w3.org>
Message-ID: <9616.1227116266@ubuhebe>
> 
> Pierre-Antoine Champin wrote:
> > Dan Brickley wrote:
> >> I do recommend against using RDFS/OWL to express application/dataset
> >> constraints, while recognising that there's a real need for recording
> >> them in machine-friendly form. In the Dublin Core world, this topic is
> >> often discussed in terms of "application profiles", meaning that we want
> >> to say things about likely and expected data patterns, rather than doing
> >> what RDFS/OWL does and merely offering machine dictionary definitions of
> >> terms.
> >
> > Why would you recommend against it?
> >
> > Wouldn't it be good practice to simply separate them into two RDF graphs:
> > - "intensional" axioms, those representing the meaning of the terms
> >   and that should be assumed by people reusing the vocabulary
> > - "extensional" axioms, those representing properties/constraints of
> >   the dataset, that should be used to check its
> >   consistency/completeness.
> >
> > Depending on their need, people would only import the first graph, or
> > both of them...
> 
> I guess primarily because it is clearer for everyone if 'domain' and
> 'range' have their conventional meaning, rather than sometimes meaning
> what the W3C groups intended, and sometimes meaning something quite
> different. Since RDF is designed to mix and to flow, keeping the
> dataset-oriented usages separate is likely to be quite hard.
> 
> Also I expect dataset-checking applications to have different
> requirements (e.g. around optionals, co-occurrence constraints, datatype
> values) that simply don't map tidily into RDFS/OWL constructs. Building
> on SPARQL there has some promise, I think; see e.g.
> http://isegserv.itd.rl.ac.uk/schemarama/
> http://swordfish.rdfweb.org/discovery/2001/01/schemarama/
> 
> On the dataset-characterisation front, there are also efforts like
> http://semwiq.faw.uni-linz.ac.at/node/9 that are worth exploring, also
> http://esw.w3.org/topic/SparqlEndpointDescription2 ... which are
> connected with scenarios around distributed SPARQL query. Again, I don't
> see RDFS/OWL's property-description constructs as being particularly
> attuned to this problem.

I think it's possible to use OWL to do both these things:
   
    1.   To describe real world stuff.  A human is born with exactly one
         biological mother (another human) and exactly one biological
         father (another human).   This isn't a perfect description;
         it's an ontology, a particular written-down conceptualization
         about some real world stuff.   We could have different
         ontologies about human births because we think about them
         differently and make different generalizations about them.

    2.   To describe the data model required at some computer interface.
         Each data-record about a human birth includes zero or one
         identifiers for the biological mother (another data-record
         about a human) and zero or one identifiers for the biological
         father, etc.  This can probably be a perfect description; it's
         using the ontology language to describe something that's
         already abstract.
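To make #2 concrete, here is a minimal Python sketch of the kind of thing that second description is about. The class and field names (HumanRecord, mother_id, father_id) and the helper dangling_references are hypothetical, not taken from any real schema; the point is only that "zero or one identifiers" maps directly onto an optional field, and that checking it is a question about records, not about people.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class HumanRecord:
    """Hypothetical data-model class: a record *about* a person, not the person."""
    record_id: str
    mother_id: Optional[str] = None  # zero or one identifiers for the mother's record
    father_id: Optional[str] = None  # zero or one identifiers for the father's record


def dangling_references(records: List[HumanRecord]) -> List[str]:
    """Return 'record.field' labels whose identifier, when present, resolves to no record."""
    index: Dict[str, HumanRecord] = {r.record_id: r for r in records}
    problems = []
    for r in records:
        for field_name in ("mother_id", "father_id"):
            target = getattr(r, field_name)
            # An absent identifier is allowed by this data model ("zero or one").
            if target is not None and target not in index:
                problems.append(f"{r.record_id}.{field_name}")
    return problems


records = [
    HumanRecord("r1"),                                  # parents unknown: still valid
    HumanRecord("r2", mother_id="r1"),                  # mother recorded
    HumanRecord("r3", mother_id="r2", father_id="r9"),  # r9 is not in the data set
]
print(dangling_references(records))  # ['r3.father_id']
```

Here r1 passes even though the real person it describes certainly had a mother; the data model only constrains what the records may contain.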

The essential difference is in selecting the domain of discourse.  What
are the things you're talking about?  Are they flesh-and-blood, or
computer abstractions?  This is surprisingly hard to do, because those
computer abstractions are intended to represent the flesh-and-blood
entities.  System designers have learned to (sometimes) use one in place
of the other, in their reasoning.

I use this example -- with the cardinality of "mother" -- because it's a
pretty crisp test about which world you're in.  In the real world (give
or take origin-of-life issues) every human has exactly one biological
mother.  In a data model definition, if you say every person record must
have a valid pointer to another person record, representing the mother,
you're going to have a real problem.   You'll never be able to construct a
valid data set, database, document, whatever.
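The mother test can be run mechanically. The sketch below uses plain Python dicts mapping a hypothetical record identifier to its mother's identifier (or None), and checks a finite data set against the strict rule (every record has exactly one mother pointer to another record in the set) and the relaxed rule (zero or one). Following mother pointers within a finite set must eventually either stop or revisit a record, so the only data that passes the strict rule is a loop, which no real genealogy contains. The function names are illustrative assumptions, not any existing tool.

```python
from typing import Dict, List, Optional, Tuple

Mothers = Dict[str, Optional[str]]  # record id -> mother's record id, or None


def strict_violations(mothers: Mothers) -> List[str]:
    """Records breaking 'exactly one mother pointer to another record in the set'."""
    return [rid for rid, m in mothers.items()
            if m is None or m == rid or m not in mothers]


def relaxed_violations(mothers: Mothers) -> List[str]:
    """Records breaking 'zero or one mother pointer, which must resolve if present'."""
    return [rid for rid, m in mothers.items()
            if m is not None and (m == rid or m not in mothers)]


def mother_chain(mothers: Mothers, start: str) -> Tuple[List[str], bool]:
    """Follow mother pointers from `start`; return the chain and whether it loops."""
    chain: List[str] = []
    current: Optional[str] = start
    while current is not None and current in mothers and current not in chain:
        chain.append(current)
        current = mothers[current]
    return chain, current in chain


# Three generations; the oldest record has no mother pointer.
family = {"alice": "beth", "beth": "carol", "carol": None}
print(strict_violations(family))   # ['carol']
print(relaxed_violations(family))  # []

# The only way to satisfy the strict rule in a finite set is a cycle.
loop = {"alice": "beth", "beth": "alice"}
print(strict_violations(loop))        # []
print(mother_chain(loop, "alice"))    # (['alice', 'beth'], True)
```

Adding a record for carol's mother just moves the violation one generation back, which is the sense in which no valid data set can be built.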

If you're going to use OWL for both #1 and #2, I think it's essential to
keep this distinction clear.  Formally, you should have different
classes, something like my:Human and my:HumanRecord.  Or, more
practically, x:Human and y:Human, where ontology x is clearly about
people and ontology y is a system interface definition, a data model.
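One way to keep x:Human and y:Human apart in practice is to attach data-model constraints only to the data-model vocabulary. The Python sketch below treats a graph as a set of (subject, predicate, object) tuples; the example.org namespace IRIs, the ex: subjects, and the single max-cardinality rule are hypothetical, chosen only for illustration. Note that this is a closed-world check over whatever data is present, not OWL semantics: given an owl:maxCardinality 1 restriction, an OWL reasoner would instead infer that ex:rec1 and ex:rec3 denote the same individual (unless they are asserted to be different), rather than report an error.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
X = "http://example.org/x#"  # hypothetical ontology about people (#1)
Y = "http://example.org/y#"  # hypothetical interface definition, a data model (#2)

# Data-model rules only: (class, property) -> maximum number of values.
MAX_CARDINALITY: Dict[Tuple[str, str], int] = {
    (Y + "Human", Y + "mother"): 1,
}


def check_data_model(graph: Set[Triple]) -> List[str]:
    """Report subjects that exceed a max-cardinality rule for a class they belong to."""
    counts = Counter((s, p) for s, p, _ in graph)
    problems = []
    for (cls, prop), limit in MAX_CARDINALITY.items():
        members = sorted(s for s, p, o in graph if p == RDF_TYPE and o == cls)
        for s in members:
            if counts[(s, prop)] > limit:
                problems.append(f"{s} has {counts[(s, prop)]} values for {prop}")
    return problems


graph = {
    # Real-world description: nothing about alice is checked as a data constraint.
    ("ex:alice", RDF_TYPE, X + "Human"),
    # Interface records: rec2 carries two mother identifiers, which the data model forbids.
    ("ex:rec1", RDF_TYPE, Y + "Human"),
    ("ex:rec2", RDF_TYPE, Y + "Human"),
    ("ex:rec2", Y + "mother", "ex:rec1"),
    ("ex:rec2", Y + "mother", "ex:rec3"),
}
for line in check_data_model(graph):
    print(line)  # ex:rec2 has 2 values for http://example.org/y#mother
```

Ontology x may well say every human has exactly one mother; because the rules are keyed on y:Human, that statement about people never turns into a demand on the records.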

Once we're clear about this distinction, we can get a better sense of
whether or not OWL is a good language for #2, or whether people should
stick to XML Schema, UML, database schema systems, etc.

(The reason we're drawn to using OWL for expressing data models, of
course, is that we're exchanging data in RDF, so the fit is pretty darn
good, and none of the other tools work well.  The problem may just be
this confusion about whether we're modeling the world or the data
structures, or it may be that OWL really isn't suitable for this.  Once
we're past the confusion, maybe we can find out.)

    -- Sandro

       (disclaimer: I am W3C staff contact for the OWL Working Group,
       but in no way speaking on behalf of that group -- I don't recall
       the group ever talking about this issue, and I'm certainly not
       representing them in this thread.)
Received on Wednesday, 19 November 2008 17:37:57 UTC