Using OWL for RDF constraint checking and closed-world recognition from Peter F. Patel-Schneider on 2014-07-22 (public-rdf-shapes@w3.org from July 2014)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Tue, 22 Jul 2014 03:12:10 -0700
To: "public-rdf-shapes@w3.org" <public-rdf-shapes@w3.org>
Message-ID: <53CE38FA.1000901@gmail.com>
 Using OWL for RDF constraint checking and closed-world recognition

OWL descriptions and the OWL semantics provide the necessary framework for
both validating constraints and providing recognition facilities, and thus
cover what ShEx is trying to do and more.  Why then are there claims that
OWL is inadequate for these purposes?  I do not know why, but there are
several aspects of OWL that might not be consonant with constraints and the
kind of recognition that might be desired.  However, it turns out that both
OWL syntax and semantics are arguably the right solution for RDF constraint
checking and closed-world recognition.


Closed-World Recognition

Let's first look at recognition.  Recognition is the basic operation in
ShEx - we want to determine whether a particular node in an RDF graph matches
a ShEx shape, for example, to say that John in

   <John> foaf:name "John"^^xsd:string .
   <John> foaf:phone "+19085551212"^^xsd:string .
   <John> ex:child <Bill> .
   <John> ex:child <William> .

matches

   { foaf:name xsd:string , foaf:phone xsd:string, ex:child [2] }

Recognition is also a basic operation of OWL.  Determining whether an
individual belongs to an OWL concept is recognition.

The OWL version of the above ShEx expression (using a version of the DL
publication syntax) is

     =1 foaf:name & all foaf:name xsd:string &
     =1 foaf:phone & all foaf:phone xsd:string &
     =2 ex:child

So it seems that OWL can easily handle ShEx recognition.

However, John does not match the above OWL description.  Why is this?  It is
precisely that OWL does not assume that absence of information is
information about absence.  RDF works under the same assumptions, by the
way.  John could have more than one name as far as the above information is
concerned.   OWL (and RDF too) also does not assume that different names
refer to different individuals.  Bill and William could be the same person.

OWL has facilities to explicitly state information about absence and
information about differences.  If we add

     <John> in <=1 foaf:name .
     <John> in <=1 foaf:phone .
     <John> in all ex:child {<Bill>, <William>} .
     <Bill> /= <William> .

then John does match the above description.

So it is not that OWL does not perform the kind of recognition that
underlies ShEx, it is just that OWL does not make the assumption that
absence of information is information about absence.


However, suppose that we want to make this assumption.  This is roughly
equivalent to saying that a system assumes that if it can't determine some
fact, then that fact is false.  There is a very large body of work on this
topic because there are many tricky questions that arise with respect to
closure in any sophisticated formalism, and OWL is indeed sophisticated.

Fortunately RDF and RDFS are not very sophisticated at all, and the tricky
questions just do not arise if information comes in the form of RDF and RDFS
triples.  The basic idea is to treat the triples (and their RDF and RDFS
consequences if desired) as completely describing the world.  So, 1/ if a
triple is not present then it is false and 2/ different IRIs describe
different individuals.  This is precisely the same idea that underlies model
checking.  First-order inference is undecidable, but determining whether a
first-order sentence is true in one particular state of the world is much,
much easier.
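
As a rough illustration of treating a graph as a model, here is a minimal
Python sketch.  The encoding is an assumption made only for this example
(IRIs as plain strings, literals as (lexical form, datatype) pairs, a graph
as a set of triples), and the helper names are hypothetical.  It evaluates
the description from the recognition example directly against the triples,
with absent triples taken as false and distinct terms taken as distinct
individuals.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical encoding: IRIs are plain strings, literals are
# (lexical form, datatype IRI) tuples, and a graph is a set of triples.
GRAPH = {
    ("John", "foaf:name", ("John", XSD_STRING)),
    ("John", "foaf:phone", ("+19085551212", XSD_STRING)),
    ("John", "ex:child", "Bill"),
    ("John", "ex:child", "William"),
}


def values(graph, node, prop):
    """Objects of every triple (node, prop, ?o); absent triples count as false."""
    return {o for (s, p, o) in graph if s == node and p == prop}


def exactly(graph, node, prop, n):
    """=n prop: distinct terms are distinct individuals, so counting suffices."""
    return len(values(graph, node, prop)) == n


def all_datatype(graph, node, prop, datatype):
    """all prop datatype: every value is a literal of the given datatype."""
    return all(
        isinstance(o, tuple) and o[1] == datatype
        for o in values(graph, node, prop)
    )


def matches(graph, node):
    """The description from the message, evaluated in the graph-as-model."""
    return (
        exactly(graph, node, "foaf:name", 1)
        and all_datatype(graph, node, "foaf:name", XSD_STRING)
        and exactly(graph, node, "foaf:phone", 1)
        and all_datatype(graph, node, "foaf:phone", XSD_STRING)
        and exactly(graph, node, "ex:child", 2)
    )


print(matches(GRAPH, "John"))                                    # True
print(matches(GRAPH | {("John", "ex:child", "Susan")}, "John"))  # False
```

No closure axioms or inequality statements are needed here: the closure is
built into how the graph is read, not stated in the graph.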

So it is possible to use the OWL syntactic and semantic machinery to define
how to recognize OWL descriptions under just the same assumptions that
underlie ShEx.  The only change from the standard OWL setup is to define how
to go from an RDF graph to an OWL model.  (There are some technical details
that interfere with this general description of the account, but they are
easy to handle.)  Definitions, even recursive definitions, can be handled
with only minor extensions to the framework.

This is all quite easy and conforms to a common thread of both theoretical
and practical work.  It also matches how Stardog ICV works (the
theoretical underpinning of Stardog ICV is one of these theoretical
results).  Further, the approach can be implemented by translation into
SPARQL queries, showing that it is practical.  (There may be some constructs
of OWL that do not translate into SPARQL queries when working with complete
information, but at least the parts of OWL that correspond to the usual
recognition conditions do so translate.)
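
To make the translation idea concrete, here is a small Python sketch that
builds SPARQL 1.1 ASK queries for two of the constructs used above.  It is
one possible translation written for this example, not the translation used
by any particular system; the function names are hypothetical and the
generated queries have not been run against an endpoint.

```python
def exactly_n_ask(node, prop, n):
    """ASK query intended to hold when node has exactly n distinct values for prop.

    COUNT(DISTINCT ?o) counts distinct RDF terms, which corresponds to
    treating different IRIs as different individuals.
    """
    return (
        "ASK {\n"
        f"  {{ SELECT (COUNT(DISTINCT ?o) AS ?n) WHERE {{ <{node}> <{prop}> ?o }} }}\n"
        f"  FILTER(?n = {n})\n"
        "}"
    )


def all_values_datatype_ask(node, prop, datatype):
    """ASK query intended to hold when no value of prop falls outside datatype."""
    return (
        "ASK {\n"
        "  FILTER NOT EXISTS {\n"
        f"    <{node}> <{prop}> ?o .\n"
        f"    FILTER(!isLiteral(?o) || datatype(?o) != <{datatype}>)\n"
        "  }\n"
        "}"
    )


# Example IRIs are assumptions for illustration only.
print(exactly_n_ask("http://example.org/John", "http://example.org/child", 2))
print(all_values_datatype_ask(
    "http://example.org/John",
    "http://xmlns.com/foaf/0.1/name",
    "http://www.w3.org/2001/XMLSchema#string",
))
```

A conjunction such as the description above would then be checked by
requiring every generated query to return true for the node in question.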


Constraint Validation

Constraint validation does not appear to be part of the services provided by
OWL.  This has led to claims that OWL cannot be used for constraint
validation.  However, inference, which is the core service provided by OWL,
and constraint validation are indeed very closely related.

Inference is the process of determining what follows from what has been
stated.  Inference ranges from simple (students are people, John is a
student, therefore John is a person) to the very complex.  Inference can
also recognize impossibilities (students are people, John is a student, John
is not a person, therefore there is a contradiction).  In the presence of
complete information, nothing new can be inferred, so inference only checks
for impossibilities, i.e., constraint violations.  So the way to do constraints
in OWL is to first set up complete information, and then just perform
inference.

For example, with the concept axiom

   ex:Person <= =1 foaf:name & all foaf:name xsd:string &
                =1 foaf:phone & all foaf:phone xsd:string &
                =2 ex:child

and in the presence of (locally) complete information, such as

   <John> in ex:Person .
   <John> foaf:name "John"^^xsd:string .
   <John> foaf:phone "+19085551212"^^xsd:string .
   <John> ex:child <Bill> .
   <John> ex:child <William> .
   <John> in <=1 foaf:name .
   <John> in <=1 foaf:phone .
   <John> in all ex:child {<Bill>, <William>, <Susan>} .
   <Bill> /= <William> .
   <Bill> /= <Susan> .
   <Susan> /= <John> .

determining whether <John> belongs to ex:Person is just constraint validation.

Setting up complete information is just what was done above.  In a model
there is complete information, so considering an RDF graph (plus
consequences) as a model turns OWL inference into constraint validation.
Of course, this doesn't mean that you have to implement OWL inference in a
model the same way that you need to with incomplete information.  In fact,
as above, constraint validation can be implemented as SPARQL queries.
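
A minimal Python sketch of validation in this style follows.  It assumes a
simplified encoding chosen only for this example: IRIs as plain strings,
literals as (lexical form, datatype) pairs, and "<John> in ex:Person"
written as an rdf:type triple.  The function names and the extra individual
Mary are hypothetical.  Every node stated to be an ex:Person is checked
against the right-hand side of the concept axiom, and each failure is
reported as a violation.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def values(graph, node, prop):
    """Objects of every triple (node, prop, ?o); absent triples count as false."""
    return {o for (s, p, o) in graph if s == node and p == prop}


def violations(graph, node):
    """Check one ex:Person against the right-hand side of the axiom."""
    problems = []
    for prop in ("foaf:name", "foaf:phone"):
        vals = values(graph, node, prop)
        if len(vals) != 1:
            problems.append(f"{node}: expected exactly 1 {prop}, found {len(vals)}")
        if any(not (isinstance(o, tuple) and o[1] == XSD_STRING) for o in vals):
            problems.append(f"{node}: {prop} has a value that is not an xsd:string")
    children = values(graph, node, "ex:child")
    if len(children) != 2:
        problems.append(f"{node}: expected exactly 2 ex:child, found {len(children)}")
    return problems


def validate(graph):
    """Collect violations for every node stated to be an ex:Person."""
    people = sorted(s for (s, p, o) in graph if p == "rdf:type" and o == "ex:Person")
    return [msg for person in people for msg in violations(graph, person)]


JOHN = {
    ("John", "rdf:type", "ex:Person"),
    ("John", "foaf:name", ("John", XSD_STRING)),
    ("John", "foaf:phone", ("+19085551212", XSD_STRING)),
    ("John", "ex:child", "Bill"),
    ("John", "ex:child", "William"),
}
# Hypothetical second individual with no phone and no children.
MARY = {
    ("Mary", "rdf:type", "ex:Person"),
    ("Mary", "foaf:name", ("Mary", XSD_STRING)),
}

print(validate(JOHN))         # []
print(validate(JOHN | MARY))  # two violations, both about Mary
```

The only difference from plain recognition is where the check is applied:
the axiom ties the description to a class named in the graph, so every
stated member of that class must satisfy it.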


Conclusion

In fact the main difference between recognition and constraint checking is
that the former either has no axioms or only uses axioms defining names that
do not occur in the RDF graph whereas constraint checking uses axioms that
relate concepts appearing in the RDF graph to descriptions.

So OWL can indeed be used for both the syntax and semantics of constraint
checking and closed-world recognition in RDF, and most or all of it can be
implemented using a translation to SPARQL queries.
Received on Tuesday, 22 July 2014 10:12:40 UTC