OWL, XML-RDF and Imports

My recent experience with trying to handle imports in an implementation
has led to the following insights/thoughts, in particular a feeling that
imports as currently defined is not a sufficient/appropriate mechanism to
support modularisation.

Take the example of species validation, which I am currently working on.
In order to make clear the points further on, I'll set out some of my
assumptions. Apologies if this is labouring the point, but it's important
that this is clear in case I've misunderstood the facts. And if I have,
then the specs obviously need work :-).

I am making the following assumptions about species validation:

An RDF/XML document is an OWL-Full document if it obeys some rules, for
example, to quote a quote in a recent mail from Dan C:

"An OWL Full document is an RDF/XML document [RDF/XML Syntax], for which
the corresponding RDF graph [RDF Concepts] does not use any URI references
starting with the prefix http://www.w3.org/2002/07/owl# except those found
in the [RDF Schema for OWL]."
 -- http://www.w3.org/2002/03owlt/editors-draft/snapshot#docConformance

Roughly put, an OWL-Full document is an OWL-Lite/DL document if [1]:

o the collections of URI references of classes, object properties,
  datatype properties and individuals are all disjoint.
o all individuals are typed.
o all properties must be explicitly typed as object or data properties.
o everything's well-formed, e.g. restrictions must have a single
  owl:onProperty and one of (someValuesFrom|allValuesFrom|cardinality).
o none of the OWL, RDF or RDFS vocabulary has been abused.

There are then further restrictions on DL/Lite concerning the levels of
expressivity used which I won't elaborate here.

An imports directive in an ontology simply pulls all the RDF triples from
the target (and any imported ontologies) into the current model. Any
processing/validation is then done in the context of this complete bag of
triples.
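
To make that assumption concrete, here is a minimal sketch of how I read
the imports closure. It is an illustration only: triples are plain Python
tuples of prefixed names, the `ontologies` dictionary stands in for
fetching documents, and `imports_closure` is a made-up name rather than
anything from the specs.

```python
OWL_IMPORTS = "owl:imports"

def imports_closure(root, ontologies):
    """Union of the triples of `root` and of everything it imports,
    directly or indirectly. Import cycles are visited only once."""
    seen, todo, triples = set(), [root], set()
    while todo:
        name = todo.pop()
        if name in seen:
            continue
        seen.add(name)
        # An ontology we cannot find contributes no triples (assumption).
        for s, p, o in ontologies.get(name, ()):
            triples.add((s, p, o))
            if p == OWL_IMPORTS:
                todo.append(o)
    return triples

# Hypothetical stand-in for two documents that import each other.
ontologies = {
    "A": {("A", "owl:imports", "B"), ("x", "rdf:type", "owl:Thing")},
    "B": {("B", "owl:imports", "A"), ("y", "rdf:type", "owl:Thing")},
}
closure_a = imports_closure("A", ontologies)
```

Under this reading, mutually importing ontologies end up with the same bag
of triples whichever one you start from.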

The task is now to take an ontology and determine which species it lives
in.

So let's take a couple of little examples.

Ex 1)
-----

Ontology A contains the triples:
-------------------------------
A owl:imports B
x rdf:type owl:Thing
p rdf:type owl:ObjectProperty
x p y
-------------------------------

Ontology B
-------------------------------
B owl:imports A
y rdf:type owl:Thing
p rdf:type owl:ObjectProperty
x p y
-------------------------------

Now in this case (according to my reading of the rules), A and B are both
OWL-Lite ontologies as everything's nicely typed. But when taken
independently (ignoring the imports), they're Full: A never types y and B
never types x. Referring back to my earlier message, this further
complicates parsing, as we can't really parse/process the ontologies
independently. This isn't a complete disaster, but it does mean that how
we process these things is less obvious than it might be.
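
As a rough sketch of Ex 1, the following checks only the "all individuals
are typed" condition, first on each ontology alone and then on the merged
triples. The representation (tuples of prefixed names) and the function
name `untyped_individuals` are my own assumptions, and this is nowhere
near a full species validator.

```python
A = {
    ("A", "owl:imports", "B"),
    ("x", "rdf:type", "owl:Thing"),
    ("p", "rdf:type", "owl:ObjectProperty"),
    ("x", "p", "y"),
}
B = {
    ("B", "owl:imports", "A"),
    ("y", "rdf:type", "owl:Thing"),
    ("p", "rdf:type", "owl:ObjectProperty"),
    ("x", "p", "y"),
}

def untyped_individuals(triples):
    """Subjects and objects of object-property triples that have no
    rdf:type triple anywhere in `triples`."""
    object_props = {s for s, p, o in triples
                    if p == "rdf:type" and o == "owl:ObjectProperty"}
    typed = {s for s, p, o in triples if p == "rdf:type"}
    used = set()
    for s, p, o in triples:
        if p in object_props:
            used.update((s, o))
    return used - typed

print(untyped_individuals(A))      # {'y'}
print(untyped_individuals(B))      # {'x'}
print(untyped_individuals(A | B))  # set()
```

Since A and B import each other, the merged set A | B is the imports
closure of either one, and only there does the check come back clean.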


Ex 2)
-----

Similar to 1), but with the added complication that now we're spreading
the well-formedness of a single expression through two ontologies.

Ontology A
-------------------------------
A owl:imports B
p rdf:type owl:ObjectProperty
C rdf:type owl:Class
D rdf:type owl:Class
C rdfs:subClassOf R
R rdf:type owl:Restriction
R owl:onProperty p
-------------------------------

Ontology B
-------------------------------
R owl:allValuesFrom D
-------------------------------

In this case, A (disregarding the import) has a malformed restriction, and
is thus certainly not in Lite or DL (and may not even be in Full, but I'm
not entirely sure about that as yet). With the import, it's Lite. However,
B is always "broken".
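
A sketch of the well-formedness check for Ex 2, using the rule as I stated
it above (a single owl:onProperty plus one of someValuesFrom,
allValuesFrom or cardinality). The tuple representation and the name
`restriction_problems` are assumptions for illustration only.

```python
FILLERS = {"owl:someValuesFrom", "owl:allValuesFrom", "owl:cardinality"}

def restriction_problems(triples):
    """List the ways in which restriction-like nodes are malformed."""
    declared = {s for s, p, o in triples
                if p == "rdf:type" and o == "owl:Restriction"}
    mentioned = {s for s, p, o in triples
                 if p == "owl:onProperty" or p in FILLERS}
    problems = []
    for r in sorted(declared | mentioned):
        on_prop = sum(1 for s, p, o in triples
                      if s == r and p == "owl:onProperty")
        fillers = sum(1 for s, p, o in triples
                      if s == r and p in FILLERS)
        if r not in declared:
            problems.append(f"{r}: not typed as owl:Restriction")
        if on_prop != 1:
            problems.append(f"{r}: {on_prop} owl:onProperty triples")
        if fillers != 1:
            problems.append(f"{r}: {fillers} filler triples")
    return problems

A = {
    ("A", "owl:imports", "B"),
    ("p", "rdf:type", "owl:ObjectProperty"),
    ("C", "rdf:type", "owl:Class"),
    ("D", "rdf:type", "owl:Class"),
    ("C", "rdfs:subClassOf", "R"),
    ("R", "rdf:type", "owl:Restriction"),
    ("R", "owl:onProperty", "p"),
}
B = {("R", "owl:allValuesFrom", "D")}

print(restriction_problems(A))      # ['R: 0 filler triples']
print(restriction_problems(B))
print(restriction_problems(A | B))  # []
```

Neither half passes alone; only the merged triples describe a complete
restriction.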

So we've got a situation where a Lite ontology can import an arbitrary RDF
document and still be in Lite.

I'd argue that this is now getting confusing, and the above (trivial)
examples suggest that imports as it stands isn't really sufficient for
supporting modularisation.

For example, if I'm building an editor that wants to be able to support
modularisation and maintain information in a sensible fashion, situations
like 2) above are really going to confuse it. Somewhere I want to be able
to record the fact that C is a subclass of (all p D), but where is this
information actually held? In A? In B? If I'm ever going to have any
chance of round tripping this stuff I'm going to have to keep a whole load
of information hanging around. The split across the two ontologies is too
fine-grained -- at the level of triples rather than actual conceptual
"chunks" of the ontology. At this point, I won't hide the fact that this
aspect is one of the things that I *do* really dislike about the use of
RDF-XML for representing languages like OWL: the conflation of machinery
for representing underlying syntax and the language primitives themselves,
which then allows me to do this kind of thing. With reference to my
earlier analogy in [2], the matchsticks are now split up into two
different boxes and you've got to try and determine whether the
observation deck is in Box A.

A Proposal
==========

A possible solution would be to place further restrictions on the allowed
triples in an RDF-XML representation of an OWL ontology. For example, we
could require that all the conditions regarding the typing of URI refs and
well-formedness are "locally enforced", i.e. all the necessary bits are
present even when disregarding the imports statements.

This seems to be a sensible thing to do -- if I'm going to use a
Class|Property|Individual from some imported ontology, I should really
know what it is before I use it, so requiring that I explicitly state that
I know its type is no great hardship, and may then allow me to work
independently with the modules, even when the imports are not available.
In Ex 2, I can't really do much at all with A if B is not available (say
I'm editing my ontologies off-line somewhere). Allowing me to effectively
split the syntax of individual expressions across multiple locations (as
is done in Ex 2.) seems to me to be far too lax, and I have difficulty
seeing why one might want to allow this.
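
To give a feel for how small the burden of "local enforcement" might be,
here is a sketch that works out which type declarations an ontology would
have to repeat locally. Everything here is hypothetical: the function
names, the tuple representation, and the simplification that any term
with an owl:, rdf: or rdfs: prefix is built-in vocabulary.

```python
BUILTIN_PREFIXES = ("owl:", "rdf:", "rdfs:")

def is_builtin(term):
    return term.startswith(BUILTIN_PREFIXES)

def missing_local_declarations(local, closure):
    """Type triples found in `closure` for terms that `local` uses
    but does not type itself."""
    typed_locally = {s for s, p, o in local if p == "rdf:type"}
    used = set()
    for s, p, o in local:
        if p == "owl:imports":
            continue  # ontology names are not subject to typing here
        for term in (s, p, o):
            if not is_builtin(term):
                used.add(term)
    untyped = used - typed_locally
    return {(s, p, o) for s, p, o in closure
            if p == "rdf:type" and s in untyped}

# Ex 1 again: A alone, and the merged triples of A and B.
A = {
    ("A", "owl:imports", "B"),
    ("x", "rdf:type", "owl:Thing"),
    ("p", "rdf:type", "owl:ObjectProperty"),
    ("x", "p", "y"),
}
B = {
    ("B", "owl:imports", "A"),
    ("y", "rdf:type", "owl:Thing"),
    ("p", "rdf:type", "owl:ObjectProperty"),
    ("x", "p", "y"),
}
print(missing_local_declarations(A, A | B))
# {('y', 'rdf:type', 'owl:Thing')}
```

For Ex 1 that is one extra triple in A, which seems a small price for
being able to validate A without fetching B.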

My guess is that even if something along these lines is not present, tools
will by default enforce some kind of similar conditions. Without it, it's
near impossible to guarantee that you can work with the modules
independently (which will be, in my opinion, a requirement of tools).

Cheers,

	   Sean

[1] http://www-db.research.bell-labs.com/user/pfps/owl/semantics/semantics-all.html#4.1
[2] http://lists.w3.org/Archives/Public/www-webont-wg/2003Feb/0206.html

-- 
Sean Bechhofer
seanb@cs.man.ac.uk
http://www.cs.man.ac.uk/~seanb

Received on Friday, 14 February 2003 11:59:20 UTC