Natasha, All

Another go at a difficult problem that ought to be easy...

It seems to me that a critical issue is which of two cases you are dealing with:

Case 1: You are using an existing ontology as a reference to annotate, label, or otherwise carry static information for applications that will query it at 'run time'. The applications assume that all implications are explicit and add no new information to the ontology.

Case 2: You are re-using the ontology as a module of a larger ontology which you are authoring, possibly to be used eventually as in case 1.

In more detail...

Case 1) There is an existing ontology of animals. However it was produced, it has now been fully classified and all subsumptions made explicit. As long as I do not present it with newly defined classes or individuals, there is no point in applying a DL reasoner to it. This is how we typically deliver "terminologies". In this case I simply want to be able to point to a class and say - this resource in my application (ontology or otherwise) refers to that thing in your ontology. I want third-party agents to be able to run up and down the hierarchies to expand or process queries, so that if I ask for "books", or more generally "resources", about "animals", I also get "books" about "Lions". So there are potentially three parties involved:
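To make the Case 1 query expansion concrete, here is a minimal sketch of what party (c) might do with a pre-classified hierarchy: no reasoner, just a walk down explicit subclass links. The class names, resource titles and the dictionary layout are all hypothetical, chosen only for illustration.

```python
# Hypothetical, already fully classified animal hierarchy: child -> direct parents.
# Because it is pre-classified, every subsumption needed here is explicit.
SUBCLASS_OF = {
    "Lion": ["Felid"],
    "Tiger": ["Felid"],
    "Felid": ["Animal"],
    "Eagle": ["Animal"],
}

# Hypothetical annotations made by party (b): resource -> (media type, subject class).
ANNOTATIONS = {
    "All about lions": ("Book", "Lion"),
    "Tigers of India": ("Book", "Tiger"),
    "Eagle in flight": ("FilmClip", "Eagle"),
}


def descendants(cls: str) -> set[str]:
    """Return cls plus every class below it, walking the explicit hierarchy downwards."""
    result = {cls}
    changed = True
    while changed:
        changed = False
        for child, parents in SUBCLASS_OF.items():
            if child not in result and any(p in result for p in parents):
                result.add(child)
                changed = True
    return result


def find(media_type: str, subject: str) -> list[str]:
    """Party (c): resources of media_type annotated with subject or any subclass of it."""
    wanted = descendants(subject)
    return sorted(
        title
        for title, (kind, about) in ANNOTATIONS.items()
        if kind == media_type and about in wanted
    )


print(find("Book", "Animal"))  # ['All about lions', 'Tigers of India']
```

Asking for books about "Animal" also returns the book annotated only with "Lion", which is the behaviour described above; the ontology itself is never changed by the query.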

a)    the producer of the ontology about animals (and possibly a2, the producer of the ontology of media)
b)    the annotator of the resources about animals
c)    the builders of the applications/agents that will search for resources annotated with the ontology - using one query language or another.

I don't think this case is significantly different from, for example, entering disease class names from a disease ontology into a medical record.  Furthermore, almost inevitably the answers we get back from queries will be based on closed world semantics - if you look for "books about lions but not tigers" you would expect to get back "All about lions" even though it wasn't explicitly annotated as being about no other kind of animals.
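A small sketch of the closed-world reading of "books about lions but not tigers", assuming (hypothetically) that each resource simply carries the set of subjects it was tagged with. A subject that was never asserted is treated as absent, i.e. negation as failure; the hierarchy is ignored here for brevity.

```python
# Hypothetical annotations: each resource lists the subjects it was explicitly tagged with.
ANNOTATED_SUBJECTS = {
    "All about lions": {"Lion"},
    "Big cats compared": {"Lion", "Tiger"},
}


def about_but_not(include: str, exclude: str) -> list[str]:
    """Closed-world query: a subject not asserted for a resource is taken to be absent,
    so 'not about tigers' is answered as 'not tagged with Tiger'."""
    return sorted(
        title
        for title, subjects in ANNOTATED_SUBJECTS.items()
        if include in subjects and exclude not in subjects
    )


print(about_but_not("Lion", "Tiger"))  # ['All about lions']
```

Under OWL's open-world semantics, by contrast, the absence of a Tiger annotation does not entail that the book is not about tigers, so a DL reasoner would not return "All about lions" for this query unless that negative fact had been stated.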

This seems to me to correspond to Approach 1 in the document.

This case corresponds to the most common use of an ontology for annotation.

Case 2) I am building an ontology about Animals and resources about animals, and within that one ontology I want to define classes for "books about Lions", "Books about Animals", "Film clips about Animals", etc. (Note the distinction from Case 1's "books" about "Lions": in Case 1, "Book" and "Lion" will be separate in any query; in Case 2 they will be composed.) I may have provided the animal and media ontologies myself or I may have included ontologies built by others, but either way they are within my ontology and will be reasoned about with the same reasoner as my ontology when classifying the new classes that I have authored. This was part of the paradigm in the 28 April email.

In this case, I need to use Approach 4, either using existential quantifiers (preferred) or explicitly defining the instances (equivalent to 'skolem constants').
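To illustrate why the existential form matters in Case 2, here is a deliberately simplified structural subsumption check over conjunctions of named classes and existential restrictions (roughly "Book and (hasSubject some Lion)"). It is a sketch, not a DL reasoner: it assumes only a told hierarchy of primitive classes and ignores defined-class axioms, disjointness and everything else a real classifier handles. The property name hasSubject and the class names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical told hierarchy of primitive named classes (child -> direct parents).
TOLD = {"Lion": {"Animal"}, "Book": {"Media"}, "FilmClip": {"Media"}}


def ancestors(name: str) -> set[str]:
    """Reflexive-transitive closure of the told hierarchy, upwards from name."""
    seen, stack = {name}, [name]
    while stack:
        for parent in TOLD.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


@dataclass(frozen=True)
class Concept:
    """A conjunction of named classes and existential restrictions (role some filler)."""
    names: frozenset = frozenset()
    somes: tuple = ()  # tuple of (role, Concept) pairs


def subsumed_by(c: Concept, d: Concept) -> bool:
    """Structural check that c is subsumed by d: every conjunct of d is covered by c."""
    for name in d.names:
        if not any(name in ancestors(n) for n in c.names):
            return False
    for role, filler in d.somes:
        if not any(r == role and subsumed_by(f, filler) for r, f in c.somes):
            return False
    return True


def some(role: str, *names: str) -> tuple:
    """Build an existential restriction whose filler is a conjunction of named classes."""
    return (role, Concept(frozenset(names)))


BOOK_ABOUT_LIONS = Concept(frozenset({"Book"}), (some("hasSubject", "Lion"),))
BOOK_ABOUT_ANIMALS = Concept(frozenset({"Book"}), (some("hasSubject", "Animal"),))
MEDIA_ABOUT_ANIMALS = Concept(frozenset({"Media"}), (some("hasSubject", "Animal"),))

print(subsumed_by(BOOK_ABOUT_LIONS, BOOK_ABOUT_ANIMALS))   # True
print(subsumed_by(BOOK_ABOUT_ANIMALS, BOOK_ABOUT_LIONS))   # False
print(subsumed_by(BOOK_ABOUT_LIONS, MEDIA_ABOUT_ANIMALS))  # True
```

Because "Book" and "Lion" are composed into one class definition, the classifier can place "books about Lions" under "Books about Animals" (and under media about animals) without anyone asserting it, which is the multiple classification left to the classifier in Case 2.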

This corresponds to re-use of several existing ontologies to build a new one.

NOTE

If we are to use Approach 1, we need an item in the ontology annotations metadata indicating whether an ontology has been pre-classified or is expected to be classified before use.

This would also be helpful for Approach 2, since in general it is better to import unclassified ontologies, as the additional axioms may give rise to additional classifications.

POTENTIAL RESOLUTION

What seems to be needed is a simple "projection" of OWL onto RDF(S) which treats rdf:Property links other than rdf:type and rdfs:subClassOf as existential by default. In this case, Approach 1 would just become the projection of Approach 2.
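A minimal sketch of what such a projection could look like, under loudly stated assumptions: axioms are limited to "subclass of a named class" and "subclass of (property some filler)", and the tuple encoding below is hypothetical. The standard OWL RDF mapping does not do this; it encodes the restriction as a blank node of type owl:Restriction with owl:onProperty and owl:someValuesFrom, which is exactly the layering difficulty mentioned next.

```python
# Hypothetical axiom form: (subclass, superclass), where superclass is either a class
# name (str) or an existential restriction ("some", property, filler).
AXIOMS = [
    ("Lion", "Felid"),
    ("Felid", "Animal"),
    ("BookAboutLions", "Book"),
    ("BookAboutLions", ("some", "hasSubject", "Lion")),
]

RDF_TYPE = "rdf:type"
RDFS_SUBCLASSOF = "rdfs:subClassOf"


def project(axioms):
    """Lossy projection: named superclasses become rdfs:subClassOf triples, and each
    existential restriction becomes a plain triple linking the two classes."""
    triples = []
    for sub, sup in axioms:
        if isinstance(sup, str):
            triples.append((sub, RDFS_SUBCLASSOF, sup))
        else:
            _, prop, filler = sup
            triples.append((sub, prop, filler))
    return triples


def read_back(triples):
    """Inverse reading: any predicate other than rdf:type and rdfs:subClassOf is
    taken to be existential by default."""
    axioms = []
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue  # individual typing; outside this class-level sketch
        if p == RDFS_SUBCLASSOF:
            axioms.append((s, o))
        else:
            axioms.append((s, ("some", p, o)))
    return axioms


print(project(AXIOMS)[-1])                # ('BookAboutLions', 'hasSubject', 'Lion')
print(read_back(project(AXIOMS)) == AXIOMS)  # True
```

The projected triples can be traversed naturally by a Case 1 application, while reading them back with the existential default recovers the axioms a Case 2 author would hand to a classifier. Anything beyond these two axiom forms would be lost, which is the "lossy" part.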

Unfortunately, I can't see how to do this within the standard layering of OWL onto RDF.  Perhaps the experts can help.

What I can say as a user is that in most situations I would prefer a "lossy" projection of a provably complete ontology to either

a) a manually constructed ontology that may omit implied relationships, use notions inconsistently, or contain unsatisfiable concepts

b) an overcomplicated complete representation of that ontology that I cannot query and traverse naturally.

Our experience is that it is extremely difficult, probably impossible, to construct large, multiaxial ontologies correctly and completely without machine assistance such as a classifier.

Behind this view is an assumption which, as I understand it, fits with Dan McBride's: that at 'run time', most users will want to use a simple query mechanism. At that point they will want to regard the ontology as "pre-coordinated", and need to know nothing of the "Reasoners"/"Classifiers" with which it was built. However, at author time, authors need the classifier to get the ontology correct. Indeed, much of what we consider 'best practice' consists of letting the classifier deal with all multiple classification.

The view that the classifier is needed to build the ontology but not to use it fits with our experience in OpenGALEN and also with work on the Gene Ontology Next Generation and other ontology projects in the E-Science programme.  In all but a few cases the classifier is used to build the ontology and find all the hidden implications of the axioms asserted.  The fully classified ontology, with all those implications made explicit and any contradictions fixed, is then delivered to users for applications which treat it as a simple lattice. (The exceptions are applications in which the set of concepts (classes) required cannot be enumerated in advance without undergoing a combinatorial explosion.  These applications require post-coordination by a classifier.  However, at least at the moment, there are only a few such applications.)

Regards

Alan

Natasha Noy wrote:

All (and Bernard and Brian in particular),

[Nothing like putting "close to final" on the subject line to catch
people's attention :) ]

I think I still failed to articulate what the goal of the "classes as
values" note is (or at least what I saw as the goal when I was drafting
it).

It is NOT about how to link thesauri and ontologies in general or in
particular. It is NOT about how to link your new ontology to a legacy
vocabulary or your new vocabulary to a legacy ontology.

What I was trying to do was answer the question that comes up all the
time: I am developing an ontology, and the most natural way to
represent what I want is to reference classes from this hierarchy X in
instances of these other classes. The moment I do this, I am in OWL
Full. If I'd rather stay in OWL DL, what are my options? Indeed, this
question comes up in the context of annotating individuals from your
ontology with classes from some "legacy" hierarchy, but not only there.
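As a rough illustration of the pattern that pushes an ontology into OWL Full, here is a sketch that flags triples in which a declared class appears as the value of an ordinary property on an individual. The triples and names are hypothetical, and the check is simplified: it assumes dc:subject is used as an ordinary property rather than declared as an annotation property, and it looks only at this one pattern, not at the full set of OWL DL restrictions.

```python
# Hypothetical triples: the class "Lion" from hierarchy X is used as the value of
# dc:subject on an individual, which is the move that leaves OWL DL.
TRIPLES = [
    ("Lion", "rdf:type", "owl:Class"),
    ("Animal", "rdf:type", "owl:Class"),
    ("Lion", "rdfs:subClassOf", "Animal"),
    ("LionsLifeInThePrideBook", "rdf:type", "Book"),
    ("LionsLifeInThePrideBook", "dc:subject", "Lion"),
]

# Predicates whose objects are legitimately classes in OWL DL.
STRUCTURAL = {"rdf:type", "rdfs:subClassOf"}


def classes_used_as_values(triples):
    """Return triples whose object is a declared class but whose predicate is not
    one of the structural predicates above."""
    declared = {s for s, p, o in triples if p == "rdf:type" and o == "owl:Class"}
    return sorted(
        {(s, p, o) for s, p, o in triples if p not in STRUCTURAL and o in declared}
    )


print(classes_used_as_values(TRIPLES))
# [('LionsLifeInThePrideBook', 'dc:subject', 'Lion')]
```

Each flagged triple is a place where one of the note's alternatives (for example, an existential restriction or a stand-in individual) would be needed to stay in OWL DL.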

If someone who sees the difference here could try and re-write the
preamble so that this point is clear, I would really appreciate that
(seriously, we need to make the difference clear, and I don't seem to
be able to articulate it)!

So, to address Brian's concerns (I'm taking his points somewhat out of
order, but it's easier this way):

> Approach 1 is described as being in OWL Full. It looks to me like RDFS until
> the extension to restrict the values of dc:subject is introduced. That
> restriction doesn't seem to be about the main purpose of the note, i.e.
> about defining subject hierarchies. I suggest it might be useful to
> separate out the RDFS solution and the OWL Full solution, with separate
> examples of each.

This comment makes sense if you view the note as a description of how
to link thesauri with individuals in an ontology. With the caveat
above, does it still stand? (It seems that whether or not it is RDFS is
beside the point, given the goal.)

> I also note that the example seems very close to the work of the thesaurus
> task force. I'm a bit nervous about this overlap.

Do you still see this as overlap given the point above? And why would
you be nervous about that anyway?

> Has it been reviewed by the DC folks?

To the best of my knowledge, no. Again, the goal here is NOT to discuss
proper uses of dc:subject. I've asked this question (and the trade-off
of using dc:subject or some local made-up property) here:

http://lists.w3.org/Archives/Public/public-swbp-wg/2004May/0026.html

There was never any reply, but I am still wondering if it's worth using
something other than dc:subject (the natural question of "why didn't
you use dc:subject" notwithstanding)

Similarly, to Bernard:

> First I would like to see a more precise definition of the use cases in
> each approach: what is given (the legacy), what the objectives of the use
> case are, and what is to be built to achieve them.
> Basically we have two KOS: on one side an ontology, on the other side a
> classification-indexing system (a library system, to make it short).
>
> It's unclear in each approach whether the use case is:
>
> #1: Both ontology and library system are considered as given in the
> legacy, have been built and managed independently, and the problem is to
> map them, from one side or the other. The use case should make it clear
> which is the master and which is the slave.
> #2: The two systems are developed together in an integrated environment.
> This looks like a "closed" use case not really in the scope of Semantic
> Web deployment. Maybe we should skip that one.
> #3: One of the systems is built to be as interoperable as possible with
> the other, which is given (can be both ways).

Either one of them. It is true that in some of the approaches we make
some changes on the ontology side and in some on the vocabulary side,
but I really wasn't thinking in those terms. Again, the goal was
somewhat different. (And certainly it's not #1, since most of the
approaches assume that you have control over either one or the other.)

Again, you may not have two different systems at all. Somewhere in your
ontology you may have a hierarchy that you need to reference from
somewhere else and staying in OWL DL is your concern (or not).

Subjects are used only as an example.

I'll reply to other points in a separate email.

Natasha

--
Alan L Rector
Professor of Medical Informatics
Department of Computer Science
University of Manchester
Manchester M13 9PL, UK
TEL: +44-161-275-6188/6149/7183
FAX: +44-161-275-6236/6204
Room: 2.88a, Kilburn Building
email: rector@cs.man.ac.uk
web: www.cs.man.ac.uk/mig
        www.opengalen.org
        www.clinical-escience.org