Natasha, All

A long email that has taken until I have been away and may cross your next document. If so, apologies.

As I try to understand the debate, there seem to be at least four distinct use cases. If these don't fit, please correct me. But the only way I can address the problem is with more detailed examples and then applying each use case to the various examples.

A set of examples and text follows, then an example ontology for Case III ('The OWL-DL case) which is built in the ProtegeOWL-CO-ODE tools with visualizations from the CO-ODE OwlViz and classification by Racer. There are serious limitations in all of the available tools, so the demo is not what I would ideally like, but I still think it important to have real things in real software to keep our feet on the ground and be clear what we are (or at least I am) trying to achieve. (It also provides interesting insights into how inclusion ought to work, which is a little explored topic)

i)    A book about Lions - habitat, habits, anatomy, etc.
ii)   "Born Free" a book about a particular lion
iii)    A book about the discovery and classification of lions and their recognition of separate subspecies of Lions. (I don't know if plausible or not for 'lions', but plausible for other biological examples)
iv) A reference is a cladistics tool (a "clef") to "Lion" specifying exactly how lions are differentiated (phenotypically) from other felines.
v) A reference to the "Lion Genome" - (plausible for Dog, Cat, etc. and no doubt eventually for Lion)
vi) A reference in a resource on "species described by Linneaus" to "Lion" as a species.
vii) That "John" is credited as the author of the class Lion in a KB (As a statement about John).

I think the above indicates that of all the problems we could have started for on best practice, "Classes as Values" is the perhaps most difficult and problematic. I'll put down some thoughts and add some real examples, but then I'll turn my efforts, at least temporarily, to other matters.

I would be interested in the above list in others' take on which and for what purposes the above - or any extension of the list - need to be distinguished? What is gained if we do? What is lost if we don't?

My take follows

"about the class lion": vii - NB we don't have a distinction between 'class' and 'concept represented by the class' available to us. The author of the class and the person who first described the species are clearly different.

"about the concept lion" best: i), iii), iv), v), vi) ; acceptable but with some loss ii)

"about some lions:" best ii); acceptable with little practical loss i), v) ; a kluge but workable in most cases vi)

I note that having separate notions of 'species' separate from 'concept' and 'class' would help. But that takes us dangerously near to discussing details of ontology.

SOME SPECIFIC CASES AND QUESTIONS

I) Simple Reference:

Developer A wants to mark up the resources that they are putting up - be they books, images, movie clips, general web pages, whatever - in such a way that they can be found in the "usual" way by an RDF engine such as Tim Berners Lee's package, etc.. Developer B wants to build an application that finds all such web sources. Both use - amongst other things - the ontology developed by C in OWL (of whatever flavour) - as a source for the vocabulary for markup. Neither A nor B are interested in the added facilities provided by OWL, but only want to ensure that their mark-up/searches for whatever vocabulary they find explicitly represented in the ontology developed by C (and any of a number of other ontologies which they use) are such that search's for things about "Lions" works for all the sub and supertypes of "Lion" as expected - e.g. "African Lion", "Wild Animal", "Animal of the African Veldt", "Predator", "Carnivore" etc.

However, they are not interested in defining any of these subsidiary terms, merely in using them. As I understand it they will, potentially, search each ontology/vocabulary for the classes/terms it describes and their subclass/superclass relations, and prepare a list of potential search list based on some combination of these concepts/terms. They are not interested in the further information that these ontologies may contain about Lions, except in the limited sense of super and subclasses. What they want to do is find other material marked up with relevant references.

They are in no sense attempting to 'import' any of these ontologies, merely to reference them. Their mark up is in RDF(S).

In this case it seems clear that it is irrelevant to either A or B whether the vocabulary developed by C is in OWL (in any of its flavours) or RDF(S). All that matters is that they can follow type/subtype links and expand their search strategy for all resources marked with rdf:subject referring to any of the expanded list of subject resources obtained by following type/subtype links.

The great effort to make RDF and OWL compatible should make this happen naturally, regardless of the flavour of OWL in which the ontology is expressed. Since neither A nor B is using OWL, whether it falls into OWL full is irrelevant.

The one thing which follows, almost inevitably from current tools although not from the standard, is that the semantics of any negated queries (there is no explicit negation in RDF) will be closed world - i.e. a search for resources about "Lions but not Tigers" would produce all references which were marked with "Lion" or one of its subtypes and not marked with "Tiger" or one of its subtypes. There may be some issue about this in the current RDF Query Language. I'd be grateful if somebody could clarify.

The second thing that follows is that the vocabularies will be treated as pre-coordinated - i.e. only concepts and relations explicitly present will be traced, not those potentially implied. Although most ontologies are likely to be "delivered" as pre-coordinated in the immediate future, they may well be exchanged between developers in an unclassified form which will only give correct answers if first classified or used with software capable of post-coordination. Therefore, as I argue elsewhere, we need a standard bit of metadata for the OWL Ontology header that says whether a gives the 'classification status' of each ontology. Only 'classified' or 'pre-coordinated' ontologies are safe to use for simple reference.

As far as I can see, most people working in this mode would be happy to blur the distinctions in i)-vi). vii) (metadata) remains an issue but in this perversely inverse form probably doesn't arise very often. Most metadata is about the thing annotated not the annotator.

II) Partitioned representation in OWL:

The same as above, except that Author A is working in OWL but wants to use an ontology in OWL (developed by C) as s a simple vocabulary item. As above. B wants to find the references. C's ontology is completely classified and pre-coordinated, so neither A nor B expects to do any reasoning with it, merely accept the subclass-superclass links explicitly present. I don't think there is any clear way of indicating this use case currently.

In this case, there seems to be a problem. If A makes a simple reference to a resource of type Class in the ontology developed by C, e.g.

restriction(rdf:subject value(LionClass))

A is immediately in OWL-Full, yet neither A nor B expect to perform any inferences using the information in C's ontology. If the rest of A's ontology conforms to the restrictions on OWL-DL, it seems perverse not to apply OWL:DL semantics and reasoners.

More fundamentally, B has no access to the subclasses of LionClass to expand his query unless we consider some mechanism of "reflection" of owl:subclassOf axioms in A's ontology visible in B's ontology as rdfs:subtypep-of restrictions. There is no provision in the standard for this as I understand it.

If used in this way there is no way to distinguish i)-vii) above. If all we need is a bibliographic search, this may not matter. For other applications, it may matter.

If anyone can explain to me how better to cope with this case, please do.
If anyone has concrete examples of this case, please mention them.

I think this is how people will use things like the Gene Ontology and NCI ontologies in many cases.

In any case, should B perform any inference according to OWL semantics, the reasoning would be open world.

III) Importing an ontology

As above, except that A and a are developing their ontology and application using OWL-DL and wish to use all of the information in C's ontology and import it into their own. A is probably developing a knowledge source about some related topics and wants to include the information about Lions, and probably other things, developed by C. B is developing an agent or application to use It really only makes sense if B includes all, or at least a well defined module of,

At this point the distinctions matter. There is a great advantage to marking things up as "about some Lions" rather than "About the concept Lion" and in providing a proxy mechanism for dealing with 'species' because these things will conform to OWL-DL and can be classified using existing reasoners.

I sketch a version which works up to the limits of current tooling along with example ontologies below.

In this case, the author is trying to use the information to make inferences. The cost is a more complex representation and different approximations. I would argue that for this purpose:

i), ii), v) and possibly iv) are reasonably approximated by "about some lions". The distinction between being about all of a group, or a subgroup of lions, and being about the concept lion, can be blurred in i) and v) since all we can really describe are some lions. The Cladistics case might be handled this way but is probably better treated as with lion_species in vi) below.

If we can live with that approximation, then saying that a specific resource is of rdf:type

(ResourceTypeXXX and restriction(xxx:refers_to someValuesFrom Lion))

Makes reasoning easy and straight forward, at least using Racer. (Using FaCT requires some further approximations which we needn't go into here.) There is a discussion to be had about real property should be used for "xxx:refers_to". I have deliberately used a distinct name to leave that question open.

vi) and perhaps iv) are clearly about "The species Lion" rather than about "some Lions". They require work arounds. Here there is room for real argument about which way the implication goes. Do we say:
    a) All Lions are of species LionSpecies
    b) All things that are of species LionSpecies are Lions
    c) Both

This assumes that 'species' is a reified thing that is different from the class 'Lion' representing the concept 'Lion'. Without delving into ontology and epistemology, this is at least a reasonable position. We have a concept of 'Lion' long before we know anything about 'species'.

IV) Metadata

Case vii - clearly an annotation and clearly belongs as an annotation property, except that it is the 'inverse' and is about John. I think we could live with this as just an annotation property for the time being. And I think it will arise rarely enough not to worry about - but somebody else may think differently.

It would be nice if we had a convention to separate:
annotations that represent metadata about a the representation (class)
annotations that are workarounds for things we can't express in without falling into OWL-Full.

SUMMARY:

Case I: Simple reference to a preclassified ontology for labelling and searching the semantic web. Natasha's case 1 seems to work. A standard metadata element is needed for OWL ontologies indicate that all inferences have already been made so that the explicit structure can be depended on. Care may be needed over open/closed world assumptions when querying.

Case II: Use in one OWL ontology of the simple vocabulary from another OWL ontology. Troublesome. As it stands, because it throws one into OWL-Full over 'trivia', developers wanting to use classifiers gives up a great deal to use a standard vocabulary - exactly the reverse of what we would want to promote. I would solve the problem by reasoning in each of the two KBs separately, but I don't think the standard supports this option. I can't think for the moment of another.

Case III. Inclusion, rather than reference to, of one OWL DL ontology by the annotating expression.
Works OK in OWL-DL with minor work-arounds for individuals but requires a property, here labelled xxx:refers_to and proxy individuals such as 'species'. The 'includes' mechanisms need to be improved to make this easy.

A detailed annotated example of how this approach works is attached. As of this moment is best viewed in OilEd (oiled.man.ac.uk) rather than Protege-OWL and must be classified using Racer. (Hopefully, the critical omission in Protege-OWL will be fixed RSN.) I have used separate name spaces but not attempted to use 'includes' as the tools have had problems in the past and time is short.

Case IV: Metadata - i.e. data about the representation - use annotation properties.

I hope this helps

Regards

Alan

--
Alan L Rector
Professor of Medical Informatics
Department of Computer Science
University of Manchester
Manchester M13 9PL, UK
TEL: +44-161-275-6188/6149/7183
FAX: +44-161-275-6236/6204
Room: 2.88a, Kilburn Building
email: rector@cs.man.ac.uk
web: www.cs.man.ac.uk/mig
www.opengalen.org
www.clinical-escience.org
==================================================================

Example Ontology for Case III - importing ontologies working entirely in OWL-DL:

The attached example ontology is not yet finished but illustrates a number of these points. It should work in either Protege-OWL or OilEd, but was developed in the latest beta version Protege-OWL with the CO-ODE add ins (downloadable from protege.stanford.edu-->plugins-->backends-->owl; www.co-ode.org)

(The tooling for individuals is still rudimentary, so more nastiness shows through than should. Tricks which should be confined to the classifier level appear in use the use of leaf classes to simulate instances and of someValuesFrom rather than value(). A complete discussion of this issue is beyond the scope of this note. )

The Scenario:

There are three reference ontologies
    c1_ about Animals
    c2_ about Media
    c3_ about History of Science

There is an annotation ontology of media
a1_ - that annotates a series of images, films, books etc. as "referring to" various entities in c1_... c3_...
(I have deliberately used a separate property to make it easier to discuss how this property relates, if at all, to dc:subject. Please attach no special meaning to the label "refers_to".)

There is an application ontology
b1_ which includes inferred classes composed from a1_ and c1_...c3_...

To avoid clutter, I assume a common top ontology

Species are treated as individuals linked to all members of their species by the property of_species/species_of

Because namespaces and individuals are neither handled well by either one or both tools, I have simulated namespaces with prefixes c1_, a1_ etc.

I have simulated individuals as terminal classes. Without better tooling the 'maximum' simulation is tedious. I have left some restrictions out to avoid clutter and save time. Please hold debates on this issue for another time.

The understood import dependency tree

b1 - the application ontology
   a1 - the annotation ontology
      c1 c2 c3 - the reference ontologies about animals, media, and history of science
          top - the top ontology

i.e. the application ontology imports the annotation ontology which has imported the domain ontologies which in turn are based on the top ontology

Task

The application using b1_ is interested in
    Images about Lions
    Books about Lions
    Books about predators
    Media about Animals
    Films about things first described by Linnaeus

The Annotations

a1_ contains 4 individuals
    a1_a_book_about_lions    - refers_to someValuesFrom c1_Lion   [1]
    a1__Born_Free                - refers_to someValuesFrom b1_Elsie
    a1__Elsie                          -type c1_Lion (simulated by subclass_
    a1_a_film_about_lions       -refers_to someValuesfrom c1_Lion

Each species in c1_ is equivalent to restriction(of_species someValuesFrom appropriate_species). Again the simulation of individuals is imperfect. But the effect works. Everything of the species is of the corresponding class and vice versa.

c3_ contains the pseudo_individual "c3__Linnaeus" and the information that he first described the species Lion. (Not lions, but the "c3_lion_species", an individual species)

Given all this,

b1_Images_about_lions ==
class(Image complete and
restriction(top_refers_to someValuesFrom Lion))
Paraphrase: "Images about some lions"

the other references follow.

The OwlViz visualisation of the hierarchies are shown below for images and animals respectively.
The complete ontologies are in the attached zip file
Classification is with Racer.

Figure 1a: Fully classified hierarchy of media showing organisation by topic. See text for naming conventions. NB all multiple hierarchies are the result of inference.

Figure 2: Fully classified hierarchy of Living Things. NB all multiple hierarchies are the result of inference.

The effect of classification can be seen from the contrast with the asserted pre-classified hierarchies below
which none of the media are classified by subject type.

Figure 1b: Media hierarchy preclassification - not yet organised by topic

Figure 2b: Animal Hierarchy preclassification - not yet organised into multi-hierarchy

[1]would better be oneOf(c1_Lion) but this doesn't work in classifier.