Soliciting help in debugging reasoning performance problems from Alan Ruttenberg on 2009-08-13 (public-owl-dev@w3.org from July to September 2009)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Thu, 13 Aug 2009 09:28:04 -0400
To: public-owl-dev@w3.org
Cc: Dmitry Tsarkov <tsarkov@cs.man.ac.uk>, Boris Motik <boris.motik@comlab.ox.ac.uk>, Evren Sirin <evren@clarkparsia.com>, Timothy Redmond <tredmond@stanford.edu>, Mark Musen <musen@stanford.edu>, Matthew Horridge <matthew.horridge@cs.man.ac.uk>, OBI Developers <obi-devel@lists.sourceforge.net>
Message-ID: <29af5e2d0908130628t5aa4d1eev60654e1ff48a455b@mail.gmail.com>

Hi,

I work on OBI, an ontology under development for representing
Biomedical Investigations (http://obi-ontology.org/).

For the last few months we have been having trouble running
consistency checking and classification on our ontology. The symptom
is highly nondeterministic reasoning times: for a given version the
task can complete in as little as 40 seconds in one run, or fail to
complete overnight on a subsequent run on the same ontology. This
situation has seriously hampered development.

The environments in which we've tried reasoning include Protege 3
(Pellet 1.5.2), Protege 4 (Pellet 1.5, Pellet 2, FaCT++; HermiT in
Protege 4 throws an exception), and Pellet 1.5.(3 I think) in LSW.

At various times I have cleaned up certain underspecification in OBI
by reviewing the models being created, based on some hooks into Pellet
(http://docs.google.com/View?id=dzprnmw_71chch6pfk). However, at the
moment the bigger problem seems to be the nondeterministic nature of
reasoning times and the fact that small changes, like reserializing
the file, can make things work better or worse, unpredictably.

A current version to work with is at
http://purl.obolibrary.org/obo/obi/repository/tags/2009-08-13-owldev/

The main file is merged/obi.owl

In the merged directory there are also:

disjoints.owl - we manage disjoints outside our editor tools - these
are the disjoints generated by script
assumed-individuals.owl - a set of individuals that need to exist for
some subsumptions to be recognized
inferred-superclasses.owl - subClassOf statements inferred from
classification of obi.owl + disjoints.owl + assumed-individuals.owl

obi.owl does not include disjoints.owl or assumed-individuals.owl by
default. If you do experiments you might want to include the disjoints
in particular as these constrain the models.

BTW, pellet 1.5 validates obi as OWL-DL using its species checker.

Some other notes:

If you use this version, be sure to use only the files included in
this directory. There is an obi.repository file that says where they
are; fetching over the net may retrieve versions that have since been
updated. When using Protege 4 I add external/, external/iao, and
branches/ to the ontology libraries. (To be sure I am getting only
local files I turn off the network or check which ontologies have been
loaded.)
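
For anyone who would rather script the "only local files" check than
turn off the network, here is a minimal Python sketch. It lists the
owl:imports of an RDF/XML file and reports any that have no local
copy. The function name and the local_copies mapping are hypothetical;
the format of obi.repository is not described here, so the mapping has
to be filled in by hand. It only inspects one file and does not follow
imports transitively.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"


def unmapped_imports(rdfxml_text, local_copies):
    """Return the imported ontology IRIs that have no entry in local_copies.

    local_copies is a hypothetical dict {ontology IRI: local file path}
    that the caller builds by hand from the checked-out directory.
    """
    root = ET.fromstring(rdfxml_text)
    # Imports are written as <owl:imports rdf:resource="..."/> elements.
    imports = [
        element.get(f"{{{RDF}}}resource")
        for element in root.iter(f"{{{OWL}}}imports")
    ]
    return sorted(iri for iri in imports if iri and iri not in local_copies)
```

An empty result means every import named in that file has a local
copy listed; anything returned would be fetched over the net.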

We currently edit in Protege 3, as Protege 4 does bad things to our
annotations when it serializes. Hopefully this will be addressed in
Protege 4.1.

If a solution is found, we need to make it usable by the OBI
developers by having it work in some OWL editor - while I can run
command lines and edit RDF/XML in emacs, most of the other OBI
developers can't.

In Protege 4, if a reasoner is going to succeed, it will be FaCT++. In
the past it has been rare that the Pellet 2 plugin ever finishes. Of
course, today, just now, on my machine, on this version, it finishes
(quickly) (sigh). However, see next note.

I have noticed that in cases where there are not adequate declarations
(necessitated by changes in the way OWL 2 handles declarations and
imports), annotation properties can be determined to be ambiguous.
Some part of the tool chain will then decide that all the classes are
instances too, and this seems to cause much worse performance. If/when
this happens in Protege 3, it isn't visible. In Protege 4 it can be
seen by looking in the individuals tab, where instances will appear
with the names of classes. We do not intentionally pun in OBI. I
believe this issue may affect Pellet performance in particular, since,
IIRC, it tries to be "clever" about doing OWL Full reasoning when it
thinks it appropriate.
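
One way to look for candidates for this problem before loading the
file is sketched below in Python. It is a rough heuristic, not what
Protege or Pellet actually do: it lists properties used directly on
named classes that are not declared as a property in the same RDF/XML
file. The function name is hypothetical, and the sketch assumes
absolute IRIs in rdf:about, ignores declarations made in imported
files, and ignores declarations written as rdf:Description plus
rdf:type.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BUILTIN_NAMESPACES = {RDF, RDFS, OWL}
PROPERTY_TAGS = {
    f"{{{OWL}}}AnnotationProperty",
    f"{{{OWL}}}ObjectProperty",
    f"{{{OWL}}}DatatypeProperty",
}


def undeclared_properties_on_classes(rdfxml_text):
    """Return IRIs of properties used on named classes but not declared here."""
    root = ET.fromstring(rdfxml_text)
    about = f"{{{RDF}}}about"
    # Properties declared as top-level typed nodes in this file.
    declared = {
        node.get(about)
        for node in root
        if node.tag in PROPERTY_TAGS and node.get(about)
    }
    used = set()
    for cls in root.iter(f"{{{OWL}}}Class"):
        if cls.get(about) is None:
            continue  # skip anonymous class expressions
        for child in cls:
            if not child.tag.startswith("{"):
                continue  # element without a namespace
            namespace, local = child.tag[1:].split("}", 1)
            if namespace not in BUILTIN_NAMESPACES:
                used.add(namespace + local)
    return sorted(used - declared)
```

Anything this returns is only a candidate: the declaration may well
live in an imported ontology, in which case the question is whether
the tool chain actually sees that import.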

Here is a sample run in LSW (pellet 1.5)

CL-USER> (time (check-nolockup-kb 'kb :obimi 300 2 t t))
Trying to reload and check :OBIMI, try #1 (failed to check consistency
and classify in 300 seconds)
consistency: 23.039 seconds real time
classification: 89.012 seconds real time

What happened? The incantation says: load up obi + assumed
individuals + disjoints and try to check consistency and classify,
allowing 300 seconds before giving up. Try this twice.

On the first try it failed within the 5 minutes allowed, not even
finishing consistency checking. On the second try it succeeded at both
in about 2 minutes.
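
For readers without LSW, the same "give it N seconds, then reload and
try again" pattern can be sketched in Python around any command-line
reasoner. This is not the implementation of check-nolockup-kb; the
function name is hypothetical and the reasoner command line is left to
the caller. Each attempt runs in a fresh process, which stands in for
the reload.

```python
import subprocess
import time


def run_with_retries(cmd, timeout_s, tries):
    """Run cmd up to `tries` times, allowing `timeout_s` seconds per attempt.

    Returns a list of (attempt, outcome, elapsed_seconds) tuples, where
    outcome is "ok", "failed" (non-zero exit) or "timeout". Stops after
    the first "ok".
    """
    results = []
    for attempt in range(1, tries + 1):
        start = time.monotonic()
        try:
            # subprocess.run kills the child and raises on timeout.
            proc = subprocess.run(cmd, timeout=timeout_s)
            outcome = "ok" if proc.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            outcome = "timeout"
        results.append((attempt, outcome, time.monotonic() - start))
        if outcome == "ok":
            break
    return results
```

Called with timeout_s=300 and tries=2, a run like the one above would
come back as a "timeout" entry followed by an "ok" entry. Keeping the
elapsed time of every attempt, including the failed ones, makes the
run-to-run variation easy to record.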

Any help on this appreciated,

Regards,
-Alan

Received on Thursday, 13 August 2009 13:29:04 UTC