comments on imports

I know that we are very late in the OWL 2.0 specification process and  
that this issue has been heavily debated.  But I have been troubled by  
the current specification of the behavior of imports and decided I  
would post a message regardless of its lateness.

Thanks to the OWL working group for its amazing accomplishments!

-Timothy

------------------------------------------------------------------------------------------------

I have been struggling for some time now trying to understand the
consequences of the import by location scheme that is being suggested
for OWL 2.0.  I will first describe what I see as the problem and then
suggest a possible solution.

  ========= The Problem with Import by Location =========

It seems like the suggested scheme will have a negative impact on
users that want to share ontologies.  I foresee situations arising
where the user has all the ontologies from the import closure
but cannot determine which ontology imports which.  This can occur
when a user receives some ontologies but does not have access
to the IO scheme indicated by the import statements.  Some examples
of this problem include:

  * the user receiving an ontology is on the other side of a firewall
    from the user creating the ontology.

  * the user creating the ontology accesses his ontologies through a
    web container (Tomcat) or through some agent-based society.  We
    often see ontology names and imports of the
    form "http://localhost:8080/...".

  * the user creating the ontology does not immediately address the
    issue of publishing his ontology and therefore uses import
    statements like

        import file://C:/dev/ontologies/foo.owl.

  * the user receiving the ontology is offline.  He knows that he has
    all the ontologies for the import closure but the IRI used in the
    imports statement has no resemblance to any of the names or
    contents of the imported ontologies.  Sometimes this is compounded
    when there are different import declarations used in different
    parts of the imports graph for the same ontology.  (I have only
    seen this last case once but the ontology was supplied by a
    well-known ontology expert.)

I see many ontologies (perhaps more of the troublesome ones) and all
of these cases crop up quite frequently.  Some of these come from
well-known experts in the field.

The above problems are compounded by the fact that - unless we make
some recommendation - different tools will create incompatible
mechanisms for redirecting import declarations.  In particular, if we
are downloading an imports closure off the web, it would be very easy
during the download process to record which IO addresses represented
in the import statements map to which files that are found on disk.
This can then be used by an ontology tool to redirect the import
directives when the user goes offline.  However the format of the file
that specifies the redirections will be different for the OWL API
(Protege 4) than it will be for Jena (TopBraid).
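
To make the idea of a redirection record concrete, here is a minimal
Python sketch.  It is only an illustration: the JSON layout and the
function names (record_redirect, resolve_import, dump_catalog,
load_catalog) are hypothetical and are not the format used by the OWL
API or by Jena, and the example.org IRI in the usage note is a
placeholder.

```python
import json
from pathlib import Path


def record_redirect(catalog, import_iri, local_path):
    """Remember that the document fetched from import_iri was saved at local_path."""
    catalog[import_iri] = str(local_path)


def resolve_import(catalog, import_iri):
    """Return the recorded local copy for import_iri, or None if there is none."""
    local = catalog.get(import_iri)
    return Path(local) if local is not None else None


def dump_catalog(catalog):
    """Serialize the catalog to JSON text (hypothetical format, for illustration)."""
    return json.dumps(catalog, indent=2, sort_keys=True)


def load_catalog(text):
    """Rebuild a catalog from the JSON text produced by dump_catalog."""
    return json.loads(text)
```

A download step would call record_redirect once per fetched import,
for example record_redirect(catalog, "http://example.org/foo.owl",
"cache/foo.owl"), and an offline tool would call resolve_import before
attempting any network access.  The point of the sketch is that the
mechanism is simple; the difficulty is that each tool would pick its
own file format for it.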

For many ontologies perhaps, things should work fine because the
ontologies will essentially be imported by name.  It is recommended in
the OWL 2.0 specifications that an ontology be accessible by its
name

    "If O contains an ontology IRI OI but no version IRI, then the
     ontology document of O should be accessible from the IRI OI."

and

    "If D contains an ontology IRI OI and a version IRI VI, then the
     ontology document of O should be accessible from the IRI VI;
     furthermore, if O is the current version of the ontology series
     with the IRI OI, then the ontology document of O should also be
     accessible from the IRI OI."

So I would expect that it will remain pretty common for the name used
in an import directive to match the name of an ontology or the
ontology version.  This can - in theory - allow tools to determine
what ontology imports which (modulo some versioning issues) even when
the IO operation implied by the import statement is unavailable.
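
A rough Python sketch of such an import by name lookup follows.  It
assumes the tool has already extracted the ontology IRI and version
IRI of each local document; the LocalOntology record and the
find_by_name function are hypothetical names introduced for this
illustration, not part of any existing tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LocalOntology:
    """A local ontology document and the names found in it (hypothetical record)."""
    path: str
    ontology_iri: Optional[str] = None
    version_iri: Optional[str] = None


def find_by_name(import_iri, local_ontologies):
    """Return local ontologies whose version IRI or ontology IRI equals import_iri.

    Version IRI matches come first, followed by ontology IRI matches.
    More than one result means the import is ambiguous between versions.
    """
    by_version = [o for o in local_ontologies if o.version_iri == import_iri]
    by_name = [
        o for o in local_ontologies
        if o.ontology_iri == import_iri and o.version_iri != import_iri
    ]
    return by_version + by_name
```

If two local documents share an ontology IRI but carry different
version IRIs, an import of the ontology IRI returns both of them; this
is the versioning issue mentioned above.  An import such as a file: or
localhost location that matches no name returns an empty list.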

However the OWL 2.0 specifications make no recommendations for import
by name.  In fact, the import by name scheme is incompatible with the
OWL 2.0 specifications in the not uncommon case where the name and
version of the ontology have nothing to do with the ontology location.
How should users share ontologies in the full generality of the import
by location scheme?  If no change is made to the OWL 2.0 specifications
then it would seem like tool builders would need some type of advice
on how to handle this issue and what to say to users who are having
trouble with imports.  There are many possible approaches:

  * tool specific repository mechanisms,

  * automatically rewriting import declarations when ontologies are
    moved,

  * suggesting that the import scheme conform to import by name, or

  * not supporting full offline mode - always going to the web to
    determine the ontology name and version.

All of these approaches have serious problems and it is not clear what
would be a recommended approach.  Perhaps they are all options with
different domains of applicability.

  ========= A possible solution? =========

I am wondering if it would make sense to introduce an analogue of the
xml-base for owl ontologies.  In the RDF and XML renderings this could
correspond to the xml base in the rendered version of the ontology.
We could then add a "should" requirement saying that the imported
ontology should have an owl base that is equal to the name in the
import declaration.  It could be explained here that the purpose of
this requirement is to support sharing of ontologies and offline
editing of ontologies.

One problem with this suggestion is that when the ontology has a
version and it is the latest version, it might very well have two
alternative schemes for being imported.  Perhaps this could be solved
by having more than one owl base for an owl ontology document.  This
would break the mapping to the XML base in the RDF and XML renderings
but perhaps this is acceptable.

One advantage of this scheme is that it actually corresponds with what
at least two tools (Protege 3 and Protege 4) are doing right now.  The
problem with looking up the ontology names of ontologies in an
ontology repository is that it is inefficient.  If an ontology
rendered in RDF/XML does not have an ontology name then the tool has
to parse the entire ontology in order to determine this fact.  In
addition, there are several OWL ontologies that contain more than one
ontology declaration. So it can be ambiguous which one corresponds to
the name of the ontology.

For this reason both Protege 3 and Protege 4 look up the xml base and
match that with the imports declaration.  This is not really in line
with the OWL 1.0 specification and there has been some comment about
this issue in our mailing lists.  But it continues to be the most
pragmatic solution and it usually works.
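
For readers who want to see why this lookup is cheap, here is a
minimal Python sketch of reading the xml base from an RDF/XML document
without building the whole tree.  It is not the Protege code; the
function names read_xml_base and matches_import are hypothetical, and
the sketch only considers an xml:base attribute on the root element.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def read_xml_base(stream):
    """Return the xml:base attribute of the root element, or None if absent.

    Parsing stops at the first start-tag event, so the body of the
    document is never turned into elements.
    """
    for _event, element in ET.iterparse(stream, events=("start",)):
        return element.get(XML_BASE)
    return None


def matches_import(stream, import_iri):
    """True if the document's root xml:base equals the IRI in the import."""
    return read_xml_base(stream) == import_iri


if __name__ == "__main__":
    sample = (
        b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
        b' xml:base="http://example.org/foo.owl"></rdf:RDF>'
    )
    print(read_xml_base(io.BytesIO(sample)))
```

Because the attribute sits on the root start tag, a document with no
xml base is detected as soon as that tag is read, whereas confirming
that a document has no ontology name requires reading all of it.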

  ========= Conclusion =========

Regardless of what is planned, it seems like we need some sort of
discussion of how to deal with unresolved imports. Should we simply
always pass the problem back to the user?  The OWL 2.0 scheme is very
robust when all ontologies are on the web and internet access is both
reliable and trusted.  It becomes much less robust when ontologies are
stored on disk, or provided by web containers or agent societies.  The
OWL API will probably be one of the first infrastructure APIs that
will have to wrestle with the problem of whether, when and how it can
support ontology repositories and offline editing.  It would be
unfortunate if each tool chose a different incompatible set of
mechanisms.

Received on Wednesday, 14 January 2009 00:43:47 UTC