- From: Timothy Redmond <tredmond@stanford.edu>
- Date: Tue, 13 Jan 2009 16:43:06 -0800
- To: public-owl-comments@w3.org
I know that we are very late in the OWL 2.0 specification process and that this issue has been heavily debated. But I have been troubled by the current specification of the behavior of imports and decided I would post a message regardless of its lateness.

Thanks to the OWL working group for its amazing accomplishments!

-Timothy

------------------------------------------------------------------------------------------------

I have been struggling for some time now trying to understand the consequences of the import by location scheme that is being suggested for OWL 2.0. I will first describe what I see as the problem and then suggest a possible solution.

========= The Problem with Import by Location =========

It seems like the suggested scheme will have a negative impact on users who want to share ontologies. I foresee situations arising where the user has all the ontologies from the import closure but cannot determine which ontology imports which. This can occur when a user receives some ontologies but does not have access to the IO scheme indicated by the import statements. Some examples of this problem include:

* the user receiving an ontology is on the other side of a firewall from the user creating the ontology.
* the user creating the ontology accesses his ontologies through a web container (tomcat) or through some agent based society. We often see ontology names and imports of the form "http://localhost:8080/...".
* the user creating the ontology does not immediately address the issue of publishing his ontology and therefore uses import statements like import file://C:/dev/ontologies/foo.owl.
* the user receiving the ontology is offline. He knows that he has all the ontologies for the import closure but the IRI used in the imports statement has no resemblance to any of the names or contents of the imported ontologies.
  Sometimes this is compounded when there are different import declarations used in different parts of the imports graph for the same ontology. (I have only seen this last case once but the ontology was supplied by a well-known ontology expert.)

I see many ontologies (perhaps more of the troublesome ones) and all of these cases crop up quite frequently. Some of these come from well-known experts in the field.

The above problems are compounded by the fact that - unless we make some recommendation - different tools will create incompatible mechanisms for redirecting import declarations. In particular, if we are downloading an imports closure off the web, it would be very easy during the download process to record which IO addresses represented in the import statements map to which files that are found on disk. This can then be used by an ontology tool to redirect the import directives when the user goes offline. However the format of the file that specifies the redirections will be different for the OWL API (Protege 4) than it will be for Jena (TopBraid).

For many ontologies perhaps, things should work fine because the ontologies will essentially be imported by name. It is recommended in the OWL 2.0 specifications that an ontology can be accessed by its name

  "If O contains an ontology IRI OI but no version IRI, then the ontology document of O should be accessible from the IRI OI."

and

  "If D contains an ontology IRI OI and a version IRI VI, then the ontology document of O should be accessible from the IRI VI; furthermore, if O is the current version of the ontology series with the IRI OI, then the ontology document of O should also be accessible from the IRI OI."

So I would expect that it will remain pretty common for the name used in an import directive to match the name of an ontology or the ontology version.
This can - in theory - allow tools to determine which ontology imports which (modulo some versioning issues) even when the IO operation implied by the import statement is unavailable. However the OWL 2.0 specifications make no recommendations for import by name. In fact, the import by name scheme is incompatible with the OWL 2.0 specifications in the not uncommon case where the name and version of the ontology have nothing to do with the ontology location.

How should users share ontologies in the full generality of the import by location scheme? If no change is made to the OWL 2.0 specifications then it would seem like tool builders would need some type of advice on how to handle this issue and what to say to users who are having trouble with imports. There are many possible approaches:

* tool specific repository mechanisms,
* automatically rewriting import declarations when ontologies are moved,
* suggesting that the import scheme conform to import by name, or
* not supporting full offline mode - always going to the web to determine the ontology name and version.

All of these approaches have serious problems and it is not clear what would be a recommended approach. Perhaps they are all options with different domains of applicability.

========= A possible solution? =========

I am wondering if it would make sense to introduce an analogue of the xml-base for OWL ontologies. In the RDF and XML renderings this could correspond to the xml base in the rendered version of the ontology. We could then add a "should" requirement saying that the imported ontology should have an owl base that is equal to the name in the import declaration. It could be explained here that the purpose of this requirement is to support sharing of ontologies and offline editing of ontologies.

One problem with this suggestion is that when the ontology has a version and it is the latest version, it might very well have two alternative schemes for being imported.
Perhaps this could be solved by having more than one owl base for an OWL ontology document. This would break the mapping to the XML base in the RDF and XML renderings but perhaps this is acceptable.

One advantage of this scheme is that it actually corresponds with what at least two tools (Protege 3 and Protege 4) are doing right now. The problem with looking up the ontology names of ontologies in an ontology repository is that it is inefficient. If an ontology rendered in RDF/XML does not have an ontology name then the tool has to parse the entire ontology in order to determine this fact. In addition, there are several OWL ontologies that contain more than one ontology declaration. So it can be ambiguous which one corresponds to the name of the ontology. For this reason both Protege 3 and Protege 4 look up the xml base and match that with the imports declaration. This is not really in line with the OWL 1.0 specification and there has been some comment about this issue in our mailing lists. But it continues to be the most pragmatic solution and it usually works.

========= Conclusion =========

Regardless of what is planned, it seems like we need some sort of discussion of how to deal with unresolved imports. Should we simply always pass the problem back to the user? The OWL 2.0 scheme is very robust when all ontologies are on the web and internet access is both reliable and trusted. It becomes much less robust when ontologies are stored on disk, or provided by web containers or agent societies. The OWL API will probably be one of the first infrastructure APIs that will have to wrestle with the problem of whether, when and how it can support ontology repositories and offline editing. It would be unfortunate if each tool chose a different incompatible set of mechanisms.
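
To make the xml base lookup mentioned above a little more concrete, here is a minimal sketch in Python of how a tool might match import declarations against local RDF/XML documents without parsing each one in full. It is only an illustration under stated assumptions: it is not the Protege or OWL API implementation, the function names (read_xml_base, index_by_base, resolve_import) and the example.org IRIs are hypothetical, and it assumes exact string equality between the import IRI and the xml:base on the root element.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:base attribute under the XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def read_xml_base(source):
    """Return the xml:base of the root element, or None if it has none.

    Stops at the root start tag instead of parsing the whole document.
    `source` may be a file name or a binary file object.
    """
    for _event, element in ET.iterparse(source, events=("start",)):
        return element.get(XML_BASE)  # the first start event is the root
    return None


def index_by_base(documents):
    """Map each declared xml:base to the local name of its document.

    Documents without an xml:base are skipped (an assumption of this sketch).
    """
    index = {}
    for name, source in documents.items():
        base = read_xml_base(source)
        if base is not None:
            index[base] = name
    return index


def resolve_import(import_iri, index):
    """Return the local document for an import IRI, or None if unresolved."""
    return index.get(import_iri)


# Hypothetical example: one local document that declares an xml:base.
example = io.BytesIO(
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    b' xml:base="http://example.org/ontologies/foo"/>'
)
local_index = index_by_base({"foo.owl": example})
print(resolve_import("http://example.org/ontologies/foo", local_index))
```

An import whose IRI matches no xml:base comes back as None, which is exactly the unresolved import case that a tool would still have to hand back to the user or to some other redirection mechanism.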
Received on Wednesday, 14 January 2009 00:43:47 UTC