Re: Ontology modules and namespaces

On Nov 8, 2009, at 7:03 PM, Alan Ruttenberg wrote:

> On 11/4/09, Holger Knublauch <holger@knublauch.com> wrote:
>> Since TopBraid Composer [1] was criticized here, please allow me to
>> explain that it can very well be used in the scenario below. I will
>> let the people on this list decide whether it behaves well or not.
>> The mechanism it uses has been stable for the last three years, and I
>> think it has worked quite well so far.
>
> It does not.

Thanks for sharing *your opinion*. The original question was about  
modularizing ontologies so that resources from the same namespace can  
be organized across multiple files. Many users are confused about the  
difference between ontology URIs and namespaces, and I was addressing  
this. You seem to be switching topics now to whether base URIs are a
valid mechanism to identify those (multiple) files or whether only the
URIs of owl:Ontologies should be used.

>
>> If users are editing files from their hard drive, TBC will associate
>> each file with a base URI. This base URI is later used to resolve
>> owl:imports, so that the system can figure out whether it has local
>> copies of web resources without going to the web. The base URI is
>> retrieved from the files by looking into the first few lines - if it's
>> an RDF/XML file then it uses the declared xml:base,
>
> This is simply wrong and causes problems in practice. Thankfully it is
> finally being fixed in Protege.

Comparing TopBraid and Protege is like comparing apples with oranges.  
Protege 4 has been designed as a native OWL 2 tool, and it generally  
cannot correctly handle RDF files. TopBraid has been designed as a  
semantic web technology tool with a focus on RDF-based languages  
including, but not limited to, OWL. Many RDF files do not even declare
owl:Ontologies, which makes your suggested solution unattractive in
general. In practice, however, TopBraid makes an effort to keep the
base URI (written as xml:base in RDF/XML) and the owl:Ontology
synchronized. It will add missing owl:Ontology triples if a file gets
saved from the web, to maximize OWL compatibility. TopBraid also
provides a warning if there is more than one owl:Ontology in a
file, and has a button to fix this scenario. For most back-ends (such
as databases), TopBraid also checks for the owl:Ontology to learn  
about the base URI. So I don't think there are substantial practical  
differences between what you outline and what we have implemented.
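
As a rough illustration of the kind of consistency check described above
(not TopBraid's actual implementation), the following Python sketch reads
the xml:base and the top-level owl:Ontology elements of an RDF/XML
document and reports when they are missing, duplicated, or out of sync.
The function name check_base_and_ontology and the sample document are
made up for this example, and it only looks at owl:Ontology elements
written as direct children of rdf:RDF.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def check_base_and_ontology(rdf_xml):
    """Return (base_uri, ontology_uris, warnings) for an RDF/XML string.

    Hypothetical helper: only owl:Ontology typed nodes that are direct
    children of the root element are considered.
    """
    root = ET.fromstring(rdf_xml)
    base = root.get(f"{{{XML_NS}}}base")

    ontologies = []
    for element in root.findall(f"{{{OWL_NS}}}Ontology"):
        about = element.get(f"{{{RDF_NS}}}about")
        # rdf:about="" is a relative reference that resolves to the base URI.
        ontologies.append(base if about == "" else about)

    warnings = []
    if not ontologies:
        warnings.append("no owl:Ontology declared")
    if len(ontologies) > 1:
        warnings.append("more than one owl:Ontology declared")
    for uri in ontologies:
        if base is not None and uri != base:
            warnings.append(f"owl:Ontology {uri} differs from xml:base {base}")
    return base, ontologies, warnings


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://example.org/test">
  <owl:Ontology rdf:about="http://example.org/test2"/>
</rdf:RDF>"""

if __name__ == "__main__":
    for message in check_base_and_ontology(SAMPLE)[2]:
        print("warning:", message)
```

A real tool would use a proper RDF parser rather than raw XML access,
since the same triples can be serialized in several different ways.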

BTW I just downloaded Protege to see how it handles the case of  
multiple base URIs (owl:Ontology URIs) across multiple files. With  
4.0.1, I did the following:
- Create file test.owl with URI http://example.org/test
- Add a class Person and save
- Create file test2.owl with same base URI as above
- Protege opens the old (!) file test.owl and no file (or warning)
gets created!
- Since Protege does not allow me to create two files with the same
URI, I create a file test2.owl with http://example.org/test2
- Protege opens the file
- Close Protege
- Manually edit test2.owl so that it has the same base URI
- Open test2.owl and add owl:imports to file test.owl
- Imports view now claims to import test.owl, but none of its triples  
show up

All this shows that the issue is not fixed at all in Protege, and the
same kind of base URI/ontology confusion can arise there as in TopBraid.
IMHO it is still better to allow working with multiple files with the
same base URI than to silently ignore them and hope for the best.
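
To make the "several files, one base URI" situation concrete, here is a
minimal Python sketch of a file-to-base-URI mapping that reports
conflicts instead of silently ignoring them, and lets the user pick a
"primary" file. The class name BaseUriRegistry and its methods are
hypothetical and only illustrate the idea; they are not taken from
TopBraid or Protege.

```python
from collections import defaultdict


class BaseUriRegistry:
    """Maps base URIs to local files and surfaces duplicates explicitly."""

    def __init__(self):
        self._files = defaultdict(list)  # base URI -> list of file paths
        self._primary = {}               # base URI -> user-chosen file

    def register(self, base_uri, path):
        if path not in self._files[base_uri]:
            self._files[base_uri].append(path)

    def conflicts(self):
        """Return every base URI that is claimed by more than one file."""
        return {uri: list(paths)
                for uri, paths in self._files.items() if len(paths) > 1}

    def set_primary(self, base_uri, path):
        if path not in self._files.get(base_uri, []):
            raise ValueError(f"{path} is not registered for {base_uri}")
        self._primary[base_uri] = path

    def resolve(self, base_uri):
        """Return the local file to use for an owl:imports of base_uri."""
        paths = self._files.get(base_uri, [])
        if not paths:
            return None  # no local copy; the caller may go to the web
        if len(paths) == 1:
            return paths[0]
        if base_uri in self._primary:
            return self._primary[base_uri]
        raise LookupError(f"{base_uri} is ambiguous: {paths}")
```

With test.owl and test2.owl both registered under
http://example.org/test, resolve() refuses to guess until set_primary()
has been called, which is the opposite of opening the old file without
any warning.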

I also tried to import http://rdfex.org/foaf/Person,firstName which
TopBraid handles without problems, but Protege fails completely  
because there is no owl:Ontology declared. I guess such a strict  
interpretation of the OWL spec is not helpful if the OWL community  
wants to interoperate with RDF-based ontologies. And why should all  
RDF snippets in the world be forced to declare an extra triple only  
because some OWL tools are inflexible? OWL is based on RDF, not the  
other way around.

> The xml:base has no status whatsoever
> in OWL. owl:imports in both OWL 1 and OWL 2 are based on the ontology
> URI. The only way to determine the ontology URI is to fully parse an
> OWL file. In doing so one must recognize that certain
>
> :x rdf:type owl:Ontology
>
> triples are the result of serialization of owl:import statements and
> so their subject is not the name of the ontology. Once these are
> discounted, there should be a single triple of the above form, and
> whatever is in the place of the :x is the name of the ontology.

How is this supposed to work in practice? My humble understanding of  
the owl:imports mechanism is that it is supposed to support importing  
ontologies, in particular from the web. The URL being imported should  
therefore align with the physical location of that file, following  
best linked data practice. If they are different (as in the infamous
case of the SWRL ontology), significant problems arise. What is the use
case for having distinct base URIs (physical location) and
owl:Ontologies? I can certainly see the use case for working with local
files (and we do this all the time), but for web-based ontologies this
looks like very bad practice.
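
For reference, the procedure quoted above (collect the owl:Ontology
triples, discount those whose subject is merely the target of an
owl:imports, and expect exactly one to remain) can be sketched in Python
roughly as follows. Triples are modelled as plain (subject, predicate,
object) string tuples, and the function name ontology_name is an
assumption made for this example; a real implementation would sit on
top of a full RDF parser.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_ONTOLOGY = "http://www.w3.org/2002/07/owl#Ontology"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"


def ontology_name(triples):
    """Return the single ontology URI declared in an already parsed graph."""
    triples = list(triples)  # allow generators; we iterate twice

    # Anything that is the object of owl:imports names an imported
    # ontology, not the ontology contained in this file.
    imported = {o for (_, p, o) in triples if p == OWL_IMPORTS}

    candidates = {
        s for (s, p, o) in triples
        if p == RDF_TYPE and o == OWL_ONTOLOGY and s not in imported
    }
    if len(candidates) != 1:
        raise ValueError(
            f"expected exactly one ontology, found {sorted(candidates)}")
    return next(iter(candidates))


EXAMPLE = [
    ("http://example.org/test2", RDF_TYPE, OWL_ONTOLOGY),
    ("http://example.org/test2", OWL_IMPORTS, "http://example.org/test"),
    # Side effect of serializing the import, to be discounted:
    ("http://example.org/test", RDF_TYPE, OWL_ONTOLOGY),
    ("http://example.org/test#Person", RDF_TYPE,
     "http://www.w3.org/2002/07/owl#Class"),
]

if __name__ == "__main__":
    print(ontology_name(EXAMPLE))
```

Note that a file without any owl:Ontology triple, as is common for plain
RDF snippets, makes this function raise, which is exactly the case under
discussion.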

>
> In order that a user not pay the price of this computation I suggest
> that you cache the ontology name somewhere based on either the file
> date, md5, or some other easily computed value that can indicate that
> a file has changed.

This could be an optimization for a future version. We have decided  
against persisting those mappings and instead compute the mapping at  
start-up because persisting them might introduce yet another thing  
that gets out of date. But I can see that Protege wants to cache those  
values because reloading all files to get their owl:Ontology is  
expensive.
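
A minimal sketch of the caching idea, assuming an in-memory cache keyed
by an md5 digest of the file contents (one of the change indicators
suggested above). The class OntologyNameCache and the extract callback
are hypothetical; the callback stands in for whatever expensive parsing
step determines the ontology name from the raw bytes.

```python
import hashlib
from pathlib import Path


class OntologyNameCache:
    """Remembers the ontology name of a file until its contents change."""

    def __init__(self, extract):
        self._extract = extract  # callable: bytes -> ontology name
        self._entries = {}       # path -> (md5 hex digest, ontology name)

    def name_for(self, path):
        data = Path(path).read_bytes()
        digest = hashlib.md5(data).hexdigest()

        cached = self._entries.get(str(path))
        if cached is not None and cached[0] == digest:
            return cached[1]  # unchanged file: skip the expensive parse

        name = self._extract(data)
        self._entries[str(path)] = (digest, name)
        return name
```

Hashing still reads the whole file, but that is normally much cheaper
than a full parse. Because the digest is recomputed on every lookup, a
stale entry is detected rather than trusted; persisting the cache across
sessions is deliberately left out of this sketch.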

>
>> for N3/Turtle
>> files it uses the URI of the first owl:Ontology, or a base URI comment
>> in the head, etc. In any case, some base URI is needed to make files
>> importable.
>
> Also problematic, see above.

I don't think so, see above.

Regards,
Holger


>
> Regards,
> Alan
>
>> If multiple files have the same base URI then the system
>> allows users to pick a "primary" file to resolve conflicts. But this
>> case is rare and can be easily worked around.
>>
>> It is perfectly valid in TopBraid to split a namespace across multiple
>> files, and thus edit different snippets. As long as all snippets are
>> somehow distinguished with unique base URIs (maybe
>> http://example.org/project/snippet1, snippet2 etc) then it's possible
>> to open them in isolation or have a master file that imports them
>> all. A simple union graph export
>> (possible via SPARQLMotion) can then be used to merge the various
>> smaller files, or, in the other direction, to split an existing large
>> file into multiple snippets. TopBraid makes a clear distinction
>> between the base URI and the unrelated concepts of default namespaces
>> and other namespaces. This means that all smaller files may contain
>> instances from multiple namespaces, or the same namespace. Editing
>> them in TopBraid is no problem, as long as you are aware of how the
>> system maintains its file-to-base URI mapping. I am more than happy to
>> discuss this further, but as this might be off-topic I suggest moving
>> to the TopBraid Composer mailing list [2].
>>
>> By the way, the idea of using different base URIs (owl:imports
>> locations) for serving resources from other namespaces has been
>> implemented in the RDFex service [3], which can be used to import only
>> selected snippets from larger namespaces.
>>
>> Thanks,
>> Holger
>>
>> [1] http://www.topquadrant.com/products/TB_Composer.html
>> [2] http://groups.google.com/group/topbraid-composer-users
>> [3] http://rdfex.org
>>
>>
>> On Nov 2, 2009, at 8:48 PM, Ian Emmons wrote:
>>
>>> Some tools, such as TopBraid Composer, do not behave well when the
>>> namespace-to-file mapping is not 1-to-1.  This fact doesn't say
>>> anything about the right or wrong of your proposal, of course --
>>> only about how easy it will be in practice.
>>>
>>>
>>> On Oct 26, 2009, at 10:25 AM, Simon Reinhardt wrote:
>>>> Hi,
>>>>
>>>> It is becoming somewhat popular for large ontologies to be split
>>>> into a core ontology file and module ontology files (which import
>>>> the core). Normally each module then gets its own namespace for the
>>>> terms defined in it. I was wondering though if that is too
>>>> complicated for users of the ontologies. I have seen confusion of
>>>> "sioc" and "sioct" (the prefixes for the SIOC core and the SIOC
>>>> Types module namespaces) and when such vocabularies get higher
>>>> adoption by people not so well versed with ontologies I can see it
>>>> happen a lot more often.
>>>>
>>>> So as an alternative I want to explore the idea of just using one
>>>> namespace shared between the core and the modules. The advantage
>>>> would be not having to guess which namespace to use. One
>>>> disadvantage for the developer(s) of the ontology is that a "local
>>>> name" can only be used in one of the modules or core, you can't use
>>>> the same "word" under a different namespace with a different
>>>> meaning. Another disadvantage is that if you want the terms to
>>>> dereference to the ontology files they have been defined in then
>>>> you can only do that with a "/" namespace (and you have to set up
>>>> lots of redirects).
>>>>
>>>> My questions: What do you think of that idea? Can you see any other
>>>> advantages or disadvantages? Do you think several namespaces are
>>>> not confusing at all? And what are the main advantages to splitting
>>>> up ontologies into modules other than being easier to organise? Do
>>>> they justify a higher burden on the ontology users?
>>>>
>>>> Thanks,
>>>> Simon
>>
>>
>>

Received on Monday, 9 November 2009 19:15:48 UTC