Re: How to find a proper ontology/onto repositories from Michael F Uschold on 2010-11-01 (semantic-web@w3.org from November 2010)

From: Michael F Uschold <uschold@gmail.com>
Date: Mon, 1 Nov 2010 15:42:25 -0700
To: Alexander Garcia Castro <alexgarciac@gmail.com>
Cc: Lee Feigenbaum <lee@thefigtrees.net>, Enrico Motta <e.motta@open.ac.uk>, Juriy Katkov <katkov.juriy@gmail.com>, Semantic Web <semantic-web@w3.org>, Simon Robe <simonr@semanticarts.com>, "Dave McComb, Semantic Universe" <info@semanticuniverse.com>
Message-ID: <AANLkTimfBC_6=SHWgTBq5m1YHYRp=PrDOyU28hgM+1nN@mail.gmail.com>
Reusing an individual property, such as hasOwner, is not as simple as it
sounds.
Let's say we live in an ideal world and there appears to be a hasOwner in an
ontology sitting in an [as yet non-existent] ontology repository that is
filled only with robust, well-tested and experience-hardened ontologies
(analogous to code libraries).

Even then it is not so simple.  Options include:

   1. Merely refer to the URI in the ontology repository.
   2. Import the whole ontology.

1. means you don't benefit much from all the hard work put in by the
ontology developers, because none of the axioms characterizing the meaning
are available for use.

2. is fine if the ontology (including the whole chain of imports) is small.
More typically you may be importing any number of ontologies, large and
small, upper, middle and possibly lower.  Now you get a lot of junk that you
have no use or interest in, clogging up the inference engine.
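
To make the difference between the two options concrete, here is a minimal
sketch in plain Python. Everything in it is an assumption made for
illustration: the repo: and ex: names are hypothetical, triples are just
tuples, and the "reasoner" applies only the RDFS domain and range rules once.
It is not how any particular tool works.

```python
# Triples are plain (subject, predicate, object) tuples; all names are hypothetical.
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"
RDF_TYPE = "rdf:type"

# Axioms a repository ontology might publish for its hasOwner property (assumed).
REPO_AXIOMS = {
    ("repo:hasOwner", RDFS_DOMAIN, "repo:OwnedThing"),
    ("repo:hasOwner", RDFS_RANGE, "repo:Agent"),
}

# My own data, which merely uses the repository URI as a predicate.
MY_DATA = {("ex:rex", "repo:hasOwner", "ex:alice")}


def infer_types(triples):
    """Apply the RDFS domain and range rules once; return the new rdf:type triples."""
    domains = {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}
    ranges = {(s, o) for s, p, o in triples if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
        for prop, cls in ranges:
            if p == prop:
                inferred.add((o, RDF_TYPE, cls))
    return inferred


# Option 1: refer to the URI only; no axioms are loaded, so nothing is inferred.
option_1 = infer_types(MY_DATA)

# Option 2: import the ontology; its axioms now take part in inference.
option_2 = infer_types(MY_DATA | REPO_AXIOMS)
```

With only the URI, option_1 is empty. With the import, option_2 types ex:rex
and ex:alice, but in a real ontology every other imported axiom would come
along as well.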

What you really want is to extract just those axioms that bear on the
one or more terms you want. You want to do so in a way that guarantees that
all and only the appropriate inferences are drawn.  Ontology engineers
building robust ontologies will add more and more axioms to rule out
nonsense models. If you import only a subset of the axioms, you will allow
some of these nonsense models. OWL's open world semantics makes it very
challenging to close out unwanted models.

This has been recognized by some researchers, and there has been some
progress.  Work done a few years ago at Manchester enabled the computation
of subsets of ontologies that are guaranteed to produce the same inferences.
This can be used to increase inference efficiency. It could also be used to
extract partial ontologies from an ontology in a repository.  I am not aware
of any robust approaches, nor do I know whether any have been added as
plug-ins to ontology editors.
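
As a rough illustration of what "extracting a partial ontology" means, here
is a deliberately naive sketch. It is NOT the Manchester technique: it only
follows shared vocabulary between axioms and carries no guarantee that the
extracted subset preserves any inferences. The axiom labels, term names and
the extract_axioms function are all hypothetical.

```python
# Each axiom is a label mapped to the set of terms it mentions (hypothetical data).
ONTOLOGY = {
    "hasOwner domain OwnedThing": {"hasOwner", "OwnedThing"},
    "hasOwner range Agent": {"hasOwner", "Agent"},
    "Person subClassOf Agent": {"Person", "Agent"},
    "Galaxy subClassOf CelestialBody": {"Galaxy", "CelestialBody"},
}


def extract_axioms(ontology, seed_terms):
    """Naive syntactic extraction: keep every axiom reachable from the seed terms.

    An axiom is selected if it mentions any term already in the signature; its
    terms are then added to the signature, and the loop repeats until nothing
    changes. This is plain reachability, not a semantics-preserving module.
    """
    signature = set(seed_terms)
    selected = set()
    changed = True
    while changed:
        changed = False
        for label, terms in ontology.items():
            if label not in selected and terms & signature:
                selected.add(label)
                signature |= terms
                changed = True
    return selected, signature


selected, signature = extract_axioms(ONTOLOGY, {"hasOwner"})
```

Here the Galaxy axiom is left out because it shares no vocabulary with
hasOwner. In an ontology where most terms connect through common upper-level
classes, the same loop would tend to pull in nearly everything, which is one
reason a principled approach is needed.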

I'm afraid that for the foreseeable future, people building ontologies for
business purposes to deliver software solutions will not find it cost
effective to spend a lot of effort trying to reuse. The tools and technology
are not there yet. There is no clear payoff to justify the expense. PLEASE
CORRECT ME IF I AM WRONG!

Students and academics building new and better ontologies should be more
motivated to reuse ontologies. Their goals and success metrics are
different.

An exception would be if the goal were to build a middle-level ontology that
would be used like an upper ontology relative to many more specific
ontology applications.  In this case, it makes a lot more sense to take the
trouble to try and reuse as much as possible. Even then, the effort will be
great, and the benefits may be hard to realize and measure.

Michael



> On Mon, Nov 1, 2010 at 8:05 AM, Lee Feigenbaum <lee@thefigtrees.net> wrote:
>
>> On 10/31/2010 7:19 PM, Enrico Motta wrote:
>>
>>> At 23:22 -0400 30/10/10, Lee Feigenbaum wrote:
>>>
>>>> On 10/30/2010 10:40 AM, Juriy Katkov wrote:
>>>>
>>>>> Hello everyone!
>>>>> I have 2 questions about rdf data.
>>>>>
>>>>> 1. Suppose I started describing something in triples and I want to use
>>>>> a property 'hasOwner'. I understand that it's much better to use this
>>>>> property from one of the existing ontologies rather than use a property
>>>>> from my own namespace.
>>>>> The question is: what is the easiest and best way to search
>>>>> for this property? I know there is Swoogle, and sometimes it helps me
>>>>> with that. I wonder if there is something better than full-text search.
>>>>>
>>>>
>>>> There've been some great suggestions on this thread, but allow me to
>>>> offer the viewpoint that in many cases trying to find a predicate to
>>>> reuse is not worth the effort.
>>>>
>>>> The main goal of reuse is to allow your data to be consumed by
>>>> software tools that already know how to interpret an existing
>>>> vocabulary. If that's the case for your domain then great, it makes a
>>>> lot of sense to reuse the predicate. If that's not the case, or if you
>>>> don't know if it's the case and you find an arbitrary predicate that
>>>> seems to convey the meaning you're looking for, then I don't think
>>>> there's much point in reusing vocabulary. I'd rather save the time
>>>> searching, mint my own property, and get on with whatever I'm working
>>>> on.
>>>>
>>>> Down the road if I see (or am told of) an application consuming
>>>> similar SW data using a different predicate, I can always update my
>>>> data then and still reap the benefits of reuse. Updating my data could
>>>> be as simple as adding rdfs:subPropertyOf or owl:equivalentProperty
>>>> relations, or--if in a reasonerless world--using a straightforward
>>>> SPARQL Update statement to augment your data.
>>>>
>>>> Reuse is great but, like code optimizations, it's often not necessary
>>>> upfront. It can be added later on once the real value of the reuse is
>>>> understood. And if you never see the value of reuse, then your data
>>>> and/or applications can flourish with the predicate that you minted
>>>> for yourself, and you saved yourself the time otherwise spent
>>>> searching in the first place.
>>>>
>>>
>>>
>>> Uhm...this is certainly true for the simple scenarios, but certainly not
>>> in general. Of course, if I just have data about people and dogs and I
>>> simply want to link them with a property 'hasOwner', it is unlikely I am
>>> going to lose much by defining my own property and then worrying later
>>> about interoperability with other repositories. But if your model is a
>>> bit more complex and you have to handle any of the hundreds of modelling
>>> issues which people have been researching for the past 30 years (e.g.,
>>> agency, roles, meta-properties, time, space, part-of, etc. etc..), then
>>> it may be a good idea to dig out existing modelling solutions rather
>>> than trying to come up with your own solution, which will take far more
>>> time and will likely be sub-optimal.
>>>
>>
>> Sure - but these are two qualitatively different questions, and it seemed
>> to me that the OP was asking about simple vocabulary reuse.
>>
>> Just as it's rarely a good idea to write software libraries from scratch
>> rather than reuse existing, tried and true code libraries, complex models
>> should also be sought out and reused. I don't think the ontology search
>> engines are a great way to go about that though. I'd almost never recommend
>> that someone perform a "foobar filetype:java" google search to find a Java
>> library to reuse dealing with foobar, as I'm likely to have a great deal of
>> difficulty telling the wheat from the chaff. If I'm looking for a solution
>> for a complex modeling challenge, using an arbitrary ontology that matches a
>> search term like "role" or "event" or "units" or what-not seems as likely to
>> be a bad idea as rolling my own. (Perhaps even a worse idea, because my
>> sub-par home-rolled solution is more likely to at least address my immediate
>> use cases in a reasonable fashion.)
>>
>> That said, sites like http://ontologydesignpatterns.org/ are a tremendous
>> value for this sort of search, and it'd be great if we had more actively
>> evolving resources of this sort.
>>
>>
>>> You can do this by browsing repositories such as
>>> http://ontologydesignpatterns.org/ or, as folks have already pointed
>>> out, by using any of the various ontology search engines, such as
>>> swoogle, falcon, sindice, watson, etc.. And because at least some of
>>> these are integrated with ontology editors (e.g., there is a
>>> watson-based plugin for the neon toolkit - see
>>> http://neon-toolkit.org/wiki/Watson_for_Knowledge_Reuse), you can very
>>> quickly search for relevant properties (or classes or individuals) and
>>> then quickly add any useful results from your search to the ontology you
>>> are developing.
>>>
>>
>> Right, but there's an awful lot of subtlety and effort hidden in that word
>> "useful" in that last sentence -- it's often near impossible to tell which
>> results are useful and which are not!
>>
>> Lee
>>
>>
>>> Enrico
>>>
>>>
>>>
>>>> Lee
>>>>
>>>>
>>>>> 2. Suppose I face a dataset I have never used before. What do you usually
>>>>> do first to get a first impression of the dataset? At the moment I first
>>>>> run some SPARQL queries against the dataset, such as:
>>>>> SELECT (COUNT(?x) AS ?count) WHERE
>>>>> {
>>>>> ?x a ?z .
>>>>> }
>>>>>
>>>>> Then I use Marbles or Sig.ma to surf randomly over the data, and
>>>>> finally I come to an opinion on whether I need data from the dataset or
>>>>> not. Again, what do you usually do? Are there tools or useful queries that
>>>>> can help a Semantic Web user browse data and get useful info
>>>>> about datasets?
>>>>>
>>>>> Thank you in advance!
>>>>>
>>>>> Yury Katkov
>>>>>
>>>>
>>>>
>>>> --
>>>> The Open University is incorporated by Royal Charter (RC 000391), an
>>>> exempt charity in England & Wales and a charity registered in Scotland
>>>> (SC 038302).
>>>>
>>>
>>>
>>>
>>
>
>
> --
> Alexander Garcia
> http://www.alexandergarcia.name/
> http://www.usefilm.com/photographer/75943.html
> http://www.linkedin.com/in/alexgarciac
> Postal address:
> Alexander Garcia, Tel.: +49 421 218 64211
> Universität Bremen
> Enrique-Schmidt-Str. 5
> D-28359 Bremen
>



-- 
Michael Uschold, PhD
   LinkedIn: http://tr.im/limfu
   Skype: UscholdM
Received on Monday, 1 November 2010 22:42:59 UTC