Re: Implementations of LCS for OWL from Phillip Lord on 2010-05-04 (public-owl-dev@w3.org from April to June 2010)

From: Phillip Lord <phillip.lord@newcastle.ac.uk>
Date: Tue, 04 May 2010 15:23:36 +0100
To: Chris Mungall <cjm@berkeleybop.org>
Cc: Alan Rector <rector@cs.man.ac.uk>, Owl Dev <public-owl-dev@w3.org>, sonic@tcs.inf.tu-dresden.de
Message-ID: <876333pwyf.fsf@newcastle.ac.uk>

Chris Mungall <cjm@berkeleybop.org> writes:
>> It seems reasonable to me to assume that at the time you want to
>> calculate a semantic similarity, then you have all the three terms that
>> you want -- the two that you wish to compare, and the (unknown,
>> explicitly expressed in the ontology) term that is the LCS.
>
> With some knowledge bases that is a reasonable assumption; in other cases
> there may be a limited amount of pre-composition or the pre-composition
> may be fairly ad-hoc, and allowing class expressions in the LCS
> results will give you something more specific and informative.

Yes, but you don't need the results to be a class expression in this
case. You just need the queries to support class expressions, which is a
different kettle of fish. This means that you can avoid the nastiness of
"I want LCS to support class expressions except for the ones that I
don't want like A or B". 

>
>> I can see a very strong use case why you might want to allow the query
>> terms to not pre-exist, but why the LCS? What semantic similarity
>> measures were you thinking of anyway? The information content based
>> ones will, I think, require that the LCS pre-exist anyway.
>
> I don't think that need be the case. Calculating the IC requires finding the
> cardinality of the extent of the LCS, and this can be done trivially using
> any OWL reasoner. Of course, there is a closed world assumption here but this
> is built into any IC calculation (the well known literature bias).

Trivial but slow, as far as I can see. If your corpus is large, then you
have to query against all members of the corpus (i.e. the instances). In
this case, it's worse. The "least" in LCS is defined by the corpus. So,
if there are a number of different LCSs which are siblings, then you
have to test the information content of them all.
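
To make the cost concrete, here is a minimal Python sketch of the
calculation being discussed. It is an illustration under assumptions, not
anyone's actual implementation: the corpus is a plain dict mapping each
instance to the set of classes it is already known to belong to (in
practice an OWL reasoner would answer that), IC is taken as
-log2(|extent| / |corpus|), and the names extent_size, information_content
and pick_lcs are hypothetical.

```python
import math


def extent_size(cls, corpus):
    """Count the instances in the corpus that belong to cls.

    Assumption: corpus maps instance -> set of (inferred) class names.
    This is one full pass over the corpus per class queried.
    """
    return sum(1 for types in corpus.values() if cls in types)


def information_content(cls, corpus):
    """IC(cls) = -log2(|extent(cls)| / |corpus|), closed world over the corpus."""
    n = extent_size(cls, corpus)
    if n == 0:
        raise ValueError(f"{cls!r} has no instances in the corpus")
    return -math.log2(n / len(corpus))


def pick_lcs(candidates, corpus):
    """Among sibling candidate subsumers, keep the most informative one.

    Every candidate needs its own IC, so the work grows with
    len(candidates) * len(corpus).
    """
    return max(candidates, key=lambda c: information_content(c, corpus))


# Toy corpus: 8 instances, 4 of them in A, 2 of them in B.
corpus = {
    "i1": {"A", "B"}, "i2": {"A", "B"}, "i3": {"A"}, "i4": {"A"},
    "i5": set(), "i6": set(), "i7": set(), "i8": set(),
}
best = pick_lcs(["A", "B"], corpus)
```

With this toy corpus, B has the smaller extent and so the higher IC, which
is why it would be chosen over its sibling A. The point of the sketch is
the shape of the loop: each candidate LCS costs a scan of the instances.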

Phil

Received on Tuesday, 4 May 2010 14:24:20 UTC