- From: Chris Mungall <cjm@berkeleybop.org>
- Date: Tue, 4 May 2010 09:06:37 -0700
- To: Phillip Lord <phillip.lord@newcastle.ac.uk>
- Cc: Alan Rector <rector@cs.man.ac.uk>, Owl Dev <public-owl-dev@w3.org>, sonic@tcs.inf.tu-dresden.de
On May 4, 2010, at 7:23 AM, Phillip Lord wrote:

> Chris Mungall <cjm@berkeleybop.org> writes:
>>> It seems reasonable to me to assume that at the time you want to
>>> calculate a semantic similarity, then you have all the three terms
>>> that you want -- the two that you wish to compare, and the (unknown,
>>> explicitly expressed in the ontology) term that is the LCS.
>>
>> With some knowledge bases that is a reasonable assumption; in other
>> cases there may be a limited amount of pre-composition, or the
>> pre-composition may be fairly ad hoc, and allowing class expressions
>> in the LCS results will give you something more specific and
>> informative.
>
> Yes, but you don't need the results to be a class expression in this
> case. You just need the queries to support class expressions, which is
> a different kettle of fish. This means that you can avoid the nastiness
> of "I want LCS to support class expressions except for the ones that I
> don't want, like A or B".

In some cases you do want CEs in the LCS for additional precision. Consider:

  car = vehicle and hasPart exactly 4 wheel and hasPart some motor
  motorbike = vehicle and hasPart exactly 2 wheel and hasPart some motor
  bicycle = vehicle and hasPart exactly 2 wheel

  named_lcs(car, motorbike) = vehicle
  lcs(car, motorbike) = vehicle and hasPart some wheel and hasPart some motor

  named_lcs(bicycle, motorbike) = vehicle
  lcs(bicycle, motorbike) = vehicle and hasPart exactly 2 wheel

  named_lcs(bicycle, car) = vehicle
  lcs(bicycle, car) = vehicle and hasPart some wheel

Of course, if intermediate classes such as "2-wheeled vehicle", "wheeled vehicle", "motorized vehicle" etc. are declared in advance, then returning named classes gives the equivalent level of precision.

One strategy would be to enumerate all combinations of CEs, feed these to the reasoner, and then use standard LCS techniques. But (a) this is only feasible for certain subsets of OWL-DL, and (b) it is presumably less efficient than techniques that integrate the LCS calculation with the reasoning.

>>> I can see a very strong use case why you might want to allow the
>>> query terms to not pre-exist, but why the LCS? What semantic
>>> similarity measures were you thinking of anyway? The information
>>> content based ones will, I think, require that the LCS pre-exist
>>> anyway.
>>
>> I don't think that need be the case. Calculating the IC requires
>> finding the cardinality of the extent of the LCS, and this can be done
>> trivially using any OWL reasoner. Of course, there is a closed world
>> assumption here, but this is built into any IC calculation (the
>> well-known literature bias).
>
> Trivial but slow, as far as I can see. If your corpus is large, then
> you have to query against all members of the corpus (i.e. the
> instances). In this case, it's worse. The "least" in LCS is defined by
> the corpus. So, if there are a number of different LCSs which are sibs,
> then you have to test the information content of them all.

If the lcs function can return a CE that uses intersection, then there is at most one LCS: if the LCS consisted of n > 1 sibling CEs e1, ..., en, you could just form the CE <e1 and e2 and ... and en>, which is a common subsumer at least as specific as any of them. A proof is given in:

  Cohen, W.W., Borgida, A., and Hirsh, H. (1992). Computing least common
  subsumers in description logics. In Proceedings of the National
  Conference on Artificial Intelligence, p. 754.
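To make the collapsing step concrete, here is a rough OWL API sketch (the class and method names are my own invention, and I'm assuming an already-loaded ontology and some OWLReasoner implementation such as HermiT or Pellet; treat it as an illustration rather than a worked-out implementation):

  import org.semanticweb.owlapi.model.OWLClassExpression;
  import org.semanticweb.owlapi.model.OWLDataFactory;
  import org.semanticweb.owlapi.reasoner.OWLReasoner;

  // If e1 and e2 are both common subsumers of a and b, then their
  // intersection is also a common subsumer, and it is at least as
  // specific as either candidate -- so sibling LCS candidates can
  // always be collapsed into a single CE.
  public class CollapseSiblingLcs {
      public static OWLClassExpression collapse(OWLDataFactory df,
              OWLReasoner reasoner,
              OWLClassExpression a, OWLClassExpression b,
              OWLClassExpression e1, OWLClassExpression e2) {
          OWLClassExpression merged = df.getOWLObjectIntersectionOf(e1, e2);
          // Sanity checks: a and b are still subsumed by the intersection...
          assert reasoner.isEntailed(df.getOWLSubClassOfAxiom(a, merged));
          assert reasoner.isEntailed(df.getOWLSubClassOfAxiom(b, merged));
          // ...and the intersection is subsumed by each original candidate.
          assert reasoner.isEntailed(df.getOWLSubClassOfAxiom(merged, e1));
          assert reasoner.isEntailed(df.getOWLSubClassOfAxiom(merged, e2));
          return merged;
      }
  }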
But I think you're right that in practical terms this could be quite slow, because current reasoner APIs don't, AFAIK, support queries of the form "how many instances satisfy the CE <X>"; instead it's necessary to query for the CE and then count the size of the resulting set of objects, which could be slow for large KBs.
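For what it's worth, the counting step would look roughly like this with the OWL API (the helper is hypothetical, the reasoner/ontology setup is omitted, and I'm using natural-log IC purely for illustration). Since there is no direct "count the instances of <X>" call, you have to materialize the instance set and take its size, which is exactly the potentially slow step:

  import org.semanticweb.owlapi.model.OWLClassExpression;
  import org.semanticweb.owlapi.reasoner.OWLReasoner;

  public class CorpusIc {
      // Hypothetical helper: closed-world information content of a CE,
      // i.e. -log of the fraction of the corpus falling under it.
      public static double informationContent(OWLReasoner reasoner,
              OWLClassExpression ce, int corpusSize) {
          // No "count" query in the reasoner API: retrieve the (possibly
          // large) instance set and count it.
          int n = reasoner.getInstances(ce, false).getFlattened().size();
          if (n == 0) {
              return Double.POSITIVE_INFINITY; // CE has an empty extent
          }
          return -Math.log((double) n / corpusSize);
      }
  }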
Received on Tuesday, 4 May 2010 16:07:12 UTC