Re: [Dbpedia-discussion] Dbpedia-Freebase raw dump of conditional probabilities from Ryan Shaw on 2009-08-20 (public-lod@w3.org from August 2009)

From: Ryan Shaw <ryanshaw@ischool.berkeley.edu>
Date: Thu, 20 Aug 2009 16:35:39 -0700
To: Mike Bergman <mike@mkbergman.com>
Cc: "public-lod@w3.org community" <public-lod@w3.org>
Message-ID: <af556a820908201635i3600236cv8742bd9ff1b13d44@mail.gmail.com>

>> It strikes me that this is the kind of thing it would be useful to
>> publish as Linked Data. In other words, rather than analyzing
>> instances, calculating a bunch of conditional probabilities, and then
>> publishing a bunch of [ sameAs | equivalentClass | seeAlso | whatever
>> ] assertions, one could publish a bunch of conditional probabilities
>> or other similarity values, with some indication of the type of
>> similarity measure used and links to the specific instance sets used
>> to calculate the values. Others could then use these measures as they
>> wished, setting their own thresholds for when to consider something an
>> equivalence relation or not.
>>
>> Are there any vocabularies that might be used to publish such a data
>> set as Linked Data?
>
> UMBEL has a specific vocabulary and set of properties for this. See the
> umbel:withAlignment and umbel:withLikelihood properties:
>
> http://www.umbel.org/technical_documentation.html#vocabulary

If I understand correctly, umbel:withAlignment is for class alignment
and umbel:withLikelihood is for instance alignment?

These seem like a good start, but I see a couple of drawbacks:

1. They rely on reification, which some in the LOD community seem to dislike [1].
2. There is no way to distinguish different similarity measures, e.g.
Jaccard coefficient vs. mutual information vs. log-likelihood.
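
To make the two points concrete, here is a rough sketch of what I have in
mind. It computes two different measures over the same pair of instance
sets and writes each value out as its own node that names the measure
used, so no rdf:Statement reification is involved. Everything under the
ex: prefix (ex:Similarity, ex:source, ex:target, ex:measure, ex:value) is
a made-up vocabulary for illustration only, not part of UMBEL or any
published schema, and the example.org URIs and instance ids are
placeholders. Prefix declarations are left out of the Turtle output.

```python
# Hypothetical sketch: the ex: vocabulary, the example.org URIs, and the
# instance ids below are illustrative assumptions, not an existing schema.

def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: |A & B| / |A | B|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def conditional_probability(a: set, b: set) -> float:
    """Estimate P(A | B) as |A & B| / |B|; 0.0 if B is empty."""
    return len(a & b) / len(b) if b else 0.0


def similarity_record(source: str, target: str, measure: str, value: float) -> str:
    """Render one similarity value as a Turtle blank node.

    The value is a resource in its own right (an n-ary relation), so the
    kind of measure can be attached without reifying a statement.
    """
    return (
        "[] a ex:Similarity ;\n"
        f"   ex:source <{source}> ;\n"
        f"   ex:target <{target}> ;\n"
        f"   ex:measure ex:{measure} ;\n"
        f'   ex:value "{value:.4f}"^^xsd:double .'
    )


if __name__ == "__main__":
    # Placeholder instance sets for two classes from different data sets.
    class_a_instances = {"i1", "i2", "i3"}
    class_b_instances = {"i2", "i3", "i4"}

    a_uri = "http://example.org/dataset-a/ClassA"
    b_uri = "http://example.org/dataset-b/ClassB"

    print(similarity_record(
        a_uri, b_uri, "JaccardCoefficient",
        jaccard(class_a_instances, class_b_instances)))
    print(similarity_record(
        a_uri, b_uri, "ConditionalProbability",
        conditional_probability(class_a_instances, class_b_instances)))
```

A consumer could then filter on ex:measure and apply whatever threshold
they like before treating a pair as equivalent.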

[1] http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/#avoidinLinkedDataContext

Received on Thursday, 20 August 2009 23:36:21 UTC