
Re: DBpedia-based entity recognition service / tool?

From: Nathan <nathan@webr3.org>
Date: Thu, 04 Feb 2010 16:35:34 +0000
Message-ID: <4B6AF756.8070909@webr3.org>
To: Tom Morris <tfmorris@gmail.com>
CC: Ivan Herman <ivan@w3.org>, Matthias Samwald <samwald@gmx.at>, public-lod@w3.org
Tom Morris wrote:
> On Tue, Feb 2, 2010 at 10:21 AM, Nathan <nathan@webr3.org> wrote:
>> I should probably be replying here, as I've been doing this and working
>> on it for the past few months.
>> I've found from experience that the only viable way to address this need
>> is to do as follows:
>> 1: Pass content through to both OpenCalais and Zemanta
>> 2: Combine the results to provide a list of "string" terms to be
>> associated with DBpedia resources (where Zemanta hasn't already done it)
>> 3: Look up each string term and try to associate it with a DBpedia resource
>> 4: Return all matches with results to the end user in order for them to
>> manually confirm the results.
>> Steps 3 and 4 are the killers here, because no matter how good the
>> service, you can't always match to exact URIs (sometimes you can only
>> determine that a term may refer to one of X ambiguous URIs); and ...
> I don't understand the roundabout approach, since both of these
> services output Freebase identifiers, and those are all mapped
> explicitly to DBpedia via owl:sameAs and to Wikipedia via a normal URL.
> Why not just follow the links directly?  The only time this won't work
> is where the concept was sourced from someplace other than Wikipedia
> or Wikipedia article(s) were split/merged so there isn't a 1:1
> correspondence.
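
The four-step procedure quoted above can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not the actual implementation: step 1 (sending content to OpenCalais and Zemanta) is assumed to have already happened, with each service's output reduced to hypothetical (term, URI-or-None) pairs, and `LABEL_INDEX` is a hypothetical in-memory stand-in for a real DBpedia lookup service. No real service API is called.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a DBpedia label lookup service:
# lower-cased surface string -> candidate DBpedia resource URIs.
LABEL_INDEX: Dict[str, List[str]] = {
    "paris": [
        "http://dbpedia.org/resource/Paris",
        "http://dbpedia.org/resource/Paris,_Texas",
    ],
    "tim berners-lee": ["http://dbpedia.org/resource/Tim_Berners-Lee"],
}

# (surface string, URI if the extraction service already supplied one)
Term = Tuple[str, Optional[str]]


def combine(calais: List[Term], zemanta: List[Term]) -> Dict[str, Optional[str]]:
    """Step 2: merge both term lists, keeping any URI a service already supplied."""
    merged: Dict[str, Optional[str]] = {}
    for term, uri in calais + zemanta:
        key = term.lower()
        # Only overwrite an entry that has no URI yet.
        if merged.get(key) is None:
            merged[key] = uri
    return merged


def lookup(term: str) -> List[str]:
    """Step 3: find candidate URIs for a bare string (hypothetical index)."""
    return LABEL_INDEX.get(term, [])


def annotate(calais: List[Term], zemanta: List[Term]) -> List[dict]:
    """Steps 2-4: every term with its candidates, returned for manual confirmation."""
    results = []
    for term, uri in combine(calais, zemanta).items():
        candidates = [uri] if uri else lookup(term)
        results.append({
            "term": term,
            "candidates": candidates,
            # Illustrative flags so the user can focus on the hard cases.
            "ambiguous": len(candidates) > 1,
            "unmatched": not candidates,
        })
    return results


if __name__ == "__main__":
    calais = [("Paris", None), ("Tim Berners-Lee", None)]
    zemanta = [("Tim Berners-Lee", "http://dbpedia.org/resource/Tim_Berners-Lee")]
    for row in annotate(calais, zemanta):
        print(row)
```

Every row is returned, not only the ambiguous ones, in keeping with step 4's manual confirmation; the `ambiguous` and `unmatched` flags are an illustrative addition, not part of the described process.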

Where they are available, I do; but you still get a number of terms
which are not mapped and can be mapped by doing lookups. And where you
are unsure, prompting the user to do the disambiguation provides a fuller
result :)
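
The combined approach described here (follow explicit links where the services provide them, fall back to lookups for unmapped terms, and ask the user when unsure) can be sketched as a single resolution function. Assumptions: the owl:sameAs links are given as plain (subject, predicate, object) tuples, the identifier `fb:example_tbl` is a hypothetical placeholder rather than a real Freebase id, and `lookup` and `ask_user` are hypothetical callables supplied by the caller. Only the owl:sameAs predicate URI itself is real.

```python
from typing import Callable, List, Optional, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

Triple = Tuple[str, str, str]

# Hypothetical link data; the subject is a placeholder, not a real Freebase id.
TRIPLES: List[Triple] = [
    ("fb:example_tbl", OWL_SAME_AS, DBPEDIA_PREFIX + "Tim_Berners-Lee"),
]


def linked_dbpedia(freebase_id: str, triples: List[Triple]) -> List[str]:
    """Follow owl:sameAs links from an identifier to DBpedia resources."""
    return [o for s, p, o in triples
            if s == freebase_id and p == OWL_SAME_AS and o.startswith(DBPEDIA_PREFIX)]


def resolve(term: str,
            freebase_id: Optional[str],
            triples: List[Triple],
            lookup: Callable[[str], List[str]],
            ask_user: Callable[[str, List[str]], Optional[str]]) -> Optional[str]:
    """Resolve a term to one DBpedia URI, or None if nothing matches."""
    # 1. Prefer explicit links when the service returned a mapped identifier.
    candidates = linked_dbpedia(freebase_id, triples) if freebase_id else []
    # 2. Fall back to a label lookup for terms that are not mapped.
    if not candidates:
        candidates = lookup(term)
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # 3. Unsure (several candidates, e.g. split or merged articles): ask the user.
    return ask_user(term, candidates)
```

In a real tool `ask_user` would present the candidates in a UI; passing it in as a callable keeps the resolution logic testable without one.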

Also, obviously, as the services improve, the need for lookups drops; and
finally, this approach allows for domain-specific document/thing relations.

Received on Thursday, 4 February 2010 16:36:15 UTC
