Re: DBpedia-based entity recognition service / tool?

Nathan wrote:
> Davide Palmisano wrote:
>> On Tue, Feb 2, 2010 at 3:39 PM, Matthias Samwald <samwald@gmx.at> wrote:
>>> Davide wrote:
>>>> BTW: and what about http://www.alchemyapi.com ? have you tried it?
>>> AlchemyAPI does not seem to return DBpedia / Wikipedia identifiers (?)
>> yes, read here http://www.alchemyapi.com/api/entity/textc.html you
>> need to specify a parameter to enable this feature. I'm using this
>> tool with proficiency.
>>
> 
> Whilst I do like alchemy, I've found you can extract much, much more
> information, of a much higher standard by combining OpenCalais and
> Zemanta in the process outlined in a previous mail.
> 
> To illustrate I'll quickly hook in with alchemy again and post a few
> results for comparison shortly.
> 

For a quick comparison I've run two documents through both Alchemy and
the OpenCalais/Zemanta/lookup combination system to see how they
compare. Note that with the Alchemy results I've also included the
non-linked-data terms so you can see why I've not used it in my own system.
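
The combined system reports each DBpedia URI once, listing every surface
form that mapped to it (e.g. "Linked Open Data, LOD" for Linked_Data).
Below is a minimal sketch of that grouping step. It assumes each
extractor's output has already been reduced to (uri, label) pairs.
merge_entities and format_entities are hypothetical names for
illustration, not the actual code behind the results below.

```python
def merge_entities(*extractor_results):
    """Group (uri, label) pairs from several extractors by URI.

    Hypothetical helper: each argument is an iterable of (uri, label)
    tuples from one extractor. Labels are de-duplicated case-insensitively
    and kept in first-seen order (dicts preserve insertion order).
    """
    merged = {}
    for results in extractor_results:
        for uri, label in results:
            labels = merged.setdefault(uri, [])
            # Only record a label if no case-insensitive match exists yet.
            if all(label.lower() != existing.lower() for existing in labels):
                labels.append(label)
    return merged


def format_entities(merged):
    """Render merged entities as 'URI : label, label' lines."""
    return [f"{uri} : {', '.join(labels)}" for uri, labels in merged.items()]
```

For example, merging a pair labelled "Linked Open Data" from one
extractor with a pair labelled "LOD" from another yields the single line
"http://dbpedia.org/resource/Linked_Data : Linked Open Data, LOD".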

=================================================================

TEST 1:
source document: http://webr3.org/__play/optimal/webr3.html

Alchemy Results
=================================================================
Linked Data:
http://dbpedia.org/resource/England : England
http://dbpedia.org/resource/Google : Google

Generic Terms:
FieldTerminology : web 3.0
Company : wikipedia
City : London
FieldTerminology : URIs
Technology : HTML5
StateOrCounty : DC
City : Dublin
FieldTerminology : Web Developers
FieldTerminology : HTML

Notes:
Both "DC" and "Dublin" are incorrect; the document mentions "Dublin Core".


Combined OpenCalais / Zemanta + DBpedia lookup system:
=================================================================
Linked Data:
http://dbpedia.org/resource/Linked_Data : Linked Open Data, LOD
http://dbpedia.org/resource/RDFa : RDFa
http://dbpedia.org/resource/Semantic_Web : Semantic Web
http://dbpedia.org/resource/HTML : HTML4
http://dbpedia.org/resource/Dublin_Core : Dublin Core
http://dbpedia.org/resource/Resource_Description_Framework : RDF
http://dbpedia.org/resource/Web_page : web pages
http://dbpedia.org/resource/HTML5 : HTML5
http://dbpedia.org/resource/London : London
http://dbpedia.org/resource/Web_design : web designer
http://dbpedia.org/resource/United_Kingdom : United Kingdom
http://dbpedia.org/resource/Web_search_engine : search engine
http://dbpedia.org/resource/Web_developer : Web developer
http://dbpedia.org/resource/Joe_Bloggs : Joe Blogs
http://dbpedia.org/resource/XHTML : XHTML
http://dbpedia.org/resource/Web_2.0 : Web 2.0
http://dbpedia.org/resource/Open_Data : Open Data
http://dbpedia.org/resource/Web_standards : Web standards
http://dbpedia.org/resource/FOAF_%28software%29 : FOAF
http://dbpedia.org/resource/Computing : Computing
http://dbpedia.org/resource/World_Wide_Web : World Wide Web

=================================================================


TEST 2:
source document: http://news.bbc.co.uk/1/hi/world/asia-pacific/8492608.stm

Alchemy Results
=================================================================
Linked Data:
http://dbpedia.org/resource/People's_Republic_of_China : China
http://dbpedia.org/resource/United_States : United States
http://dbpedia.org/resource/Republic_of_China : Taiwan
http://dbpedia.org/resource/Beijing : Beijing
http://dbpedia.org/resource/Communist_Party_of_China : Chinese Communist Party
http://dbpedia.org/resource/Washington,_D.C. : Washington DC
http://dbpedia.org/resource/White_House : White House
http://dbpedia.org/resource/Barack_Obama : Barack Obama
http://dbpedia.org/resource/Google : Google
http://dbpedia.org/resource/Ministry_of_Foreign_Affairs_(People's_Republic_of_China) : Chinese Foreign Ministry
http://dbpedia.org/resource/Boeing : Boeing
http://dbpedia.org/resource/Iran : Iran
http://dbpedia.org/resource/Tehran : Tehran

Generic Terms:
Person : Dalai Lama
Person : Mr Zhu
Country : Tibet
City : Washington
Person : Mr Obama
Person : Zhu Weiqun
Technology : aerospace
Person : Obama
Company : BBC
GeographicFeature : Himalayan
Person : Ma Zhaoxu
Person : Paul Reynolds
Person : Kasur Lodi Gyarit


Combined OpenCalais / Zemanta + DBpedia lookup system:
=================================================================
Linked Data:
http://dbpedia.org/resource/Communist_Party_of_China : Chinese Communist Party
http://dbpedia.org/resource/Barack_Obama : Barack Obama
http://dbpedia.org/resource/United_States : United States
http://dbpedia.org/resource/People%27s_Republic_of_China : China
http://dbpedia.org/resource/Washington%2C_D.C. : DC, Washington DC
http://dbpedia.org/resource/China : Sino
http://dbpedia.org/resource/President_of_the_United_States : US President
http://dbpedia.org/resource/Dalai_Lama : Dalai Lama
http://dbpedia.org/resource/Republic_of_China : Taiwan
http://dbpedia.org/resource/Arms_industry : arms sales
http://dbpedia.org/resource/Official : Official
http://dbpedia.org/resource/Ma_Zhaoxu : Ma Zhaoxu
http://dbpedia.org/resource/Paul_Reynolds : Paul Reynolds
http://dbpedia.org/resource/Internet_censorship : Internet censorship
http://dbpedia.org/resource/Tibet : Tibet
http://dbpedia.org/resource/United_Front_Work_Department : United Front Work Department
http://dbpedia.org/resource/Web_search_engine : search engine
http://dbpedia.org/resource/Zhu_Weiqun : Zhu Weiqun
http://dbpedia.org/resource/Communist_party : Communist party
http://dbpedia.org/resource/Himalayan : Himalayan
http://dbpedia.org/resource/Boeing : Boeing
http://dbpedia.org/resource/Washington : Washington
http://dbpedia.org/resource/Ministry_of_Foreign_Affairs_of_the_People%27s_Republic_of_China : Chinese Foreign Ministry
http://dbpedia.org/resource/Beijing : Beijing
http://dbpedia.org/resource/Spiritual_leader : Spiritual leader
http://dbpedia.org/resource/Correspondent : Correspondent
http://dbpedia.org/resource/Tehran : Tehran
http://dbpedia.org/resource/BBC : BBC
http://dbpedia.org/resource/President : President
http://dbpedia.org/resource/Spokesman : Spokesman
http://dbpedia.org/resource/White_House : White House
http://dbpedia.org/resource/GBP_%28disambiguation%29 : GBP
http://dbpedia.org/resource/United_States_dollar : USD
http://dbpedia.org/resource/Islamic_Republic_of_Iran : Islamic Republic of Iran
http://dbpedia.org/resource/Dalai_Lama_Renaissance : Dalai Lama Renaissance
http://dbpedia.org/resource/14th_Dalai_Lama : 14th Dalai Lama
http://dbpedia.org/resource/Lhasa : Lhasa
http://dbpedia.org/resource/Central_Tibetan_Administration : Politics of Tibet
http://dbpedia.org/resource/Buddhism : Buddhism


=================================================================
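
The two systems sometimes spell the same DBpedia resource differently.
Alchemy returns People's_Republic_of_China and Washington,_D.C., while
the combined system returns the percent-encoded
People%27s_Republic_of_China and Washington%2C_D.C. Anyone repeating
this comparison programmatically should decode the resource names before
diffing the lists. Below is a minimal sketch using Python's standard
urllib.parse.unquote. normalize and compare are hypothetical helpers.

```python
from urllib.parse import unquote

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def normalize(uri):
    """Percent-decode a DBpedia resource name (%27 -> ', %2C -> , etc.)."""
    if uri.startswith(DBPEDIA_PREFIX):
        return DBPEDIA_PREFIX + unquote(uri[len(DBPEDIA_PREFIX):])
    return uri  # leave non-DBpedia URIs untouched


def compare(alchemy_uris, combined_uris):
    """Split two URI lists into shared and system-specific sets."""
    a = {normalize(u) for u in alchemy_uris}
    c = {normalize(u) for u in combined_uris}
    return {"both": a & c, "alchemy_only": a - c, "combined_only": c - a}
```

This only reconciles encoding differences. Differently named resources
still appear on both sides, such as Alchemy's
Ministry_of_Foreign_Affairs_(People's_Republic_of_China) versus the
combined system's
Ministry_of_Foreign_Affairs_of_the_People%27s_Republic_of_China.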

I'm quite sure the results speak for themselves, and I'm glad that so
much useful information can already be extracted from text.

It is worth noting that the combined system takes 4 to 7 seconds (with
cache), so it's definitely lacking in speed!
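
The post doesn't say how the cache is implemented. One simple approach,
sketched here purely as an assumption, is to memoize each label-to-URI
lookup in-process with functools.lru_cache. A label such as "Beijing"
that recurs across documents then triggers only one remote call.
_remote_lookup is a hypothetical stand-in that fakes the network request.

```python
import functools

# Counts how often the (fake) remote service is actually hit.
CALLS = {"count": 0}


def _remote_lookup(label):
    """Hypothetical stand-in for a slow remote DBpedia lookup call.

    A real implementation would issue an HTTP request here; this naive
    version just builds a resource URI from the label.
    """
    CALLS["count"] += 1
    return "http://dbpedia.org/resource/" + label.strip().replace(" ", "_")


@functools.lru_cache(maxsize=10_000)
def cached_lookup(label):
    """Return the URI for label, calling the remote service once per label."""
    return _remote_lookup(label)
```

Looking up "Beijing", "Tehran" and "Beijing" again results in only two
calls to _remote_lookup. A process-local cache like this does not help
on the first run, which would be consistent with times staying in the
seconds range even with caching.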

Regards,

Nathan

Received on Tuesday, 2 February 2010 16:37:13 UTC