
Re: DBpedia-based entity recognition service / tool?

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Tue, 02 Feb 2010 13:25:57 -0500
Message-ID: <4B686E35.9000901@openlinksw.com>
To: nathan@webr3.org
CC: public-lod@w3.org
Nathan wrote:
> Matthias Samwald wrote:
>   
>> Nathan wrote
>>     
>>> Quite sure the results speak for themselves + glad that so much useful
>>> information can be extracted from text already.
>>>       
>> The results look good indeed. It even passed the FOAF test!
>>
>> Can you estimate the ratio of contributions from Zemanta / contributions
>> from OpenCalais? Does one source add more than the other? Does the ratio
>> vary significantly between different texts?
>>
>>     
>
> Zemanta is more precise, OpenCalais is more verbose; results vary
> depending on the subject matter and each document's content - in all
> honesty there is no way to say one is better than the other, but I can
> say that both combined is as good as you can get for now.
>
> Noted that Kingsley mentioned the Sponger Middleware for Virtuoso; this
> would allow you to do the same afaik, but faster and with the option of
> adding in more Sponger cartridges for virtually any third-party APIs +
> with the extensive list of cartridges already included it's definitely
> an option worth looking into - and ultimately the fastest / most reliable.
>
> fyi: i ran your source text through the combination system and here's
> what it brings back:
>
> http://dbpedia.org/resource/Nervous_system
> http://dbpedia.org/resource/Albizia
> http://dbpedia.org/resource/5-HT1A_receptor
> http://dbpedia.org/resource/Serotonin
> http://dbpedia.org/resource/Albizia_julibrissin
> http://dbpedia.org/resource/Serotonergic
> http://dbpedia.org/resource/Parasympathetic_nervous_system
> http://dbpedia.org/resource/Neurochemistry
> http://dbpedia.org/resource/Biochemistry
> http://dbpedia.org/resource/Neurotransmitter
> http://dbpedia.org/resource/Physiology
> http://dbpedia.org/resource/Biology
>
> regards,
>
> Nathan
>
>
>   
Nathan,

Re. Matthias's doc (I don't pick up anything useful via OpenCalais, Zemanta, 
Alchemy or any of the other meta cartridges): what did you use for the 
entity extraction? I ask because we have Bio2RDF, Linked Open Drug 
Data, etc. loaded at http://lod.openlinksw.com . If the life science 
entities are extracted, we can get FCT to locate associated 
entities; all that is required is an extractor cartridge placed 
ahead of the LOD lookup in the Sponger workflow.

Note: Meta Cartridges are about doing lookups on the graphs produced by 
the Extractor Cartridges; basically, it's about augmenting the graph with 
URIs culled from a variety of Linked Data space lookups.
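
To make the two-stage idea concrete, here is a toy sketch in Python. It is 
not Sponger code and none of the names are Virtuoso APIs: the function 
names, the capitalised-word "extractor" and the LOOKUP table are all 
assumptions made up for illustration. It only shows an extractor producing 
a graph and a meta step augmenting that graph with looked-up URIs.

```python
# Toy model of the extractor -> meta cartridge workflow.
# All names here are hypothetical; this is not the Sponger API.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical stand-in for a Linked Data space lookup (label -> URI).
LOOKUP = {
    "Serotonin": "http://dbpedia.org/resource/Serotonin",
    "Neurotransmitter": "http://dbpedia.org/resource/Neurotransmitter",
}


def extractor_cartridge(doc_url, text):
    """Stage 1: turn a document into a graph of (s, p, o) triples.

    Naive assumption: every capitalised word is an entity.
    """
    graph = set()
    for word in text.split():
        term = word.strip(".,;:()")
        if term[:1].isupper():
            graph.add((f"{doc_url}#{term}", RDFS_LABEL, term))
    return graph


def meta_cartridge(graph):
    """Stage 2: look up labels already in the graph and add seeAlso links."""
    augmented = set(graph)
    for subject, predicate, obj in graph:
        if predicate == RDFS_LABEL and obj in LOOKUP:
            augmented.add((subject, RDFS_SEE_ALSO, LOOKUP[obj]))
    return augmented


graph = extractor_cartridge(
    "http://example.org/doc", "Serotonin is a Neurotransmitter."
)
augmented = meta_cartridge(graph)
for triple in sorted(augmented):
    print(triple)
```

The point of the split is that the meta step never sees the raw text; it 
only works on whatever the extractor put into the graph.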

I am very intrigued re. your extractor :-)

Re. your doc:

URIBurner [1][2] and our Live Demo Server [3][4] both include the Sponger 
with fully loaded Cartridges.

The Sponger generates proxy/wrapper Linked Data URIs and also creates a 
local Graph IRI from the URL of the sponged resource. Thus, based on 
our AlchemyAPI meta cartridge, you can run the following using the /sparql 
[1][3] or /isparql [2][4] service (the latter lets you share results pages 
or query definition pages via URLs) on either instance:

If you want to force the cache to be cleared when your query is executed:

DEFINE get:soft "replace"
SELECT ?o2 ?o3
FROM <http://webr3.org/__play/optimal/webr3.html>
WHERE {
  ?s rdfs:seeAlso ?o .
  ?o <http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Disambiguation> ?o2 .
  ?o2 ?p ?o3
}
LIMIT 50

If you are happy to work with a warm cache, or to let the Virtuoso 
instance deal with cache invalidation, then:


SELECT ?o2 ?o3
FROM <http://webr3.org/__play/optimal/webr3.html>
WHERE {
  ?s rdfs:seeAlso ?o .
  ?o <http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Disambiguation> ?o2 .
  ?o2 ?p ?o3
}
LIMIT 50
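
If you would rather call the endpoint from a script than paste the query 
into a form, a minimal sketch follows. It assumes the endpoint accepts the 
standard SPARQL Protocol GET form (the query text in a "query" parameter); 
the helper names build_query and protocol_url are mine, not part of 
Virtuoso. It only builds the URL and makes no network call.

```python
from urllib.parse import urlencode

ENDPOINT = "http://uriburner.com/sparql"  # link 1 under Links
GRAPH = "http://webr3.org/__play/optimal/webr3.html"
DISAMBIGUATION = "http://rdf.alchemyapi.com/rdf/v1/s/aapi-schema#Disambiguation"


def build_query(graph_iri, refresh=False):
    """Return the query text; refresh=True prepends the cache-replacing pragma."""
    pragma = 'DEFINE get:soft "replace"\n' if refresh else ""
    return (
        f"{pragma}"
        "SELECT ?o2 ?o3\n"
        f"FROM <{graph_iri}>\n"
        "WHERE {\n"
        "  ?s rdfs:seeAlso ?o .\n"
        f"  ?o <{DISAMBIGUATION}> ?o2 .\n"
        "  ?o2 ?p ?o3\n"
        "}\n"
        "LIMIT 50"
    )


def protocol_url(endpoint, query):
    """Percent-encode the query into a SPARQL Protocol GET URL."""
    return endpoint + "?" + urlencode({"query": query})


url = protocol_url(ENDPOINT, build_query(GRAPH, refresh=True))
print(url)
```

Passing refresh=False gives the warm-cache variant of the same URL.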

## For just distinct DBpedia URIs

DEFINE get:soft "replace"
SELECT DISTINCT ?o ?dbp
FROM <http://webr3.org/__play/optimal/webr3.html>
WHERE {
  ?s rdfs:seeAlso ?o .
  ?o ?p ?dbp
  FILTER ( regex(str(?dbp), ".*dbpedia.org") )
}
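
One caveat on the FILTER pattern: it is an unanchored match and the dot is 
unescaped, so it accepts more than strictly DBpedia URIs. The Python sketch 
below illustrates this with Python's re module, which behaves the same way 
as SPARQL's regex() for patterns this simple. The STRICT pattern and the 
second candidate URI are made up for illustration.

```python
import re

LOOSE = re.compile(r".*dbpedia.org")           # pattern as used in the query
STRICT = re.compile(r"^http://dbpedia\.org/")  # hypothetical tighter variant

candidates = [
    "http://dbpedia.org/resource/Serotonin",
    "http://example.org/about/dbpedia-org-mirror",  # made-up URI
]

# search() matches anywhere in the string, like SPARQL regex().
loose_hits = [u for u in candidates if LOOSE.search(u)]
strict_hits = [u for u in candidates if STRICT.search(u)]

print(loose_hits)   # both URIs: "." also matches the "-" in "dbpedia-org"
print(strict_hits)  # only the dbpedia.org URI
```

For this graph the loose pattern is probably good enough; the stricter 
form only matters if unrelated URIs containing "dbpedia" turn up.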


Links
1. http://uriburner.com/sparql
2. http://uriburner.com/isparql
3. http://demo.openlinksw.com/sparql
4. http://demo.openlinksw.com/isparql
5. http://bit.ly/bAY8Uy -- SPARQL Protocol URL for the first query above
6. http://bit.ly/amqdfE -- ditto, but seeking all DBpedia URIs from the 
sponged resource



-- 

Regards,

Kingsley Idehen	      
President & CEO 
OpenLink Software     
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter: kidehen 
Received on Tuesday, 2 February 2010 18:26:25 UTC
