A few questions about grounding entities

Hi,

I have a few questions about how to ground entities in linked data -- also
known as the entity linking, or entity disambiguation, task, where a
descriptive URI is associated with a mention of an entity in text. Sorry if
they are a bit basic.

1. What should one find when dereferencing a grounding URI? Is there some
special content negotiation in this scenario, e.g. a default Content-Type
or an expected range of supported Accept values?

2. To which resource should one link? DBpedia's Wikipedia-derived
resources seem popular, but because of Wikipedia's notability guideline
(WP:GNG) we are bound to always have many, many NILs, and DBpedia may be
unsuitable for things like emerging events. For example, if someone
mentions "my friend Jane", an entity is certainly described in the text
(Jane is a rigid designator), but the mention might only be groundable in
terms of e.g. an associated Twitter user ID. Is there a set of preferred
vocabularies / ontologies for grounding when building datasets for public
use, especially in an academic context such as the #Microposts workshop
challenge series?

3. When evaluating the accuracy of grounding, it is important to recognise
different URIs that denote the same entity. These already occur a lot.
owl:sameAs seems too fuzzy, and skos:exactMatch isn't used in a lot of
cases. How can such URIs be reliably identified as anchors to the same
entity?
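
To make question 3 concrete, here is a minimal sketch of the kind of
evaluation in question. It assumes a list of equivalence links between URIs
is already available from some source (which is exactly the open part of
the question), and treats them as transitive. The function names and the
example.org URIs are hypothetical, chosen only for illustration.

```python
def build_equivalence(pairs):
    """Merge URIs connected by asserted equivalence links (union-find).

    Returns a function mapping any URI to a canonical representative of
    its equivalence class. Links are assumed symmetric and transitive.
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a
    return find


def grounding_accuracy(gold, predicted, canonical):
    """Fraction of mentions whose predicted URI is equivalent to the gold URI."""
    if not gold:
        return 0.0
    correct = sum(
        1 for g, p in zip(gold, predicted) if canonical(g) == canonical(p)
    )
    return correct / len(gold)


# Hypothetical data: two knowledge bases, only one equivalence link known.
links = [("http://example.org/a/Sheffield", "http://example.org/b/Sheffield")]
gold = ["http://example.org/a/Sheffield", "http://example.org/a/Leeds"]
predicted = ["http://example.org/b/Sheffield", "http://example.org/b/Leeds"]

canonical = build_equivalence(links)
accuracy = grounding_accuracy(gold, predicted, canonical)
```

In this toy case the second mention is scored as wrong only because no
link between the two Leeds URIs is known, which is the failure mode the
question is about.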

All the best,


Leon

-- 
Leon R A Derczynski
Research Associate, NLP Group

Department of Computer Science
University of Sheffield, UK

http://www.dcs.shef.ac.uk/~leon/

Received on Monday, 7 April 2014 09:46:00 UTC