Re: A few questions about grounding entities

Dear Leon,

Your problem can be generalized as the “Web yellow pages” problem, which has been under discussion at least since the IRW workshop at WWW2006.
A web of public URIs for every imaginable thing would be the solution, but given the nature of the Web, with its holes, redundancies, and fakes, this cannot easily be achieved, quite apart from the difficulty of knowing, at any given time, every imaginable thing.

Pragmatically speaking, there is one project, OKKAM [1], which has attempted to create a secure, sustainable public list of things with its Entity Name System. One concern is how complex it is for third parties to perform useful 303 redirects and content negotiation on such public but secured entities.
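
To make the 303 point concrete, here is a minimal sketch in Python of what a client typically does when it looks up an entity URI. It is not tied to OKKAM or any particular service; the Accept preferences and the example.org URIs used with it are assumptions chosen for illustration only.

```python
from urllib.request import Request

# Assumed media-type preferences: RDF serializations first, HTML as a fallback.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, text/html;q=0.5"


def build_lookup_request(entity_uri: str) -> Request:
    """Prepare (but do not send) a GET for an entity URI, asking for RDF first."""
    return Request(entity_uri, headers={"Accept": RDF_ACCEPT}, method="GET")


def describe_response(status: int, headers: dict) -> str:
    """Classify a server reply under the common 303 'see other' pattern."""
    if status == 303 and "Location" in headers:
        # The URI names a thing; the Location header points to a document about it.
        return f"non-information resource; description at {headers['Location']}"
    if status == 200:
        # The URI was answered directly with a document.
        content_type = headers.get("Content-Type", "unknown type")
        return f"document served directly as {content_type}"
    return f"unhandled status {status}"
```

For example, describe_response(303, {"Location": "http://example.org/doc/jane"}) reports that the description lives at the redirected document URI. A real client would also need to stop urllib from following the redirect automatically if it wants to observe the 303 itself.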

Regular practice includes creating public URIs for specific domains or tasks, e.g. within organizations, which can ideally be published to boost reuse. To my knowledge, there are no special recommendations for this task beyond the general Linked Open Data (LOD) principles.
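
As a rough illustration of such task-specific URIs, the Python sketch below mints a local URI for an entity that has no Wikipedia/DBpedia entry (your "my friend Jane" case) and links it to an online account. The example.org namespace, the function names, and the choice of rdfs:label and foaf:account are my assumptions for the example, not an established recommendation.

```python
from urllib.parse import quote

# Hypothetical namespace; a real project would use a domain it controls.
BASE = "http://example.org/entity/"


def mint_entity_uri(source: str, local_id: str) -> str:
    """Build a stable URI from a source name and that source's own identifier."""
    return f"{BASE}{quote(source, safe='')}/{quote(local_id, safe='')}"


def describe_entity(source: str, local_id: str, label: str, account_url: str) -> str:
    """Return two N-Triples statements describing the minted entity."""
    uri = mint_entity_uri(source, local_id)
    # Escape backslashes and double quotes so the literal stays well-formed.
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'<{uri}> <http://www.w3.org/2000/01/rdf-schema#label> "{safe_label}" .\n'
        f"<{uri}> <http://xmlns.com/foaf/0.1/account> <{account_url}> .\n"
    )
```

Calling mint_entity_uri("twitter", "12345") gives http://example.org/entity/twitter/12345. Keying the URI on the source's own identifier means that repeated mentions of the same account resolve to the same URI.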

Best
Aldo

[1] http://www.okkam.org

On Apr 7, 2014, at 11:45:31 AM, Leon Derczynski <leon@dcs.shef.ac.uk> wrote:

> Hi,
> 
> I have a few questions about how to ground entities in linked data -- also known as the entity linking, or entity disambiguation, task, where a descriptive URI is associated with a mention of an entity in text. Sorry if they are a bit basic.
> 
> 1. What should be found when one visits a grounding URI? Is there some special content negotiation in this scenario, e.g. a default Content-Type or an expected range of supported Accept-Encoding values?
> 
> 2. To which resource should one link? DBpedia's Wikipedia "section" seems popular, but due to WP:GNG, we are bound to always have many, many NILs and it may be unsuitable for things like emerging events. For example, if someone mentions "my friend Jane", an entity is certainly described in the text (Jane is a rigid designator) but the mention might only be groundable in terms of e.g. an associated Twitter user ID. Is there a set of preferred vocabularies / ontologies for grounding when building datasets for public use? Especially in the academic context, e.g. the #Microposts workshop challenge series.
> 
> 3. When evaluating the accuracy of grounding, it is important to recognise different URIs that refer to the same entity. These occur a lot already. owl:sameAs seems too fuzzy, and skos:exactMatch isn't used in a lot of cases. How can such URIs be reliably identified as anchors to the same entity?
> 
> All the best,
> 
> 
> Leon
> 
> -- 
> Leon R A Derczynski
> Research Associate, NLP Group
> 
> Department of Computer Science
> University of Sheffield, UK
> 
> http://www.dcs.shef.ac.uk/~leon/

Received on Monday, 7 April 2014 11:14:58 UTC