- From: Leon Derczynski <leon@dcs.shef.ac.uk>
- Date: Mon, 7 Apr 2014 11:45:31 +0200
- To: semantic-web@w3.org, public-lod@w3.org
- Message-ID: <CAPjwwFrFYfYa9bHkR_1GitTfk=6F_0d8+v4v8d9RRJ3oYHaFqw@mail.gmail.com>
Hi,

I have a few questions about how to ground entities in linked data. This is also known as the entity linking, or entity disambiguation, task, where a descriptive URI is associated with a mention of an entity in text. Sorry if they are a bit basic.

1. What should be found when one visits a grounding URI? Is there some special content negotiation in this scenario, e.g. a default Content-Type or an expected range of supported Accept values?

2. To which resource should one link? DBpedia's Wikipedia "section" seems popular, but because of Wikipedia's general notability guideline (WP:GNG), we are bound to always have many, many NILs, and it may be unsuitable for things like emerging events. For example, if someone mentions "my friend Jane", an entity is certainly described in the text (Jane is a rigid designator), but the mention might only be groundable in terms of e.g. an associated Twitter user ID. Is there a set of preferred vocabularies / ontologies for grounding when building datasets for public use? Especially in the academic context, e.g. the #Microposts workshop challenge series.

3. When evaluating the accuracy of grounding, it is important to recognise different URIs that refer to the same entity. These occur a lot already. owl:sameAs seems too fuzzy, and skos:exactMatch isn't used in a lot of cases. How can such URIs be reliably identified as anchors to the same entity?

All the best,

Leon

--
Leon R A Derczynski
Research Associate, NLP Group
Department of Computer Science
University of Sheffield, UK
http://www.dcs.shef.ac.uk/~leon/
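
To make question 3 concrete, here is a minimal Python sketch of one way an evaluation could treat co-referent URIs as equal. It assumes a list of URI pairs already asserted to denote the same entity (for instance harvested from owl:sameAs or skos:exactMatch statements) and merges them with a union-find structure before scoring. The function names and the example.org URIs are hypothetical, and whether the asserted pairs can be trusted is exactly the open question; this is not an established evaluation protocol.

```python
def build_equivalence(pairs):
    """Return a find(uri) function mapping each URI to a class representative.

    `pairs` is an iterable of (uri_a, uri_b) tuples assumed to denote the
    same entity. Equivalence is treated as symmetric and transitive.
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)  # unseen URIs form their own class
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two classes
    return find


def grounding_accuracy(gold, predicted, find):
    """Fraction of gold mentions whose predicted URI is in the gold URI's class.

    `gold` and `predicted` map mention IDs to URIs; a missing prediction
    counts as an error.
    """
    if not gold:
        return 0.0
    correct = sum(
        1
        for mention, uri in gold.items()
        if mention in predicted and find(predicted[mention]) == find(uri)
    )
    return correct / len(gold)


# Hypothetical URIs, for illustration only.
A = "http://example.org/kb1/Sheffield"
B = "http://example.org/kb2/Sheffield"
C = "http://example.org/kb3/Sheffield"
D = "http://example.org/kb1/Leeds"

find = build_equivalence([(A, B), (B, C)])
gold = {"m1": A, "m2": D}
predicted = {"m1": C, "m2": A}
score = grounding_accuracy(gold, predicted, find)
```

Here the prediction for "m1" counts as correct because C is linked to A through B, while "m2" is wrong, so `score` is 0.5. Taking the transitive closure is a design choice: it is convenient for scoring, but a single bad equivalence link will merge two unrelated entities, which is one reason fuzzy equivalence assertions are a concern.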
Received on Monday, 7 April 2014 09:46:00 UTC