- From: Guillaume Jacquet <guillaume.jacquet@jrc.ec.europa.eu>
- Date: Thu, 08 Sep 2016 15:20:44 +0200
- To: LN@cines.fr, corpora@uib.no, public-lod@w3.org
- Cc: clef@mail.dei.unipd.it, Ralf Steinberger <ralf.steinberger@jrc.ec.europa.eu>, Maud Ehrmann <maud.ehrmann@gmail.com>
- Message-id: <cf64021a-7d04-f4c0-51b7-746a46083d60@jrc.ec.europa.eu>
Dear all,

We are pleased to announce a new release of the *JRC-Names* multilingual name resource, containing *more information* and now available as *Linked Data*.

JRC-Names is a *highly multilingual named entity resource* for person and organisation names (called 'entities') developed by the European Commission’s Joint Research Centre (JRC). JRC-Names consists of large lists of names and their many spelling variants (up to hundreds for a single person), including across scripts (Latin, Greek, Arabic, Cyrillic, Japanese, Chinese, etc.). For example, the spellings Jean-Claude Juncker, Jean Cloud Junker, Jean-Claude Juencker, Жан-Клод Юнкер, جان كلود جونكر, Ζαν Κλοντ Γιούνκερ, 让-克洛德•容克, and many others have all been identified as referring to the 12th President of the European Commission.

The resource is a by-product of the Europe Media Monitor (EMM) family of applications, which has been analysing up to 300,000 news reports per day since 2004. EMM recognises names mentioned in the news in over twenty languages and decides automatically for each newly found name whether it belongs to a new entity or whether it is a spelling variant of a previously known entity. This resource allows EMM users to display news about people or organisations even if their names are spelt differently or if the news articles are written in different languages and scripts.

JRC-Names has been available for download since September 2011, consisting of name variant lists and accompanying software (JRC-Names text version <https://ec.europa.eu/jrc/en/language-technologies/jrc-names>).
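To make the idea of a variant-to-entity mapping concrete, here is a minimal Python sketch of a lookup table built from the spellings quoted above. It is only an illustration, not how EMM or the JRC-Names software works: the entity identifier `entity-juncker`, the `normalise` helper and the choice of normalisation (NFKC, case folding, whitespace collapsing) are all assumptions made for this example.

```python
import unicodedata
from typing import Optional

# Hypothetical identifier; real JRC-Names entity IDs are not given in this message.
JUNCKER = "entity-juncker"

# Spelling variants quoted in the announcement, all mapped to the same entity.
VARIANTS = {
    "Jean-Claude Juncker": JUNCKER,
    "Jean Cloud Junker": JUNCKER,
    "Jean-Claude Juencker": JUNCKER,
    "Жан-Клод Юнкер": JUNCKER,
    "جان كلود جونكر": JUNCKER,
    "Ζαν Κλοντ Γιούνκερ": JUNCKER,
    "让-克洛德•容克": JUNCKER,
}


def normalise(name: str) -> str:
    """Apply NFKC normalisation, case folding and whitespace collapsing."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    return " ".join(folded.split())


# Index keyed by the normalised spelling, so lookups ignore case and spacing.
INDEX = {normalise(variant): entity for variant, entity in VARIANTS.items()}


def lookup(name: str) -> Optional[str]:
    """Return the entity for a known spelling variant, or None if unknown."""
    return INDEX.get(normalise(name))
```

With such a table, `lookup("Жан-Клод Юнкер")` and `lookup("Ζαν Κλοντ Γιούνκερ")` return the same identifier, which is the property that lets news in different scripts be grouped under one entity. Deciding whether an unseen spelling is a new variant is a separate problem that this sketch does not address.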
The new Linked Data resource <https://data.europa.eu/euodp/en/data/dataset/jrc-names>, accessible through the European Union’s Open Data Portal <http://data.europa.eu/euodp/en/data>, offers more information than the previously released resource and tool, including:

* titles and function names that have been historically found next to the person mentions;
* information about the time period during which name variants and their titles were found;
* various frequency counts;
* links to other linked datasets such as DBpedia, New York Times Open Data and Talk of Europe.

The JRC-Names RDF representation is based on /lemon/ (Lexicon Model for Ontologies <https://www.w3.org/community/ontolex/wiki/Final_Model_Specification>), a model developed by the W3C Ontology-Lexica Community Group which allows the expression of lexical information relative to ontologies. A detailed description of the JRC-Names Linked Data representation is given in the reference paper mentioned below.

Examples of usage of the resource include, among others:

* entity linking, e.g. to deal with entity surface form variations;
* cross-lingual linked data-set query and mapping;
* search query expansion;
* machine translation;
* learning of transliteration rules;
* named entity recognition and disambiguation;
* cross-lingual document clustering.

This new Linked Data edition is available through a SPARQL <https://data.europa.eu/euodp/en/data/dataset/jrc-names/resource/da30b11d-a07e-45dd-bdb6-5f2ba5835d27> endpoint and via an RDF dump <http://cidportal.jrc.ec.europa.eu/ftp/jrc-opendata/EMM/JRC-Names/LATEST/jrcnames_uri.zip>. It is registered on the datahub.io portal as JRC-Names <https://datahub.io/dataset/jrc-names-ec>. Additional information is available on this page <http://data.europa.eu/euodp/en/data/dataset/jrc-names> of the EU Open Data Portal <http://data.europa.eu/euodp/en/data/dataset/jrc-names>.
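As a rough illustration of what a query against the SPARQL endpoint might look like, the Python sketch below only builds the query text for "all written forms that share a lexical entry with a given name"; it does not contact any server. The property names (`ontolex:canonicalForm`, `ontolex:otherForm`, `ontolex:writtenRep`) come from the generic OntoLex-lemon vocabulary, and it is an assumption that JRC-Names uses them in exactly this shape; the reference paper describes the actual schema. The function name `variants_query` is likewise hypothetical.

```python
from typing import Optional

# Assumed vocabulary namespace (generic OntoLex-lemon, not confirmed for JRC-Names).
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"


def _escape(literal: str) -> str:
    """Escape backslashes and double quotes for use inside a SPARQL string."""
    return literal.replace("\\", "\\\\").replace('"', '\\"')


def variants_query(name: str, lang: Optional[str] = None) -> str:
    """Build a SPARQL query for all written forms sharing an entry with `name`.

    If `lang` is given, only variants carrying that language tag are kept.
    """
    # Property path: any canonical or other form, then its written representation.
    forms = "(ontolex:canonicalForm|ontolex:otherForm)/ontolex:writtenRep"
    lines = [
        f"PREFIX ontolex: <{ONTOLEX}>",
        "SELECT DISTINCT ?variant WHERE {",
        f"  ?entry {forms} ?known .",
        f"  ?entry {forms} ?variant .",
        f'  FILTER(STR(?known) = "{_escape(name)}")',
    ]
    if lang is not None:
        lines.append(f'  FILTER(LANG(?variant) = "{_escape(lang)}")')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(variants_query("Jean-Claude Juncker", lang="el"))
```

The resulting string could be pasted into the endpoint's query form or sent with any SPARQL client. Comparing on `STR(?known)` rather than on a language-tagged literal is a design choice made here so that the known spelling matches regardless of how its language tag is recorded.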
Examples of queries against the data-set include:

* Given a person's name, retrieve all of its name variants;
* Given a person's name, retrieve all of its name variants in a certain language;
* Given a person's name, retrieve all of its titles/function names in a certain language;
* Given a variant and a language, retrieve the corresponding entity;
* Given a title and a language, retrieve all of the persons with this same title.

Reference paper: Maud Ehrmann, Guillaume Jacquet and Ralf Steinberger (to appear). JRC-Names: Multilingual Entity Name variants and titles as Linked Data <http://www.semantic-web-journal.net/system/files/swj1307.pdf>, Semantic Web Journal (available online since 04/20/2016)

Guillaume Jacquet, Maud Ehrmann, Ralf Steinberger
European Commission
Joint Research Centre
Text and Data Mining Unit
https://ec.europa.eu/jrc/en/language-technologies
Received on Thursday, 8 September 2016 15:51:54 UTC