2nd CfP: Named Entity rEcognition and Linking (NEEL) Challenge (#Microposts2015 @ WWW2015)

=====================================================================

 Named Entity rEcognition and Linking (NEEL) Challenge

     @ the 5th Making Sense of Microposts Workshop (#Microposts2015)
   at WWW 2015

  http://www.scc.lancs.ac.uk/Microposts2015

   18th/19th May 2015

=====================================================================
Microposts are a highly popular medium for sharing facts, opinions 
and/or emotions. Microposts comprise an invaluable wealth of data, ready 
to be mined for training predictive models. Following the success of the 
challenge in 2013 and '14, we are pleased to announce the NEEL challenge 
which will be part of the #Microposts2015 Workshop at the World Wide
Web 2015 conference.

The challenge task is to automatically recognise entities and their 
types from English Microposts, and link them to the corresponding 
English DBpedia 2014 resources (where a linkage exists). Participants 
will have to automatically extract expressions that are formed by 
discrete (and typically short) sequences of words (e.g., "Barack Obama", 
London, Rakuten) and recognise their types (e.g., Person, Location, 
Organisation) from a collection of Microposts. At the linking stage 
participants must disambiguate the named entity spotted to the 
corresponding DBpedia resource, or to a NIL reference if it does not 
match any resource in DBpedia. The 2015 challenge will also evaluate the 
end-to-end performance of the system, by measuring the computation time 
for analysing the corpus using each algorithm submitted.

We welcome participants from the previous Microposts workshop
challenges, as well as from the TREC, TAC KBP and ERD shared tasks, to
the #Microposts2015 NEEL challenge.

DATASET
-------
The dataset comprises tweets extracted from a collection of over 18 
million tweets. They include event-annotated tweets provided by the 
Redites project 
(http://gow.epsrc.ac.uk/NGBOViewGrant.aspx?GrantRef=EP/L010690/1), 
covering multiple noteworthy events from 2011 and 2013 (including the
death of Amy Winehouse, the London Riots, the Oslo bombing and the
Westgate Shopping Mall shootout), and tweets extracted from the Twitter 
firehose in 2014. Since the challenge task is to automatically recognise 
and link entities, we have built our dataset considering both event and 
non-event tweets. While event tweets are more likely to contain 
entities, non-event tweets enable us to evaluate the performance of the 
system in avoiding false positives in the entity extraction phase. The 
training set is built on top of the entire corpus of the NEEL 2014 
Challenge. We have further extended it for typing the entities and 
adding NIL references.

Following the Twitter ToS, we will only provide tweet IDs and annotations
for the training set, and tweet IDs for the test set. We will also
provide a common framework for mining these datasets from Twitter. The 
training set will be released as tsv, following the TAC KBP format, 
where each line consists of the following features:

1st: tweet id
2nd, 3rd: start/end offsets expressed as the number of UTF-8 characters 
starting from 0 (the beginning of the tweet)
4th: link to DBpedia resource or NIL (there may be different NIL
references in the corpus; each NIL may be reused if there are multiple
mentions in the text which represent the same entity)
5th: salience (confidence score)
6th: type

Fields are separated by TABs. Entity mentions and URIs are listed
according to their position in the tweet. We will announce the release
of the dataset via @Microposts2015, on the workshop website and on the
mailing list (contact info below).
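
As a rough illustration of the format described above, the following
Python sketch reads one annotation line into its six tab-separated
fields. The names Annotation, parse_annotation and is_nil are
hypothetical, the sample line is made up, and the "NIL" prefix check is
an assumption about how NIL references are spelled; none of this is part
of the official framework.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One annotation line: the six fields listed in the call."""
    tweet_id: str
    start: int          # start offset, counted from 0
    end: int            # end offset
    link: str           # DBpedia resource URI or a NIL reference
    salience: float     # confidence score
    entity_type: str    # e.g. Person, Location, Organisation


def parse_annotation(line: str) -> Annotation:
    """Split one tab-separated line and convert the numeric fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated fields, got {len(fields)}")
    tweet_id, start, end, link, salience, entity_type = fields
    return Annotation(tweet_id, int(start), int(end), link,
                      float(salience), entity_type)


def is_nil(annotation: Annotation) -> bool:
    # Assumption: NIL references start with "NIL" (e.g. "NIL1");
    # the actual spelling is defined by the released dataset.
    return annotation.link.startswith("NIL")


# Made-up example line, for illustration only.
sample = ("123456789\t0\t12\t"
          "http://dbpedia.org/resource/Barack_Obama\t1.0\tPerson\n")
parsed = parse_annotation(sample)
print(parsed.link, parsed.entity_type, is_nil(parsed))
```

Keeping the tweet id as a string avoids any loss of precision when the
ids are later passed to other tools.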

EVALUATION
----------
Participants are required to implement their systems as a publicly 
accessible web service following a REST-based protocol, which will be 
publicised before the release of the training set, and submit (up to 10) 
contending entries to the registry of the NEEL challenge services. Upon 
receiving the registration of the service, calls to each entry will be 
scheduled in two different time windows:
 * D-Time, to test the APIs
 * T-Time, for the final evaluation and metric computations.
In the final stage, each participant may submit up to 3 final contending 
entries.

We will use the metrics proposed by TAC KBP 2014 
(https://github.com/wikilinks/neleval/wiki/Evaluation), and in 
particular we will focus on:

[tagging] strong_typed_mention_match (check entity name boundary and type)
[linking] strong_mention_match
[clustering] mention_ceaf (NIL detection)

To ensure the correctness of the results and avoid any loss, we will
trigger N calls and statistically evaluate the metrics.
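
To give an intuition for the tagging metric listed above, the sketch
below computes precision, recall and F1 over exact matches of
(tweet id, start, end, type) tuples. This is a simplified illustration
in Python, not the neleval implementation used for the official scoring;
the function name and the sample data are hypothetical.

```python
from typing import Iterable, Tuple

Mention = Tuple[str, int, int, str]  # (tweet id, start, end, type)


def typed_mention_prf(gold: Iterable[Mention],
                      system: Iterable[Mention]) -> Tuple[float, float, float]:
    """Precision, recall and F1 where a system mention counts as correct
    only if its tweet id, boundaries and type all match a gold mention."""
    gold_set, system_set = set(gold), set(system)
    true_positives = len(gold_set & system_set)
    precision = true_positives / len(system_set) if system_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up data: one exact match, one type error, one spurious mention.
gold = [("t1", 0, 12, "Person"), ("t1", 20, 26, "Location")]
system = [("t1", 0, 12, "Person"),
          ("t1", 20, 26, "Organisation"),
          ("t2", 0, 5, "Person")]
precision, recall, f1 = typed_mention_prf(gold, system)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this example a mention with the right boundaries but the wrong type
is counted as an error on both sides: it lowers precision as a spurious
system mention and lowers recall as a missed gold mention.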

SUBMISSIONS
-----------
The written component comprises an extended abstract (2 pages ACM) 
describing the approach taken, how it was tuned/tested using the 
training split, and the results. Submissions should be prepared 
according to the ACM SIG Proceedings Template (see 
http://www.acm.org/sigs/publications/proceedings-templates), and include 
author names and affiliations, and 3-5 author-selected keywords. Where a
submission includes additional material, this should be submitted
as a single, unencrypted zip file that includes a plain text file
listing its contents. Submission is via EasyChair, at: 
https://www.easychair.org/conferences/?conf=Microposts2015. Each 
submission will receive at least 2 peer reviews.

The #Microposts2015 proceedings will be published as a single volume 
containing all three tracks, via CEUR. However, the same publication
conditions apply as for other workshop proceedings included in the WWW
conference companion:
"Any paper published by the ACM, IEEE, etc. which can be properly cited 
constitutes research which must be considered in judging the novelty of 
a WWW submission, whether the published paper was in a conference, 
journal, or workshop. Therefore, any paper previously published as part 
of a WWW workshop must be referenced and suitably extended with new 
content to qualify as a new submission to the Research Track at the WWW 
conference."

IMPORTANT DATES
---------------
Intent to participate: 26 Jan 2015 (register at
http://goo.gl/forms/MLcSidVTbj)
Release of REST API spec: 2 Feb 2015
Release of training set: 15 Feb 2015
Registration of contending entries: 2 Mar 2015
D-Time: 10-15 Mar 2015 (hard)
T-Time: 20-25 Mar 2015 (hard)
Paper submission: 28 Mar 2015 (hard)
Challenge Notification: 21 Apr 2015
Challenge camera-ready deadline: 30 Apr 2015

(All deadlines 23:59 Hawaii Time)

Workshop - 18/19 May 2015 (Registration open to all)

WORKSHOP STRUCTURE
------------------
A keynote address from an invited speaker will open the day, followed
by paper presentations. We will hold a poster and demo session
to trigger further, in-depth interaction between workshop participants. 
The last set of presentations will be brief overviews of selected 
submissions to the Challenge. The workshop will close with the 
presentation of awards.

PRIZE
-----
A prize of € 1500, generously sponsored by SpazioDati, will be awarded 
to the highest ranking submission. SpazioDati is an Italian startup 
focused on text analytics and big data. One of SpazioDati's key 
components is dataTXT, a text-analytics engine available on 
SpazioDati's API platform, Dandelion. The dataTXT named-entity
extraction system has been proven to be very effective and efficient on 
short and fragmented texts, like Microposts. By teaming up with 
SpazioDati to make the challenge possible, the #Microposts workshop 
organisers wish to highlight new entity extraction methods and 
algorithms to pursue in such challenging scenarios.

CONTACT
-------
E-mail: microposts2015@easychair.org
Subscribe to mailing list at: 
https://groups.google.com/forum/#!forum/Microposts2015

Twitter persona: @microposts2015
Twitter hashtags: #neel #microposts2015

W3C Microposts Community Group: http://www.w3.org/community/microposts

Challenge Chairs:
-----------------
A. Elizabeth Cano, KMi, The Open University, UK
Giuseppe Rizzo, EURECOM, France

Dataset Chair:
--------------
Andrea Varga, Swiss Re, UK

Workshop Organisers
-------------------
Matthew Rowe, Lancaster University, UK
Milan Stankovic, Université Paris-Sorbonne & Sépage, France
Aba-Sah Dadzie, University of Birmingham, UK

Challenge Committee:
--------------------
Gabriele Antonelli, SpazioDati, Italy
Ebrahim Bagheri, Ryerson University, Canada
Pierpaolo Basile, University of Bari, Italy
Grégoire Burel, KMi, Open University, UK
Leon Derczynski, The University of Sheffield, UK
Milan Dojchinovski, Czech Technical University, Prague
Guillaume Erétéo, Vigiglobe, France
Andrés García-Silva, Universidad Politécnica de Madrid, Spain
Anna Lisa Gentile, University of Sheffield, UK
Miguel Martinez-Alvarez, Signal, London, UK
José M. Morales del Castillo, El Colegio de México, Mexico
Georgios Paltoglou, University of Wolverhampton, UK
Bernardo Pereira Nunes, PUC-Rio, Brazil
Daniel Preoţiuc-Pietro, University of Pennsylvania, USA
Ermir Qeli, Swiss Re, Switzerland
Giles Reger, Otus Labs Ltd, Sheffield, UK
Irina Temnikova, Qatar Computing Research Institute, Qatar
Raphaël Troncy, Eurecom, France
Victoria Uren, Aston Business School, UK

Received on Thursday, 15 January 2015 12:33:11 UTC