CfP: Named Entity rEcognition and Linking (NEEL) Challenge (#Microposts2015 @ WWW2015)

        Named Entity rEcognition and Linking (NEEL) Challenge
           at the 5th Making Sense of Microposts Workshop
                    #Microposts2015 @ WWW 2015
                18th/19th May 2015, Florence, Italy

Microposts are a highly popular medium for sharing facts, opinions or 
emotions. They are an invaluable wealth of data, ready to be mined for 
training predictive models. Following the success of last year's 
challenge, we are pleased to announce the NEEL challenge, which will be 
part of the #Microposts2015 Workshop at the World Wide Web 2015 conference.

The overall task of the challenge is to automatically recognise entities 
and their types from English microposts, and to link them to the 
corresponding English DBpedia 2014 resources (if the linkage exists). 
Participants will have to automatically extract expressions that are 
formed by discrete (and typically short) sequences of words (e.g., 
Obama, London, Rakuten) and recognise their types (e.g., Person, 
Location, Organisation) from a collection of microposts. In the linking 
stage we aim to disambiguate each spotted entity to the corresponding 
DBpedia resource, or to a NIL reference if the spotted named entity does 
not match any resource in DBpedia. This year's challenge will also 
evaluate the end-to-end performance of the system by measuring the 
computation time for analyzing the corpus using the submitted algorithms.

We welcome participants from the NEEL, TREC, TAC KBP and ERD shared 
tasks, and hope they will take part in this year's challenge.

The dataset comprises tweets extracted from a collection of over 18 
million tweets. It includes event-annotated tweets provided by the 
Redites project, covering multiple noteworthy events from 2011 and 2013 
(including the death of Amy Winehouse, the London Riots, the Oslo 
bombing and the Westgate Shopping Mall shootout), and tweets extracted 
from the Twitter firehose from 2014. 
Since the task of this challenge is to automatically recognise and link 
entities, we have built our dataset considering both event and non-event 
tweets. While event tweets are more likely to contain entities, 
non-event tweets enable us to evaluate the performance of the system in 
avoiding false positives in the entity extraction phase. The training 
set is built on top of the entire corpus of the NEEL 2014 Challenge. We 
have further extended it for typing the entities and adding the NIL 

Following the Twitter TOS, we will only provide tweet IDs and annotations 
for the training set, and tweet IDs for the test set. We will also 
provide a common framework to mine these datasets from Twitter. The 
training set will be released as a TSV file following the TAC KBP format, 
where each line consists of the following fields:

1st: tweet id
2nd,3rd: start/end offsets expressed as the number of UTF-8 characters 
starting from 0 (the beginning of the tweet)
4th: link to DBpedia resource or NIL (there may be several different NILs 
in the corpus. Each NIL may be reused if there are multiple mentions in 
the text which represent the same entity)
5th: salience (confidence score)
6th: type

Fields are separated by TABs. Entity mentions and URIs are listed 
in their order of appearance in the tweet. We will advertise the 
release of the data sets on the workshop mailing list in good time. 
Please subscribe to
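
As an illustration of the six-column layout described above, here is a 
minimal Python sketch that reads such tab-separated lines. The names 
Annotation, parse_annotations and is_nil are hypothetical, the sample rows 
are made up, and the "NIL1" identifier style is an assumption; the 
official format details are those released with the training set.

```python
import csv
import io
from typing import List, NamedTuple


class Annotation(NamedTuple):
    """One annotation line: the six tab-separated fields, in order."""
    tweet_id: str
    start: int          # offset of the first character, counted from 0
    end: int            # end offset, as given in the file
    link: str           # DBpedia resource URI or a NIL identifier
    salience: float     # confidence score
    entity_type: str


def parse_annotations(tsv_text: str) -> List[Annotation]:
    """Parse tab-separated annotation lines into Annotation records."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    annotations = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        tweet_id, start, end, link, salience, entity_type = row
        annotations.append(Annotation(tweet_id, int(start), int(end),
                                      link, float(salience), entity_type))
    return annotations


def is_nil(annotation: Annotation) -> bool:
    """Assumption: NIL references are written as 'NIL' plus an optional id."""
    return annotation.link.startswith("NIL")


# Made-up sample rows (tweet ids, offsets and scores are illustrative only).
sample = (
    "100\t0\t5\thttp://dbpedia.org/resource/Barack_Obama\t1.0\tPerson\n"
    "100\t15\t21\thttp://dbpedia.org/resource/London\t1.0\tLocation\n"
    "101\t3\t10\tNIL1\t1.0\tOrganisation\n"
)
parsed = parse_annotations(sample)
nil_mentions = [a for a in parsed if is_nil(a)]
```

Keeping the link as a plain string lets mentions that share the same NIL 
identifier be grouped later without any further parsing.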

Participants are required to implement their systems as a publicly 
accessible web service following a REST-based protocol (which will be 
advertised on the mailing list before the release of the training set) 
and submit their contending entries (up to 10) to a registry of the NEEL 
challenge services. Once a service is registered, calls to the 
contending entry will be scheduled in two different time windows: 
D-Time (meant to test the APIs) and T-Time (for the final evaluation and 
metric computations). In the final stage, each participant can submit up 
to 3 final contending entries.

We will use the metrics proposed by TAC KBP 2014, and in particular 
we will focus on:

[tagging]     strong_typed_mention_match (check entity name boundary and 
[linking]     strong_mention_match
[clustering]  mention_ceaf (NIL detection)
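
To give a feel for how a tagging score of this kind can be computed, the 
sketch below derives precision, recall and F1 from sets of annotation 
keys. It assumes that a typed mention counts as correct only when tweet 
id, both offsets and the type all match the gold annotation; the official 
TAC KBP scorer remains the reference, and the function names and sample 
tuples here are hypothetical.

```python
from typing import Iterable, Set, Tuple


def precision_recall_f1(gold: Set[tuple],
                        system: Set[tuple]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over exact matches between two key sets."""
    true_positives = len(gold & system)
    precision = true_positives / len(system) if system else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def typed_mention_keys(annotations: Iterable[tuple]) -> Set[tuple]:
    """Assumed matching key: (tweet id, start, end, type); link is ignored."""
    return {(tweet_id, start, end, entity_type)
            for tweet_id, start, end, _link, entity_type in annotations}


# Made-up gold and system annotations: (tweet id, start, end, link, type).
gold = [
    ("100", 0, 5, "http://dbpedia.org/resource/Barack_Obama", "Person"),
    ("100", 15, 21, "http://dbpedia.org/resource/London", "Location"),
]
system = [
    ("100", 0, 5, "http://dbpedia.org/resource/Barack_Obama", "Person"),
    ("100", 15, 21, "http://dbpedia.org/resource/London", "Organisation"),
    ("100", 30, 34, "NIL1", "Person"),
]
precision, recall, f1 = precision_recall_f1(typed_mention_keys(gold),
                                            typed_mention_keys(system))
```

In this made-up case only the first system mention matches, because the 
second has the wrong type and the third has no gold counterpart. The 
clustering metric needs an alignment between entity clusters and is not 
covered by this sketch.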

To ensure the correctness of the results and avoid any loss, we will 
trigger N calls and evaluate the metrics statistically.
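
The exact statistics are not specified here. As one possible reading, the 
following sketch summarises a metric over repeated calls by its mean and 
sample standard deviation; the function name summarise_runs and the score 
values are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def summarise_runs(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of one metric over repeated calls."""
    mean = statistics.mean(scores)
    # The sample standard deviation needs at least two observations.
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, spread


# Made-up F1 values from N = 4 calls to the same contending entry.
f1_per_call = [0.40, 0.42, 0.38, 0.40]
mean_f1, stdev_f1 = summarise_runs(f1_per_call)
```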

Participants must submit a paper of 2 pages describing their approach, 
how they tuned/tested it using the training split, and their results. 
All submissions must be in English. All written submissions should be 
prepared according to the ACM SIG Proceedings Template, and should 
include author names and affiliations, and 3-5 author-selected keywords. 
Where a submission includes additional material, this should be 
submitted as a single, unencrypted zip file that includes a plain text 
file listing its contents. Submission is via EasyChair. Each 
submission will receive at least 2 peer reviews.

The #Microposts2015 proceedings will be published as a single volume 
containing all three tracks, via CEUR. However, the same publication 
conditions apply as for other workshop proceedings included in the WWW 
conference companion: "Any paper published by the ACM, IEEE, etc. which 
can be properly cited constitutes research which must be considered in 
judging the novelty of a WWW submission, whether the published paper was 
in a conference, journal, or workshop. Therefore, any paper previously 
published as part of a WWW workshop must be referenced and suitably 
extended with new content to qualify as a new submission to the Research 
Track at the WWW conference."

A keynote address from an invited speaker will open the day, 
followed by paper presentations. We will hold a poster and demo session 
to trigger further, in-depth interaction between workshop participants. 
The last set of presentations will be brief overviews of selected 
submissions to the Challenge. The workshop will close with the 
presentation of awards.

Intent to participate: 20 Jan 2015 (soft - further instructions will be 
shared on the mailing list)
Release of the REST API specs: 2 Feb 2015
Release of training set: 15 Feb 2015
Registration of contending entries: 2 Mar 2015
D-Time: 10-15 Mar 2015 (hard)
T-Time: 20-25 Mar 2015 (hard)
Paper submission: 28 Mar 2015 (hard)
Challenge Notification: 21 Apr 2015 (hard)

Challenge camera-ready deadline: 31 Apr 2015 (hard)
Workshop program issued: 22 Apr 2015
Challenge proceedings to be published via CEUR
Workshop - 18/19 May 2015 (Registration open to all)
(All deadlines 23:59 Hawaii Time)

A prize of 1500 euros, generously sponsored by SpazioDati, will be 
awarded to the challenge winner. SpazioDati is an Italian startup 
focused on text analytics and big data. One of SpazioDati's key 
components is DataTXT, a text-analytics engine available on SpazioDati's 
API platform, Dandelion. DataTXT's named-entity extraction system has 
proven to be very effective and efficient on short and fragmented texts, 
like microposts. By teaming up with SpazioDati to make the challenge 
possible, the #Microposts workshop organisers wish to highlight new 
entity extraction methods and algorithms to pursue in such a challenging 

Mailing list :!forum/microposts2015

Twitter hashtags: #neel #microposts2015
Twitter account: @Microposts2015
W3C Microposts Community Group:

Challenge Organizers:
Challenge Chairs:
A. Elizabeth Cano, Knowledge Media Institute, The Open University, UK
Giuseppe Rizzo, EURECOM, France

Dataset Chair:
Andrea Varga, Swiss Re, UK

Challenge Committee:
Gabriele Antonelli, SpazioDati, Italy
Ebrahim Bagheri, Ryerson University, Canada
Pierpaolo Basile, University of Bari, Italy
Leon Derczynski, The University of Sheffield, UK
Milan Dojchinovski, Czech Technical University, Czech Republic
Guillaume Ereteo, Vigiglobe, France
Andrés García-Silva, Universidad Politécnica de Madrid, Spain
Anna Lisa Gentile, Sheffield, UK
Miguel Martinez-Alvarez, Signal, UK
Jose M. Morales-Del-Castillo, University of Granada, Spain
Georgios Paltoglou, University of Wolverhampton, UK
Bernardo Pereira Nunes, PUC-Rio, Brazil
Daniel Preoţiuc-Pietro, University of Pennsylvania, USA
Giles Reger, The University of Manchester, UK
Irina Temnikova, Qatar Computing Research Institute, Qatar
Raphaël Troncy, EURECOM, France
Victoria Uren, Aston University, UK

Giuseppe Rizzo
EURECOM, Multimedia Communications Department
450 Route des Chappes, 06410 Biot, France
Tel: +33 (0)4 - 9300 8148
Fax: +33 (0)4 - 9000 8200

Received on Wednesday, 17 December 2014 12:34:58 UTC