[ESWC 2015] Call for Challenge: Open Knowledge Extraction Challenge

** apologies for cross-posting **

==== Call for Challenge: Open Knowledge Extraction Challenge ====

Challenge Website: https://github.com/anuzzolese/oke-challenge

12th Extended Semantic Web Conference (ESWC) 2015
Dates: May 31 - June 4, 2015
Venue: Portoroz, Slovenia
Hashtag: #eswc2015
Feed: @eswc_conf
Site: http://2015.eswc-conferences.org

- Fabien Gandon (Inria, Sophia Antipolis, France)

- Elena Cabrio (Inria, Sophia Antipolis, France)
- Milan Stankovic (SEPAGE, Paris, France)

- Aldo Gangemi, LIPN, University Paris 13 (France)
- Roberto Navigli, University of Rome La Sapienza (Italy)
- Valentina Presutti, CNR STLAB Laboratory (Italy)
- Dario Garigliotti, University of Rome La Sapienza (Italy)
- Anna Lisa Gentile, University of Sheffield (UK)
- Andrea Nuzzolese, CNR STLAB Laboratory (Italy)

* March 3, 2015, 23:59 CET: Submission due
* April 9, 2015, 23:59 CET: Notification of acceptance
* May 31 - June 4, 2015: Challenge days

The vision of the Semantic Web (SW) is to populate the Web with
machine-understandable data so that intelligent agents can
automatically interpret its content, just as humans do by inspecting
Web content, and assist users in performing a significant number of
tasks, relieving them of cognitive overload. The Linked Data movement
kicked off this vision by bootstrapping the publication of
machine-understandable information, mainly taken from structured data
(typically databases) or semi-structured data (e.g. Wikipedia
infoboxes). However, most Web content consists of natural language
text (e.g. Web sites, news, blogs, micro-posts), so a main challenge
is to extract as much relevant knowledge as possible from this content
and publish it in the form of Semantic Web triples.

A large body of work on knowledge extraction (KE) and knowledge
discovery addresses this problem. However, most evaluations focus on
linking extracted facts and entities to concepts that already exist in
available Knowledge Bases (KB).

The Open Knowledge Extraction Challenge focuses on the production of
new knowledge aimed at either populating and enriching existing
knowledge bases or creating new ones. This means that the defined
tasks focus on extracting concepts, individuals, properties, and
statements that do not necessarily already exist in a target knowledge
base, and on representing them according to Semantic Web standards so
that they can be directly injected into linked datasets and their
ontologies.

This is in line with existing efforts in the community (e.g.
http://aksw.org/Projects/GERBIL.html) to standardise the results of
existing KE methods so that they are directly reusable for populating
the SW. In this direction, the proposed tasks will be structured
following a common formalisation, the required output will be in a
standard SW format (specifically, the NLP Interchange Format (NIF)
will be required for all tasks), and the evaluation procedure will be
publicly available in a standard evaluation framework.
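
As a rough illustration of what NIF-style output involves, the sketch
below describes one mention in a sentence by its character offsets. It
assumes the NIF 2.0 core vocabulary and its "#char=begin,end" offset
URIs; the document URI and the helper name nif_mention are
hypothetical, the sentence is the definition quoted under Task 2, and
the conventions of the official datasets may differ.

```python
# Assumed NIF 2.0 core namespace; check the challenge datasets for the
# exact vocabulary they use.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def nif_mention(doc_uri, text, mention):
    """Return NIF-style triples describing the first occurrence of `mention`.

    `doc_uri` is a hypothetical identifier for the input document.
    """
    begin = text.find(mention)
    if begin < 0:
        raise ValueError("mention not found in text")
    end = begin + len(mention)
    # Offset-based URIs for the whole text (context) and for the mention.
    context_uri = f"{doc_uri}#char=0,{len(text)}"
    string_uri = f"{doc_uri}#char={begin},{end}"
    return [
        (string_uri, RDF_TYPE, NIF + "RFC5147String"),
        (string_uri, NIF + "anchorOf", mention),
        (string_uri, NIF + "beginIndex", begin),
        (string_uri, NIF + "endIndex", end),
        (string_uri, NIF + "referenceContext", context_uri),
    ]


if __name__ == "__main__":
    sentence = "Skara Cathedral is a church in the Swedish city of Skara."
    for triple in nif_mention("http://example.org/doc1", sentence,
                              "Skara Cathedral"):
        print(triple)
```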

The OKE challenge aims to advance a reference framework for research
on Knowledge Extraction from text for the Semantic Web by redefining a
number of tasks (typically from information and knowledge extraction),
taking into account specific SW requirements.

The Challenge is open to everyone from industry and academia.
We expect to trigger attention from the Knowledge Extraction community
and foster their broader integration with the Semantic Web community.


The OKE Challenge is defined in terms of three different tasks. Each
system can participate in each task individually.

- Task 1: Named Entity Resolution, Linking and Typing for Knowledge
Base population.
This task consists of (i) identifying Named Entities in a sentence and
creating an OWL individual (owl:Individual statement) for each of
them, (ii) assigning a type to each individual (rdf:type statement),
selected from a set of given types (a subset of a popular KB, e.g.
DBpedia, provided by the organisers), and (iii) linking (owl:sameAs
statement) each individual, when possible, to a reference KB (which
will be stated by the organisers, e.g. DBpedia).
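
The three kinds of statement in Task 1 can be sketched as plain
subject-predicate-object tuples. This is only an illustration: the
namespace http://example.org/oke/, the helper name task1_statements
and the example type and link URIs are assumptions, and the entity
recognition step itself is not shown.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/oke/"  # hypothetical namespace for new individuals


def task1_statements(mention, type_uri, kb_uri=None):
    """Build the Task 1 statements for one recognised entity mention.

    `type_uri` must come from the set of types given by the organisers;
    `kb_uri` is the matching reference-KB resource, or None if there is none.
    """
    individual = EX + mention.strip().replace(" ", "_")
    triples = [
        # (i) the individual itself, using owl:Individual as named in the call
        (individual, RDF_TYPE, OWL + "Individual"),
        # (ii) its type, chosen among the given types
        (individual, RDF_TYPE, type_uri),
    ]
    if kb_uri is not None:
        # (iii) the link to the reference KB, only when one is available
        triples.append((individual, OWL + "sameAs", kb_uri))
    return triples


if __name__ == "__main__":
    # Illustrative values only, not taken from the challenge datasets.
    for triple in task1_statements("Skara",
                                   "http://dbpedia.org/ontology/Place",
                                   "http://dbpedia.org/resource/Skara"):
        print(triple)
```

An entity with no counterpart in the reference KB simply yields the
first two statements.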

- Task 2: Class Induction and entity typing for Vocabulary and
Knowledge Base enrichment.
This task consists of producing rdf:type statements, given definition
texts. The participants will be given a dataset of sentences, each
defining an entity (known a priori), e.g. the entity
“dbpedia:Skara_Cathedral” and its definition “Skara Cathedral is a
church in the Swedish city of Skara.”
Participants are expected to (i) identify the type(s) of the given
entity as they are expressed in the given definition, (ii) create an
owl:Class statement defining each of them as a new class in the
target knowledge base, (iii) create an rdf:type statement between the
given entity and each newly created class, and (iv) align the
identified types, if a correct alignment is available, to a set of
given types (a subset of a popular KB, e.g. DBpedia, provided by the
organisers).
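
A deliberately naive sketch of the four Task 2 steps, applied to the
Skara Cathedral example, is given here. The "is a/an X" pattern, the
namespace http://example.org/oke/, the given-type URI and the use of
rdfs:subClassOf for the alignment are all assumptions made for
illustration; a competitive system would need real linguistic
analysis.

```python
import re

OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EX = "http://example.org/oke/"  # hypothetical namespace for induced classes


def induce_type(entity_uri, definition, given_types):
    """Induce one class from a definition sentence and type the entity with it.

    `given_types` maps lower-case type words to URIs of the given types.
    """
    # (i) find the type word with a toy "is a/an X" pattern
    match = re.search(r"\bis an? ([A-Za-z]+)", definition)
    if match is None:
        return []
    word = match.group(1).lower()
    new_class = EX + word.capitalize()
    triples = [
        (new_class, RDF_TYPE, OWL_CLASS),      # (ii) declare the new class
        (new_class, RDFS + "label", word),
        (entity_uri, RDF_TYPE, new_class),     # (iii) type the given entity
    ]
    aligned = given_types.get(word)
    if aligned is not None:
        # (iv) alignment, expressed here as rdfs:subClassOf (an assumption)
        triples.append((new_class, RDFS + "subClassOf", aligned))
    return triples


if __name__ == "__main__":
    given = {"church": "http://example.org/given-types/Building"}  # hypothetical
    for triple in induce_type(
            "http://dbpedia.org/resource/Skara_Cathedral",
            "Skara Cathedral is a church in the Swedish city of Skara.",
            given):
        print(triple)
```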

- Task 3: Relation extraction and naming, and triple generation for
Ontology and Knowledge Base enrichment.
The participants will be given as input a sentence and two entities
contained in the sentence. The task consists of (i) assessing whether
the sentence contains evidence of a relation between the two input
entities and, if so, (ii) creating an OWL property representing the
relation, including a value for its rdfs:label annotation, and (iii)
producing a statement for the relation.
The triple must be of the form <entity1> <relation> <entity2>, where:
a. <entity1> and <entity2> are the input URIs, i.e. the given pair of
entities, as subject and object of the statement;
b. <relation> is the learnt OWL property, as predicate.
The URI for the predicate must be created by the participants. We will
not require linking to a reference KB, but we will provide a
formalism for producing the URI of the relation and will use a string
similarity measure to assess the results against a Gold Standard.
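
The output expected for Task 3, and the idea of scoring a relation
label by string similarity, can be sketched as follows. The camel-case
URI scheme, the namespace http://example.org/oke/, the choice of
owl:ObjectProperty and the use of difflib's ratio as the similarity
measure are all assumptions; the formalism and the measure actually
used are those provided by the organisers.

```python
from difflib import SequenceMatcher

OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/oke/"  # hypothetical namespace for new properties


def mint_property(label):
    """Turn a relation label into a camel-case property URI (toy scheme)."""
    words = label.split()
    if not words:
        raise ValueError("empty relation label")
    local = words[0].lower() + "".join(w.capitalize() for w in words[1:])
    return EX + local


def relation_triples(entity1, entity2, label):
    """Declare the property, label it, and state <entity1> <relation> <entity2>."""
    prop = mint_property(label)
    return [
        (prop, RDF_TYPE, OWL + "ObjectProperty"),
        (prop, RDFS_LABEL, label),
        (entity1, prop, entity2),
    ]


def label_similarity(predicted, gold):
    """Case-insensitive similarity in [0, 1] between two relation labels."""
    return SequenceMatcher(None, predicted.lower(), gold.lower()).ratio()


if __name__ == "__main__":
    # Illustrative entities and labels, not taken from the Gold Standard.
    for triple in relation_triples(
            "http://dbpedia.org/resource/Skara_Cathedral",
            "http://dbpedia.org/resource/Skara",
            "located in"):
        print(triple)
    print(label_similarity("located in", "is located in"))
```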

Systems will be evaluated against a testing dataset for each task,
which will be released after a first round of evaluation during the
Conference. Participants are encouraged to train and/or test their
own systems using the training dataset, available on the Challenge
website (https://github.com/anuzzolese/oke-challenge) from
February 16th. Precision, recall and F1-measure for all the tasks will
be computed automatically using a state-of-the-art benchmarking tool,
such as GERBIL. When necessary (e.g. Task 3), an adapted evaluation
will be added to the benchmark tool to include string similarity
within the

A subjective evaluation will be performed by the members of the
Advisory Board. For each system, reviewers will assess the methodology,
the technical soundness and the innovativeness of the system.

Systems will be evaluated in terms of standard precision, recall and
F-measure. The evaluation will be performed using a
state-of-the-art benchmarking tool, such as GERBIL.
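
For reference, the standard definitions of precision, recall and
F1-measure over sets of statements can be sketched as below. This
assumes exact matching between predicted and gold items and a single
pooled computation; the matching and averaging rules applied by the
benchmarking tool may differ, and the function name is hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 over two collections of items.

    Items are compared by equality, e.g. (subject, predicate, object) tuples.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data: 2 of 4 predictions are correct, 2 of 3 gold items are found.
    system_output = {"a", "b", "c", "d"}
    gold_standard = {"a", "b", "e"}
    print(precision_recall_f1(system_output, gold_standard))
```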

We propose to award systems based on two criteria, judged separately:
    Subjective: the paper describing the system will be assessed by
the reviewers.
    Objective: the system with the highest scores in the evaluation benchmark.

All papers passing the subjective evaluation will compete in the
objective evaluation and will be published in the challenge
proceedings. A number of **finalist** systems, selected on the basis
of both the subjective and the objective evaluation, will have to
present their work in a dedicated conference session. The exact number of
finalists and the presentation style will depend on the Conference

The following information has to be provided:
* Abstract: no more than 200 words.
* Description: It should contain the details of the system, including
why the system is innovative, how it uses the Semantic Web, which
features or functions the system provides, what design choices were
made and what lessons were learned. The description should also
summarise how participants have addressed the evaluation tasks. Papers
must be submitted in PDF format, following the style of Springer's
Lecture Notes in Computer Science (LNCS) series
(http://www.springer.com/computer/lncs/lncs+authors), and must not
exceed 12 pages in length.
* Web Access: The application can either be accessible via the web or
downloadable. If the application is not publicly accessible, a
password must be provided. A short set of instructions on how to use
the application should be provided as well.

Papers are submitted in PDF format via the challenge's EasyChair
submission pages https://easychair.org/conferences/?conf=oke2015

A mailing list dedicated to the challenge will be available to all
participants so that they can share comments and questions and
benefit from receiving the latest news and from the organisers’

Additional information can be found at

Received on Friday, 13 February 2015 15:15:31 UTC