ESWC 2014 Call for Challenge: Semantic Publishing

From: Christoph LANGE <math.semantic.web@gmail.com> · Date: Fri, 20 Dec 2013 01:44:01 +0100

==== Call for Challenge: Semantic Publishing ====

Challenge Website: http://challenges.2014.eswc-conferences.org/SemPub
Call Web page: http://2014.eswc-conferences.org/important-dates/call-SemPub

MOTIVATION AND OBJECTIVES
Scholarly publishing is increasingly enabling a new wave of applications
that better support
researchers in disseminating, exploiting and evaluating their results.
The potential of publishing
scientific papers enriched with semantic information is huge and raises
interesting and challenging
issues. Semantic Web technologies play a central role in this context,
as they can help publishers to make
scientific results available in an open format the whole research
community can benefit from.
The Semantic Publishing Challenge 2014 is intended to be the first in a
series of events at ESWC for
producing and exploiting semantic publishing data. The main focus this
year is on extracting
information and using this information to assess the quality of
scientific productions.
Linked open datasets about scientific production exist - e.g. DBLP - but
they usually cover basic
bibliographic information, which is not sufficient to assess quality.
Quality-related information,
such as the number of journal articles that cite a given publication, or
the number of times a workshop has
been run by the same chairs and received many submissions, or the actual
function of a citation, is often
hidden and not yet available as LOD. The main goal of the Challenge is
to build high-quality LOD that
contains such information.
Building on this year's outcomes, we plan to investigate the final
publication and exploitation side in the
next editions.

TARGET AUDIENCE
The Challenge is open to everyone from industry and academia.

TASKS
We ask challengers to automatically annotate a set of multi-format and
multi-source input documents and
to produce a Linked Open Dataset that fully describes these documents,
their context, and relevant parts
of their content. The evaluation will consist of running a set of
queries against the produced dataset
to assess its correctness and completeness.
The input dataset will be split into two parts: a training/testing part
and an evaluation part, which will
be disclosed a few days before the submission deadline. Participants will
be asked to run their tool on the
evaluation dataset and to produce the final Linked Open Dataset.
Further details about the organization of the Challenge will be
provided. The Challenge will include two tasks:

Task 1: Extraction and assessment of workshop proceedings information
Participants are required to extract information from a set of HTML
tables of contents, partly including
microformat and RDFa annotations but not necessarily being valid HTML,
of selected computer science
workshop proceedings published with the CEUR-WS.org open access service.
The extracted information is
expected to answer queries about the quality of these workshops, for
instance by measuring their growth,
longevity, connection with other events, and distribution of papers and authors.

Task 2: Extraction and characterization of citations
Participants are required to extract information about the citations in
scientific journals and their
relevance. Input documents are in XML JATS and TaxPub, an official
extension of JATS customized for
taxonomic treatments, and selected from the PubMedCentral Open Access
Subset and the Pensoft
Biodiversity Data Journal and ZooKeys archive. The extracted information
is expected to be used for
assessing the value of citations, for instance by considering their
position in the paper, their
co-location with other citations or their purpose.

EVALUATION
Participants will be requested to submit the LOD that their tool
produces from the evaluation dataset, as
well as a paper that describes their approach. They will also be given a
set of queries in natural language
form and will be asked to translate those queries into a SPARQL form
that works on their LOD.
The results of the queries on the produced LOD will be compared with the
expected output, and precision and
recall will be measured to identify the best performing approach.
Separately, the most original
approach will be selected by the Program Committee.
Further details about the evaluation will be provided on the challenge wiki.
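
As an illustration only, the precision and recall of a query result
against the expected output could be computed along the following lines.
This is a minimal sketch, not the official evaluation procedure: it
assumes that result rows are compared as exact-match sets, and the
function name and the example.org rows are hypothetical.

```python
def precision_recall(returned, expected):
    """Compare two collections of query result rows as sets.

    Assumption: a returned row counts as correct only if it is
    identical to an expected row.
    """
    returned, expected = set(returned), set(expected)
    true_positives = len(returned & expected)
    # Guard against empty result sets to avoid division by zero.
    precision = true_positives / len(returned) if returned else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical rows (workshop URI, year); not taken from the challenge data.
expected = {
    ("http://example.org/ws/A", 2012),
    ("http://example.org/ws/A", 2013),
    ("http://example.org/ws/B", 2013),
}
returned = {
    ("http://example.org/ws/A", 2012),
    ("http://example.org/ws/A", 2013),
    ("http://example.org/ws/C", 2012),
    ("http://example.org/ws/C", 2013),
}

precision, recall = precision_recall(returned, expected)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four returned rows are correct, and two of the three
expected rows are found.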

FEEDBACK AND DISCUSSION
A discussion group is open for participants to ask questions and to
receive updates about the challenge
(see link at bottom). Participants are invited to subscribe to this
group as soon as possible and to
communicate their intention to participate. They are also invited to use
this channel to discuss
problems in the input dataset and to suggest changes.

JUDGING AND PRIZES
After a first round of review, the Program Committee and the chairs will
select a number of submissions
conforming to the challenge requirements that will be invited to present
their work. Submissions
accepted for presentation will receive constructive reviews from the
Program Committee, will be
included in the Springer LNCS post-proceedings of ESWC, and will
have a presentation slot in a poster
session dedicated to the challenge.

In addition, the winners will present their work in a special slot of
the main program of ESWC and will be
invited to submit a revised and extended paper to a dedicated Semantic
Web Journal special issue.

Four winners will be selected. For each of the two tasks we will select:
* best performing tool, awarded to the paper that achieves the highest
score in the evaluation
* most original approach, selected by the Challenge Committee through the
reviewing process

HOW TO PARTICIPATE
Participants are required to submit:
* Abstract: no more than 200 words.
* Description: It should explain the details of the automated annotation
system, including why the system
is innovative, how it uses Semantic Web technology, what features or
functions the system provides, what
design choices were made and what lessons were learned. The description
should also summarize how
participants have addressed the evaluation tasks. An outlook towards how
the data could be consumed is
appreciated but not strictly required. Papers must be submitted in PDF
format, following the style of
Springer's Lecture Notes in Computer Science (LNCS) series
(http://www.springer.com/computer/lncs/lncs+authors), and not exceeding
5 pages in length.
* The Linked Open Dataset produced by their tool on the evaluation
dataset (as a file or as a URL, in Turtle or
RDF/XML).
* A set of SPARQL queries that work on that LOD and correspond to the
natural language queries provided as input

Participants will also be asked to submit their tool (source and/or
binaries, or a link from which these can be
downloaded, or a web service URL) for verification purposes.
Further submission instructions will be published on the challenge wiki.

All submissions should be provided via EasyChair
https://www.easychair.org/conferences/?conf=eswc2014-challenges

MAILING LIST
We invite potential participants to subscribe to our mailing list in
order to be kept up to date with the
latest news related to the challenge.

https://lists.sti2.org/mailman/listinfo/eswc2014-sempub-challenge

IMPORTANT DATES
* December 3, 2013: Publication of the full description of tasks, rules
and queries; publication of the
training/testing dataset
* January 15, 2014, 23:59 (Hawaii time): Deadline for making remarks on
the training/testing dataset
* January 20, 2014: Publication of the final training/testing dataset
* March 7, 2014, 23:59 (Hawaii time): Abstract Submission
* March 11, 2014: Publication of the evaluation dataset
* March 14, 2014, 23:59 (Hawaii time): Submission
* April 9, 2014, 23:59 (Hawaii time): Notification of acceptance
* May 27-29, 2014: Challenge days

CHALLENGE CHAIRS
* Angelo Di Iorio (Department of Computer Science and Engineering,
University of Bologna, IT)
* Christoph Lange (Enterprise Information Systems, University of Bonn /
Fraunhofer IAIS, DE)

PROGRAM COMMITTEE
Soren Auer (University of Bonn / Fraunhofer IAIS, DE) (supervisor)
Sarven Capadisli (University of Leipzig, DE)
Alexander Constantin (University of Manchester, UK)
Alexander Garcia Castro (Florida State University, US)
Leyla Jael Garcia Castro (Bundeswehr University of Munich, DE)
Aidan Hogan (DERI Galway, IE)
Evangelos Milios (Dalhousie University, CA)
Lyubomir Penev (Pensoft Publishers, BG)
Robert Stevens (University of Manchester, UK)
Jun Zhao (Lancaster University, UK)

We are inviting further members.

ESWC CHALLENGE COORDINATOR
* Milan Stankovic (Sepage & Universite Paris-Sorbonne, FR)

-- 
Christoph Lange, Enterprise Information Systems Department
Applied Computer Science @ University of Bonn; Fraunhofer IAIS
http://langec.wordpress.com/about, Skype duke4701

→ Semantic Publishing Challenge: Assessing the Quality of Scientific Output
  ESWC, 25–29 May 2014, Crete, Greece.  https://tinyurl.com/SPChallenge14
  Abstract submission until 7 March.