One week left: 3rd Linked Data Mining Challenge at ESWC from Heiko Paulheim on 2015-03-20 (public-lod@w3.org from March 2015)

From: Heiko Paulheim <heiko@informatik.uni-mannheim.de>
Date: Fri, 20 Mar 2015 10:45:54 +0100
To: dbpedia-discussion <dbpedia-discussion@lists.sourceforge.net>, "public-lod@w3.org community" <public-lod@w3.org>, SW-forum <semantic-web@w3.org>, rmlod@googlegroups.com
Message-ID: <550BEC52.40500@informatik.uni-mannheim.de>
[Apologies for cross-posting]

*********************************************************************

CALL FOR CHALLENGE PARTICIPATION:

The best performing solution wins an Amazon voucher worth 100€!

*********************************************************************

3rd Linked Data Mining Challenge organized in connection with the 
Know@LOD 2015 workshop at ESWC Conference (ESWC 2015), May 31 - June 4, 
2015.


Venue: Portoroz, Slovenia

Date: 01 June 2015

URL: http://knowalod2015.informatik.uni-mannheim.de/en/linked-data-mining-challenge/

*********************************************************************

IMPORTANT DATES

*********************************************************************

27 March 2015: Submission of papers and solution deadline

03 April 2015: Notification of acceptance

*********************************************************************

GENERAL OVERVIEW OF THE CHALLENGE

*********************************************************************

Linked data represents a novel type of data source that has so far been 
nearly untouched by advanced data mining methods. It breaks many 
traditional assumptions about source data and thus poses a number of 
challenges:

- While the individual published datasets typically follow a relatively 
regular, relational-like (or hierarchical, in the case of taxonomic 
classification) structure, the presence of semantic links among them 
makes the resulting ‘hyper-dataset’ akin to general graph datasets. On 
the other hand, compared to graphs such as social networks, there is a 
larger variety of link types in the graph.

- The datasets have been published for entirely different purposes, such 
as statistical data publishing based on legal commitment of government 
bodies vs. publishing of encyclopedic data by internet volunteers vs. 
data sharing within a researcher community. This introduces further data 
modeling heterogeneity and uneven degree of completeness and reliability.

- The amount and diversity of resources, as well as their link sets, are 
steadily growing. This allows new linked datasets to be included in the 
mining dataset nearly on the fly, but at the same time makes the feature 
selection problem extremely hard.

The Linked Data Mining Challenge 2015 (LDMC) will consist of one task, 
which is the prediction of the review class of movies.

The best participant in the challenge will receive an award. The ranking 
of the participants will be made by the LDMC organizers, taking into 
account both the quality of the submitted LDMC paper (evaluated by 
Know@LOD workshop PC members) and the prediction quality (i.e., 
accuracy, see below).

*********************************************************************

TASK OVERVIEW

*********************************************************************

The task is to predict the review class of a movie, i.e., "good" or 
"bad". The initial dataset is retrieved from Metacritic, which offers an 
average rating over all reviews for each movie in a list. The ratings 
were used to divide the movies into classes: movies with a score above 
60 are regarded as "good" movies, while movies with a score below 40 are 
regarded as "bad" movies. For each movie we provide the corresponding 
DBpedia URI. These mappings can be used to extract semantic features 
from DBpedia or other LOD repositories, to be exploited in the learning 
approaches proposed in the challenge.
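
The class assignment described above can be sketched as follows. This is 
only an illustration, assuming the score is a number on Metacritic's 
0-100 scale. The function name review_class is hypothetical, and since 
the text does not say how scores from 40 to 60 are treated, the sketch 
returns None for them.

```python
from typing import Optional


def review_class(score: float) -> Optional[str]:
    """Map a Metacritic score to a review class.

    Assumption: scores from 40 to 60 (inclusive) belong to neither class,
    because the call only defines "above 60" and "less than 40".
    """
    if score > 60:
        return "good"
    if score < 40:
        return "bad"
    return None


# Illustrative scores, not taken from the challenge data.
print(review_class(75), review_class(25), review_class(50))
```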

*********************************************************************

SOURCE DATA, REQUIRED RESULTS, AND EVALUATION

*********************************************************************

The dataset is available for download here: 
http://knowalod2015.informatik.uni-mannheim.de/en/linked-data-mining-challenge/. 
It consists of training data of 1,600 instances for learning the 
predictive models (this data contains the value of the target attribute) 
and testing data of 400 instances for evaluating the models (this data 
is without the target attribute). The target attribute to be predicted 
is the Label attribute.

The datasets contain semicolon-separated values: the movie's title 
(“Movie”), release date (“Release date”), DBpedia URI (“DBpedia_URI”), 
label (“Label”, only in the training set), and “id”. A sample of the 
training CSV file is as follows:

"Movie";"Release date";"DBpedia_URI";"Label";"id"
"Best Kept Secret";9/6/13 12:00 AM;"http://dbpedia.org/resource/Best_Kept_Secret_(film)";"good";1.0
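
A file in this format can be read with Python's standard csv module. The 
sketch below assumes that each record sits on a single line in the 
actual file (the sample is only wrapped by the mail client) and embeds 
the sample row as a string. The name read_movies is hypothetical.

```python
import csv
import io

# The sample from the call, with the data row joined onto one line.
SAMPLE = (
    '"Movie";"Release date";"DBpedia_URI";"Label";"id"\n'
    '"Best Kept Secret";9/6/13 12:00 AM;'
    '"http://dbpedia.org/resource/Best_Kept_Secret_(film)";"good";1.0\n'
)


def read_movies(text):
    """Parse semicolon-separated text into a list of dicts keyed by header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";", quotechar='"')
    return list(reader)


rows = read_movies(SAMPLE)
print(rows[0]["Movie"], rows[0]["Label"])
```

For a file on disk, the same reader can be fed an open file object 
instead of the io.StringIO wrapper. All values are returned as strings, 
so the id and the release date still need to be converted if required.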

The participants have to submit the results achieved on the testing 
data, i.e., the predicted label of each movie. The results have to be 
delivered in a (syntactically correct) CSV format that includes the 
predicted label. The submitted results will be evaluated against a gold 
standard with respect to accuracy.
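
Accuracy here means the fraction of test movies whose predicted label 
matches the gold standard. A minimal sketch, assuming both label lists 
are in the same movie order; the function name and the example labels 
are made up for illustration and are not the organizers' evaluation 
code.

```python
def accuracy(predicted, gold):
    """Return the fraction of positions where predicted equals gold."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty gold standard")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Three of the four illustrative predictions match the gold labels.
print(accuracy(["good", "bad", "good", "bad"], ["good", "bad", "bad", "bad"]))
```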

Besides the CSV file containing the predictions, the participants are 
expected to submit a paper describing the methods and techniques used, 
as well as the results obtained, i.e., the hypotheses perceived as 
interesting either by the computational method or by the participants 
themselves.

The participants should provide a detailed description of their 
approach, so that it can be easily reproduced. For example, the paper 
should clearly state the feature sets used (and how they were created), 
the preprocessing steps, the type of the predictor, the values and 
tuning of the model parameters, etc.

The papers will be evaluated by the evaluation panel, both with respect 
to the soundness and originality of the methods used and with respect to 
the validity of the hypotheses and nuggets found. Each paper should meet 
the standard norms of scientific writing.

*********************************************************************

ALLOWED DATASETS

*********************************************************************

For building the movie review predictor, any dataset that follows the 
Linked Open Data principles is allowed to be used.

Non-LOD datasets are allowed to be used only if the participants later 
publish those datasets in a way that would make them accessible using 
some of the standard Semantic Web technologies, e.g., RDF, SPARQL, etc.

For example, one may map the movies from the dataset to the 
corresponding movies in a non-LOD dataset X, which allows additional 
data to be retrieved from dataset X. The participants are then expected 
to publish both the mappings from DBpedia to the dataset X movies and 
the additional data retrieved from dataset X, for example, using RDF.
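
One way to publish such mappings is as N-Triples, one triple per line. 
The sketch below is only an illustration: the use of owl:sameAs as the 
linking property is an assumption (the call does not prescribe a 
predicate), and the dataset X URI under example.org is hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(dbpedia_uri, other_uri):
    """Format one N-Triples line linking a DBpedia movie to another URI.

    Assumption: both arguments are absolute IRIs that need no escaping.
    """
    return f"<{dbpedia_uri}> <{OWL_SAME_AS}> <{other_uri}> ."


# The DBpedia URI comes from the sample row; the second URI is made up.
line = same_as_triple(
    "http://dbpedia.org/resource/Best_Kept_Secret_(film)",
    "http://example.org/datasetX/movie/123",
)
print(line)
```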

IMPORTANT: Since the Metacritic dataset is publicly available, we 
kindly ask the participants not to use the Metacritic movies' rating 
score to tune the predictor for the movies in the test set. Any 
submission found not to comply with this rule will be disqualified.

However, information other than the movies' rating score retrieved from 
Metacritic is allowed, e.g., users' textual reviews for a given movie.

*********************************************************************

SUBMISSION PROCEDURE

*********************************************************************

Results submission

- Register using the registration web form available at: 
http://ldmc15.informatik.uni-mannheim.de/signup

- Build a prediction model on the training set.

- Apply the model on the test set to predict the label.

- Submit the results at: http://ldmc15.informatik.uni-mannheim.de/submit

- Your final score will be the one computed with respect to the last 
result submission made before Friday March 27th
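
The build, apply and submit steps above can be sketched with a trivial 
majority-class baseline. Everything here is an assumption made for 
illustration: the in-memory rows stand in for the real training and 
test files, and the output layout ("id" and "Label" columns, semicolon 
separated) is a guess, since the call only requires a syntactically 
correct CSV that includes the predicted label.

```python
import csv
import io
from collections import Counter


def train_majority(train_rows):
    """'Train' a baseline: remember the most frequent training label."""
    counts = Counter(row["Label"] for row in train_rows)
    return counts.most_common(1)[0][0]


def predict(label, test_rows):
    """Assign the same label to every test instance."""
    return [{"id": row["id"], "Label": label} for row in test_rows]


def to_csv(predictions):
    """Serialize predictions as semicolon-separated, fully quoted CSV."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["id", "Label"],
        delimiter=";",
        quoting=csv.QUOTE_ALL,
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(predictions)
    return out.getvalue()


# Made-up rows standing in for the real training and test sets.
train = [
    {"id": "1.0", "Label": "good"},
    {"id": "2.0", "Label": "good"},
    {"id": "3.0", "Label": "bad"},
]
test = [{"id": "4.0"}, {"id": "5.0"}]

model = train_majority(train)
result = to_csv(predict(model, test))
print(result)
```

A real entry would replace train_majority with a classifier trained on 
features extracted from DBpedia or other LOD sources; the baseline only 
shows the data flow from training rows to a result file.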

-----

Paper submission

- In addition to your results, you have to submit a paper describing 
your solution

- The paper format is Springer LNCS, with a limit of four pages

- Papers are submitted online via Easychair before Friday March 27th

-----

Presentation of Results

- Challenge papers will be included in the workshop proceedings of Know@LOD

- The authors of the best performing systems will be asked to present 
their solution at the workshop

For any questions related to the submission procedure, please contact 
the organizers listed below.

*********************************************************************

ORGANIZATION

*********************************************************************

Petar Ristoski, University of Mannheim, Germany, petar.ristoski (at) 
informatik.uni-mannheim.de

Heiko Paulheim, University of Mannheim, Germany, heiko (at) 
informatik.uni-mannheim.de

Vojtěch Svátek, University of Economics, Prague, svatek (at) vse.cz

Václav Zeman,  University of Economics, Prague, vaclav.zeman (at) vse.cz



-- 
Prof. Dr. Heiko Paulheim
Data and Web Science Group
University of Mannheim
Phone: +49 621 181 2646
B6, 26, Room C1.08
D-68159 Mannheim

Mail: heiko@informatik.uni-mannheim.de
Web: www.heikopaulheim.com
Received on Friday, 20 March 2015 09:46:18 UTC