Entity Search Evaluation @ SEMSEARCH10 from Duc Thanh Tran on 2010-02-16 (public-lod@w3.org from February 2010)

From: Duc Thanh Tran <tran.du.th@googlemail.com>
Date: Tue, 16 Feb 2010 19:39:50 +0100
To: public-lod@w3.org, semantic-web@w3.org
Message-ID: <d6180c151002161039l1cd29dd5ra2f24d2ccc49d0f7@mail.gmail.com>
(Apologies if you receive multiple copies of this message)

Call for PARTICIPATION: Entity Search @ SEMSEARCH10

======================================================

Fellow Researcher,

for this year's SemSearch workshop, to be held at WWW 2010, we are glad
to announce a special track on entity search. Its purpose is to see where
we are and to promote further research on entity retrieval over
semantic data. Please refer to the call below for more details.

As many people have already asked, we would like to make clear that
participation in the entity search evaluation is not required for
SemSearch10. As usual, we accept any papers that address the SEMSEARCH
topics.


For news and discussions related to SemSearch and evaluation at
SemSearch, please register at
http://tech.groups.yahoo.com/group/semsearcheval/.

We are looking forward to seeing you at SemSearch10 in Raleigh, NC!


Cheers,

Marko Grobelnik, Jožef Stefan Institute, Ljubljana, Slovenia
Peter Mika, Yahoo! Research, Barcelona, Spain
Thanh Tran Duc, Institute AIFB, University of Karlsruhe (TH), Germany
Haofen Wang, Apex Lab, Shanghai Jiao Tong University, China.



===================================

Entity Search @ SEMSEARCH10


Third International Semantic Search Workshop SemSearch10

April 26, 2010, Raleigh, NC, USA

Homepage: http://km.aifb.uni-karlsruhe.de/ws/semsearch10#eva


Submission deadline for descriptions of Entity Search systems &
results: April 10th, 2010 (12.00 AM, GMT)


===================================

Our ultimate goal is to develop a benchmark against which semantic
search systems can be compared and analyzed in a systematic fashion.
Clearly, semantics can be used for different tasks (document vs. data
retrieval) and can be exploited throughout the search process (for
more usable query construction, for better matching and ranking, for
richer result presentation, etc.). Hence, such a benchmark should enable
the study of different aspects of semantic search systems.

For this workshop, we will initially focus on the aspects of matching
and ranking in the semantic data search scenario. In particular, we
aim to analyze the effectiveness, efficiency and robustness of those
features of semantic search systems that are ready to be applied to
the Web today, namely the capability to answer queries related to
real-world entities.

The research questions we aim to tackle are:

- How well do semantic data search engines perform on the task of
Entity Search on the Web?
- What are the underlying concepts and techniques that make up the differences?

To answer these questions, we provide the following guidelines and
support for evaluating entity search systems:


-----------------------------------
Queries
-----------------------------------

We provide a set of queries that are focused on the task of entity
search. These queries represent a sample extracted from the Yahoo Web
search query log. Every query is a plain list of keywords. One example
of this type is "Semantic Search workshop 2010 WWW", which retrieves
resources that are representations of or related to the current
Semantic Search workshop. More sample queries can be downloaded from
this link:

[TODO: provide a link].

Access to the evaluation set of queries and thus participation in the
evaluation requires the signing of a license agreement.

[TODO: provide a link].

To avoid the effect of ad-hoc optimization, we will make the final
queries used for the evaluation available to participants only shortly
before the submission deadline.

-----------------------------------
Data
-----------------------------------

We provide a corpus of datasets, which contain entity descriptions in
the form of RDF. They represent a sample of Web data crawled from
publicly available sources. For this evaluation, we use the Billion
Triple Challenge 2009 dataset.
Further information and detailed statistics can be found here:

http://vmlion25.deri.ie/

The original Billion Triple Challenge 2009 dataset contains blank
nodes. We will not deal with blank nodes in this evaluation and thus
require participants to encode blank nodes according to the following
rule: BNID maps to http://example.org/URLEncode(BNID), where BNID is
the blank node id. Since the blank node ids in that dataset are
unique, this convention is sufficient to map blank nodes to
distinct URIs.
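
As an illustration only, the mapping rule above could be sketched in
Python as follows. The call does not say which characters URLEncode
escapes, nor whether BNID includes the "_:" prefix used in N-Triples
and N-Quads files; the sketch assumes that every reserved character is
percent-encoded and that the prefix is stripped first. The function
names are hypothetical.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # URI prefix given in the call


def bnode_to_uri(bnid: str) -> str:
    """Map a blank node id to a URI by URL-encoding it and prepending BASE."""
    # Assumption: safe="" so that "/" is percent-encoded too; the call does
    # not specify the exact character set escaped by URLEncode.
    return BASE + quote(bnid, safe="")


def rewrite_term(term: str) -> str:
    """Rewrite a blank node term ("_:label") to "<uri>"; leave other terms as-is."""
    # Assumption: BNID is the label without the "_:" prefix.
    if term.startswith("_:"):
        return "<" + bnode_to_uri(term[2:]) + ">"
    return term
```

Participants who prefer not to convert the data themselves can use the
pre-converted dataset instead.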

Instead of encoding the blank nodes using this convention,
participants can also download the following version of the Billion
Triple Challenge 2009 dataset, in which blank nodes have already been
converted to URIs:

http://km.aifb.uni-karlsruhe.de/ws/dataset_semsearch2010/000-CONTENTS


-----------------------------------
Relevance Judgment
-----------------------------------


The search systems produce lists of at most 10 resources ordered by
relevance. These results have to be drawn from data in the corpus.
Results will be evaluated on the three-point scale (0) Not Relevant,
(1) Relevant and (3) Perfect Match. A perfect match is a description
of a resource that matches the entity to be retrieved by the query. A
relevant result is a resource description that is related to the
entity to be retrieved by the query, i.e. the entity is contained in
the description of that result. Otherwise, a resource description is
not relevant.

In the current evaluation, we assess only individual results, as
they are found in the original data set. We do not assess the
potential of semantic search systems for disambiguating and merging
resources. In other words, only resources appearing in the original
data set may be returned as results.


-----------------------------------
Evaluation Process
-----------------------------------

To participate, each system must run the provided queries on the
corpus. The retrieved results must be submitted in a single file
following the TREC format:

http://www.ir.iit.edu/~dagr/cs529/files/project_files/trec_eval_desc.htm

Please verify that your result file can be read with the TREC
evaluation tool available at:

http://trec.nist.gov/trec_eval/index.html
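
As a rough sketch, a result file in the layout that trec_eval reads
(query id, the literal "Q0", document id, rank, score, run tag, separated
by whitespace) could be produced like this. The function name, the run
tag "myrun" and the shape of the input dictionary are assumptions made
for the example; the limit of 10 results per query comes from the
Relevance Judgment section above.

```python
def format_run(results, run_tag="myrun", max_results=10):
    """Render search results as TREC run lines.

    results: dict mapping a query id to a list of (resource_uri, score) pairs.
    Returns one line per result: "<qid> Q0 <uri> <rank> <score> <run_tag>".
    """
    lines = []
    for qid, hits in results.items():
        # Order by descending score and keep at most max_results per query.
        ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)[:max_results]
        for rank, (uri, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {uri} {rank} {score:.4f} {run_tag}\n")
    return "".join(lines)
```

The returned string can be written to a TXT file and checked with
trec_eval before submission.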

The assessment of the results will be performed manually using Amazon
Mechanical Turk.

Based on the relevance judgments, recall, precision, F-measure and
mean average precision will be computed and used as the basis for
comparing the performance of the search systems.
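
For orientation, the measures named above can be sketched as follows.
This is not the organizers' scoring code: it assumes binary relevance
(for instance, treating every judgment above "Not Relevant" as
relevant), and that a ranked list contains no duplicate resources. All
function names are hypothetical.

```python
def precision_recall_f1(ranked, relevant):
    """Set-based precision, recall and F-measure for one query."""
    hits = sum(1 for uri in ranked if uri in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant result occurs."""
    hits = 0
    total = 0.0
    for rank, uri in enumerate(ranked, start=1):
        if uri in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """runs: query id -> ranked list of URIs; qrels: query id -> set of relevant URIs."""
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0
```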

With the permission of the participants, the assessment results and
the evaluation feedback will be made publicly available on the
workshop's website.


-----------------------------------
Submission and Proceedings
-----------------------------------

For the Entity Search Track at SemSearch, participants
can submit


- (1) A short system description paper (April 10th): up to 5 pages in
ACM format
This submission is optional and will be considered for the proceedings.
Participants can register for the workshop and ask for a presentation
slot without having submitted such a system description paper.
Submissions must be formatted using the WWW2010 templates available at
 http://www2010.org/www/authors/submissions/formatting-guidelines/.

- (2) Evaluation results (April 10th): results in TREC format

Please use the following link to the submission system to submit your paper:
Easychair Submission System for SemSearch10 at
http://www.easychair.org/conferences/?conf=semsearch10

For standard papers and system descriptions, the system accepts PDF.
The evaluation results should be uploaded as TXT.


-----------------------------------
Important Dates
-----------------------------------

Deadline for standard paper submissions: March 6th, 2010 (12.00 AM, GMT)

Notification of acceptance for standard papers: March 28th, 2010

Deadline for optional Entity Search system description submissions:
April 10th, 2010 (12.00 AM, GMT)

Deadline for Entity Search Evaluation results: April 10th, 2010 (12.00 AM, GMT)

Camera-ready versions of standard papers: April 6th, 2010

Notification of acceptance for Entity Search system papers: April 18th, 2010

Camera-ready versions of Entity Search system papers: April 24th, 2010

WWW'10 Conference: April 26th-30th, 2010


Workshop Day: April 26th, 2010

-----------------------------------
Contact
-----------------------------------
For news and discussions related to SemSearch and Evaluation at
SemSearch, please register at
http://tech.groups.yahoo.com/group/semsearcheval/.
The organizing committee can be reached using the contact details
available on their web pages (or at semsearch10@easychair.org).
See the website: http://km.aifb.uni-karlsruhe.de/ws/semsearch10.
Received on Tuesday, 16 February 2010 18:40:25 UTC