CFP: First International Workshop on Semantic Statistics (SemStats 2013) + challenge @ ISWC 2013 from Raphaël Troncy on 2013-07-01 (public-gld-wg@w3.org from July 2013)

From: Raphaël Troncy <raphael.troncy@eurecom.fr>
Date: Mon, 01 Jul 2013 08:47:04 +0200
To: public-gld-wg@w3.org
Message-ID: <51D125E8.4090605@eurecom.fr>
============================================================================= 

First International Workshop on Semantic Statistics (SemStats 2013)

Full-Day Workshop in conjunction with ISWC 2013, the 12th International 
Semantic Web Conference
21-25 October 2013, in Sydney, Australia

Workshop Web Site: http://www.datalift.org/en/event/semstats2013
Challenge Web Site: http://datalift.org/en/event/semstats2013/challenge
EasyChair: http://www.easychair.org/conferences/?conf=semstats2013
E-mail address: semstats2013@easychair.org
Twitter Hashtag: #semstats2013

*Important Dates (regular papers)*
- Deadline for paper submission: Friday, 12 July 2013, 23:59 (Hawaii time)
- Notification of acceptance/rejection: Friday, 9 August 2013
- Deadline for camera-ready version: Friday, 30 August 2013

*Important Dates (challenge papers)*
- Deadline for paper submission: Friday, 6 September 2013, 23:59 (Hawaii 
time)
- Short list announcement: Monday, 23 September 2013

============================================================================= 


*Workshop Summary*
The goal of this workshop is to explore and strengthen the relationship 
between the Semantic Web and statistical communities, to provide better 
access to the data held by statistical offices. It will focus on ways in 
which statisticians can use Semantic Web technologies and standards in 
order to formalize, publish, document and link their data and metadata.

The statistical community has recently shown an interest in the Semantic 
Web. In particular, initiatives have been launched to develop semantic 
vocabularies representing statistical classifications and discovery 
metadata. Tools are also being created by statistical organizations to 
support the publication of dimensional data conforming to the Data Cube 
specification, now in Last Call at W3C. But statisticians see challenges 
in the Semantic Web: how can data and concepts be linked in a 
statistically rigorous fashion? How can we avoid fuzzy semantics leading 
to wrong analyses? How can we preserve data confidentiality?

The workshop will also cover the question of how to apply statistical 
methods or treatments to linked data, and how to develop new methods and 
tools for this purpose. Except for visualisation techniques and tools, 
this question is relatively unexplored, but the subject will obviously 
grow in importance in the near future.

*Motivation*
There is growing interest in linked data and the Semantic Web 
within the statistical community. A large amount of statistical data from 
international and national agencies has already been published on the 
web of data, for example Census data from the U.S., Spain or France 
amongst others. In most cases, though, this publication is done by 
people outside the statistical office (see also 
http://datahub.io/dataset/istat-immigration, http://270a.info/ or 
http://eurostat.linked-statistics.org/), which raises issues such as 
long-term URI persistence, institutional commitment and data maintenance.

Statistical organizations also possess an important corpus of structural 
metadata such as concept schemes, thesauri, code lists and 
classifications. Some of those are already available as linked data, 
generally in SKOS format (e.g. FAO's Agrovoc or UN's COFOG). Semantic 
Web standards useful to statisticians have now reached maturity. 
The best examples are the W3C Data Cube, DCAT and ADMS vocabularies. The 
statistical community is also working on the definition of more 
specialized vocabularies, especially under the umbrella of the DDI 
Alliance. For example, XKOS extends SKOS for the representation of 
statistical classifications, and Disco defines a vocabulary for data 
documentation and discovery. The Visual Analytics Vocabulary is a 
first step towards semantic descriptions of user interface components 
developed to visualize Linked Statistical Data, which can lead to 
increased linked data consumption and accessibility. We are now at the 
tipping point where the statistical and the Semantic Web communities 
need a formal exchange in order to share experiences and tools and 
think ahead regarding the upcoming challenges.

The web of data will benefit from rich data published by 
professional and trustworthy data providers. It is also important that 
metadata maintained by statistical offices, such as concept schemes of 
economic or societal terms, statistical classifications, well-known 
codes, etc., is available as linked data, because it is of good 
quality, well-maintained, and constitutes a corpus to which a lot of 
other data can refer.

Statisticians have a long-standing culture of data integrity, quality and 
documentation. They have developed industrialized data production and 
publication processes, and they care about data confidentiality and more 
generally how data can be used. It seems that after a period where the 
aim was to publish as many triples as possible, the focus of the 
Semantic Web community is now shifting to having a better quality of 
data and metadata, more coherent vocabularies (see the LOV initiative), 
good and documented naming patterns, etc. This workshop aims to 
contribute to addressing these longer-term problems in order to have a 
significant impact.

The statistics community sometimes faces challenges when trying to adopt 
Semantic Web technologies, in particular:
   * difficulty in creating and publishing linked data: this can be 
alleviated by providing methods, tools, lessons learned and best 
practices, by publicizing successful examples and by providing support.
   * difficulty in seeing the purpose of publishing linked data: we must 
develop end-user tools leveraging statistical linked data and provide 
convincing examples of real use in applications or mashups, so that the 
end-user value of statistical linked data and metadata appears more clearly.
   * difficulty in using external linked data in their daily activity: it 
is important to develop statistical methods and tools especially 
tailored for linked data, so that statisticians can get accustomed to 
using them and become convinced of their specific utility.

To conclude, statisticians know how misleading it can be to exploit 
semantic connections without carefully considering and weighing 
information about the quality of these connections, the validity of 
inferences, etc. A challenge for them is to determine, to ensure and to 
inform consumers about the quality of semantic connections which may be 
used to support analysis in some circumstances but not others. The 
workshop will enable participants to discuss these very important issues.

*Topics*
The workshop will address topics related to statistics and linked data. 
This includes but is not limited to:

How to publish linked statistics?
   * What are the relevant vocabularies for the publication of 
statistical data?
   * What are the relevant vocabularies for the publication of 
statistical metadata (code lists and classifications, descriptive 
metadata, provenance and quality information, etc.)?
   * What are the existing tools? Can the usual statistical software 
packages (e.g. R, SAS, Stata) do the job?
   * How do we include linked data production and publication in the 
data lifecycle?
   * How do we establish, document and share best practices?

How to use linked data for statistics?
   * Where and how can we find statistics data: data catalogues, dataset 
descriptions, data discovery?
   * How do we assess data quality (collection methodology, 
traceability, etc.)?
   * How can we perform data reconciliation, ontology matching and 
instance matching with statistics data?
   * How can we apply statistical processes on linked data: data 
analysis, descriptive statistics, estimation, correction, visualization, 
etc.?

*Submissions*
This full-day workshop is aimed at an interdisciplinary audience of 
researchers and practitioners involved or interested in Statistics and 
the Semantic Web. All papers must represent original and unpublished 
work that is not currently under review. Papers will be evaluated 
according to their significance, originality, technical content, style, 
clarity, and relevance to the workshop. At least one author of each 
accepted paper is expected to attend the workshop.

Workshop participation is available to ISWC 2013 attendees at an 
additional cost, see 
http://iswc2013.semanticweb.org/content/registration for details.

The workshop will also feature a challenge based on Census Data 
published on the web or provided by Statistical Institutes. It is 
expected that data from Australia, France, Ireland, the U.S. and Spain 
at least will be available. The challenge will consist of building 
mashups or visualizations, as well as comparing, aligning and 
enriching the data and concepts involved. A prize will be awarded 
to the challenge winner. More details will be 
available soon at http://www.datalift.org/event/semstats2013.

We welcome the following types of contributions:
   * Full research papers (up to 12 pages)
   * Short papers (up to 6 pages)
   * Challenge papers (up to 6 pages)

All submissions must be written in English and must be formatted 
according to the information for LNCS Authors (see 
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0). Please 
note that (X)HTML(+RDFa) submissions are also welcome as long as the 
layout complies with the LNCS style. Authors can for example use the 
template provided at https://github.com/csarven/linked-research. 
Submissions are NOT anonymous. Please submit your contributions 
electronically in *PDF* format at 
http://www.easychair.org/conferences/?conf=semstats2013 before 23:59 
(Hawaii time) on 12 July 2013. All accepted papers will be archived in 
electronic proceedings published by CEUR-WS.org. If you are 
interested in submitting a paper but would like more preliminary 
information, please contact semstats2013@easychair.org.

*Challenge*
Participants are invited to apply statistical techniques and semantic 
web technologies to a collection of datasets provided by the organizers. 
The data provided is census data describing the population by geographic 
zone, age group, sex and current activity status. The datasets and 
complete data structure definitions are delivered in the Data Cube 
format. The data covers Australia and France. References to comparable 
data from other countries are also provided, but strict equivalence of 
concepts or conformity to the challenge DSD will not be guaranteed for 
those data sets.

Entries should focus on demonstrating new possibilities enabled by 
Semantic Web models and technologies. This can be done for example by 
linking two or more datasets provided in the challenge together (e.g. 
international comparisons) and/or linking one or more challenge datasets 
to other data sources.
More information at http://datalift.org/en/event/semstats2013/challenge

*Chairs*
Franck Cotton, INSEE, France
Richard Cyganiak, DERI, Ireland
Armin Haller, CSIRO, Australia
Alistair Hamilton, ABS, Australia
Raphaël Troncy, EURECOM, France

*Program Committee*
Phil Archer, W3C, UK
Ghislain Atemezing, EURECOM, France
Sarven Capadisli, University of Leipzig, Germany
Ric Clarke, Australian Bureau of Statistics, Australia
Jay Devlin, Statistics New Zealand, New Zealand
Miguel Expósito, Instituto Cántabro de Estadística, Spain
Dan Gillman, U.S. Bureau of Labor Statistics, USA
Alberto González Yanes, ISTAC, Spain
Arofan Gregory, Open Data Foundation, United States
Tudor Groza, The University of Queensland, Australia
Christophe Guéret, Data Archiving and Networked Services (DANS), The 
Netherlands
Andreas Harth, Karlsruhe Institute of Technology, Germany
Yves Jacques, FAO, Italy
Laurent Lefort, CSIRO, Australia
Marco Pellegrino, Eurostat, Luxembourg
Dave Reynolds, Epimorphics, UK
Monica Scannapieco, Istat, Italy
François Scharffe, LIRMM, University of Montpellier, France
Hideaki Takeda, National Institute of Informatics, Japan
Wendy Thomas, University of Minnesota, United States
Bernard Vatant, Mondeca, France
Boris Villazon-Terrazas, iSOCO, Spain
Joachim Wackerow, GESIS, Germany
Stuart Williams, Epimorphics, UK

-- 
Raphaël Troncy
EURECOM, Campus SophiaTech
Multimedia Communications Department
450 route des Chappes, 06410 Biot, France.
e-mail: raphael.troncy@eurecom.fr & raphael.troncy@gmail.com
Tel: +33 (0)4 - 9300 8242
Fax: +33 (0)4 - 9000 8200
Web: http://www.eurecom.fr/~troncy/
Received on Monday, 1 July 2013 06:47:33 UTC