Disentangling the notion of SPARQL dataset from Daniel Hernández on 2015-05-18 (public-sparql-dev@w3.org from April to June 2015)

From: Daniel Hernández <daniel@degu.cl>
Date: Mon, 18 May 2015 16:46:52 -0300
To: public-sparql-dev@w3.org
Message-ID: <1431978412.1553.8.camel@ruil.local>

In the context of the 9th Mendelzon Workshop on Foundations of Data
Management (AMW), in the paper "Disentangling the notion of SPARQL
dataset" we analyzed the notion of dataset. We considered it relevant
to let this list know about it.

After a careful analysis of the official specifications, this notion
seems to be neither simple nor clear. When people are asked about the
notion of dataset, one frequently gets answers that differ from the
one stated in the specification. The same happens with most current
implementations. Additionally, some ideas seem contradictory.

In practice, most engines only support references to names of graphs
that are in the default dataset. Thus, as blank nodes are scoped to
the default dataset, the merge operator is not used when no data is
retrieved from the Web. Otherwise, the use of merge (as it is stated in
the SPARQL specification) is not sufficient, because it does not avoid
the possibility of blank node clashes (the same blank node label used
in different scopes) and it is a source of identity duplication (a
blank node copied from one scope to another scope with two different
labels). More details on this can be found in the paper and in the
slides below. We hope that they will be useful to this community.
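
A toy sketch in Python may help to picture the two problems named
above. It is not taken from the paper and it does not model any real
engine: graphs are assumed to be plain sets of string triples, blank
nodes are assumed to be strings starting with "_:", and the helper
names (naive_union, standardize_apart, merge) are hypothetical. Here
"merge" is read as renaming the blank nodes of each graph apart before
taking the union.

```python
from itertools import count


def is_blank(term):
    """Assumption: a term is a blank node if its label starts with '_:'."""
    return term.startswith("_:")


def blank_labels(graph):
    """Collect the blank node labels that occur in a graph."""
    return {term for triple in graph for term in triple if is_blank(term)}


def naive_union(g1, g2):
    """Plain set union: labels are compared as if both graphs shared one scope."""
    return g1 | g2


def standardize_apart(graph, fresh):
    """Rename every blank node label in `graph` using the counter `fresh`."""
    renaming = {}
    renamed = set()
    for triple in sorted(graph):
        new_triple = []
        for term in triple:
            if is_blank(term):
                if term not in renaming:
                    renaming[term] = f"_:m{next(fresh)}"
                term = renaming[term]
            new_triple.append(term)
        renamed.add(tuple(new_triple))
    return renamed


def merge(g1, g2):
    """Union after renaming the blank nodes of each graph apart."""
    fresh = count()
    return standardize_apart(g1, fresh) | standardize_apart(g2, fresh)


# Clash: two graphs from DIFFERENT scopes happen to use the same label.
g_a = {("_:b", ":name", '"Alice"')}
g_b = {("_:b", ":name", '"Bob"')}
clash = naive_union(g_a, g_b)
print(len(blank_labels(clash)))  # a single label now carries both names

# Duplication: two graphs from the SAME scope mention the same node _:x.
g1 = {("_:x", ":name", '"Alice"')}
g2 = {("_:x", ":age", '"30"')}
dup = merge(g1, g2)
print(len(blank_labels(dup)))  # the one node _:x now has two labels
```

In the first case two unrelated nodes collapse into one because their
labels coincide; in the second, one node is split in two because each
graph is renamed independently. How these situations arise under the
SPARQL definitions is discussed in the paper.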

Paper:
http://users.dcc.uchile.cl/~dhernand/research/amw-2015-dataset.pdf

Slides:
http://users.dcc.uchile.cl/~dhernand/research/amw-2015-dataset-slides.pdf

regards,
Daniel Hernandez
Claudio Gutierrez

Received on Monday, 18 May 2015 20:01:59 UTC