Re: Dataset Semantics from Antoine Zimmermann on 2012-09-20 (public-rdf-wg@w3.org from September 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Thu, 20 Sep 2012 11:19:30 +0200
To: Gregg Kellogg <gregg@greggkellogg.net>
CC: RDF WG <public-rdf-wg@w3.org>
Message-ID: <505ADFA2.9060107@emse.fr>
Gregg,

I tend to see the default graph in the same way, but there are other 
people who don't, and we don't want to break their applications.
In any case, the goal of the "minimal" semantics is not to have 
something that will solve the use cases for free, by simply running a 
reasoner over well-chosen data. Nor is it something that would block 
your application from doing stupid things.
The goal, the way I see it, is to define a semantics that satisfies the 
requirements that we all agree on, such that if a conclusion is drawn 
from a dataset by applying the semantics, no one would object to that 
conclusion.
In my opinion, the absolute minimal requirements are:

1. if G graph-entails G', then
    { G } dataset-entails { G' }
2. if G graph-entails G', then for any IRI n,
    <n> { G } dataset-entails <n> { G' }
3. a dataset D entails any of its subsets

I think this is pretty much what Peter said he would agree with.
Given these requirements, our objective has been to formalise them in 
terms of model theory. Unfortunately, so far each proposed formalisation 
has implied something extra that was not in the requirements above.
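
To make the three requirements above concrete, here is a minimal sketch 
in Python. It is not a proposed formalisation. It assumes ground graphs 
only, so that graph entailment can be approximated by the subset 
relation. The names graph_entails and dataset_entails, the 
representation of a dataset as a pair, and the choice to require that an 
entailed graph name is actually present are all assumptions made for 
illustration.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
# Assumed representation: (default graph, {graph name IRI: named graph}).
Dataset = Tuple[Graph, Dict[str, Graph]]


def graph_entails(g: Graph, g2: Graph) -> bool:
    """Assumed stand-in for graph entailment.

    For ground graphs (no blank nodes), G simply entails G'
    exactly when G' is a subset of G.
    """
    return g2 <= g


def dataset_entails(d: Dataset, d2: Dataset) -> bool:
    """Check the three minimal requirements, graph by graph."""
    default, named = d
    default2, named2 = d2
    # Requirement 1: the default graph must entail the default graph.
    if not graph_entails(default, default2):
        return False
    # Requirement 2: each named graph is compared only with the graph
    # of the same name; nothing crosses between graph names.
    # Requirement 3 follows, since a subset passes both checks.
    return all(
        name in named and graph_entails(named[name], g2)
        for name, g2 in named2.items()
    )
```

With this sketch, a triple in the default graph is never entailed inside 
a named graph, and a triple in <n> is never entailed inside <m>. Ground 
graphs under this stand-in are never inconsistent, so the sketch says 
nothing about the inconsistency question.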

The proposal that I made in my last email to Peter [1] has only one 
extra consequence: that the inconsistency of the default graph implies 
the inconsistency of the dataset. Either we consider this a bug and fix 
it (or give up), or we agree that it is a feature, add it to the 
absolute minimal requirements, and declare victory.


Then, once this is set up, all kinds of extensions (or vocabularies, or 
use case-specific implementations) can be invented to cover certain use 
cases specifically. For instance, define a temporal extension where 
triples in named graphs corresponding to disjoint time frames would not 
interfere (that's the minimal semantics) but /additionally/, certain 
inferences occur between overlapping time frames (that's the extension).
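
As one hypothetical reading of such a temporal extension, the sketch 
below names each graph by a closed time interval and assumes that a 
triple holding over an interval also holds over any sub-interval. Under 
that assumption, two overlapping frames both contribute their triples to 
the frame where they intersect, while disjoint frames are left alone. 
The interval encoding, the function names and the inference rule itself 
are assumptions for illustration; which inferences the extension should 
actually license is left open here.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Interval = Tuple[int, int]  # assumed encoding: closed interval [start, end]


def intersection(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the overlap of two closed intervals, or None if disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def temporal_extension(named: Dict[Interval, Graph]) -> Dict[Interval, Graph]:
    """Single pass over pairs of frames (not a fixpoint computation).

    Disjoint frames do not interfere (the minimal semantics).
    Overlapping frames both contribute their triples to the frame
    where they intersect (the hypothetical extension).
    """
    derived = dict(named)
    frames = list(named)
    for i, a in enumerate(frames):
        for b in frames[i + 1:]:
            common = intersection(a, b)
            if common is not None:
                derived[common] = (
                    derived.get(common, frozenset()) | named[a] | named[b]
                )
    return derived
```

For example, frames (2000, 2005) and (2003, 2010) would yield a derived 
frame (2003, 2005) holding the triples of both, while a frame 
(2011, 2012) would be unaffected.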


[1] Really minimal dataset semantics. 
http://lists.w3.org/Archives/Public/public-rdf-wg/2012Sep/0210.html


AZ

On 19/09/2012 22:42, Gregg Kellogg wrote:
> I step into this debate, not with any great understanding of the
> details, but with some expectations as a developer and as an
> implementer of RDF frameworks.
>
> Part of the problem that I see in the WG dynamics is that there are a
> number of different ways in which things like a default graph might
> be used. As a developer, the lack of guidance by this group (past and
> present) has led to confusion IMO. This is also true of the SPARQL
> WG, in that implementations are free to provide different
> implementations of a default graph: the union of all named graphs, a
> separate unrelated graph not having a name, or as the default
> location for meta-data about named graphs themselves. I think this
> situation is brought about principally because of the lack of
> guidance these groups have given as to what the use of these features
> is intended to be.
>
> To not provide guidance now, after there is some experience in
> implementation, is, I think, a missed opportunity.
>
> Speaking as an implementer, I expect to be able to use a dataset
> without advance knowledge of how the data is organized. This requires
> that there is some meta-data that can be used to understand things
> like entailment regimes and the "meaning" of graph names. The SPARQL
> Service Description is a natural format for describing this, but there
> is no default binding to a dataset itself; I think there should be. In
> my usage, this is typically the default graph, but it could be some
> other "named" graph; however, if it is named, there doesn't seem to
> be a way to find it unless there is some normative language for how
> the dataset description is named within a dataset. I think it is most
> natural for this to be the default dataset, or that there is a
> relation defined within the default dataset which names the dataset
> description.
>
> IMO, the default graph should be used for metadata about the dataset,
> including, but not limited to, the SPARQL Service Description. I also
> believe that I should be able to use information in that service
> description to reason about the named graphs themselves.
>
> As an example use case, I might load information from a particular
> Wiki page into a graph named with the URL of the page, along with
> query parameters indicating a particular version of that page
> (clearly, the format of these URLs is arbitrary, but the way to
> describe them should be normative). If the page changes, I would
> likely load the data into a new named graph. I'd like to be able to
> use information in the dataset description to identify the most
> current version of the named graphs for a particular page, and
> potentially the named graphs for a collection of these pages (all
> pages from a current wiki, for example) at a specific time. I can
> imagine a system in which these graphs could be described using a
> vocabulary that allowed me to construct consistent SPARQL queries
> across these named graphs, but only if the location and semantics of
> this information can be determined without built-in knowledge of the
> dataset semantics. Perhaps this is too ambitious, but I believe that
> this is where we should be going in the long run.
>
> Gregg Kellogg gregg@greggkellogg.net

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 83 36
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Thursday, 20 September 2012 09:19:44 UTC