
Re: Dataset Semantics

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Thu, 20 Sep 2012 17:12:11 +0200
Message-ID: <505B324B.9000205@emse.fr>
To: Pat Hayes <phayes@ihmc.us>
CC: Gregg Kellogg <gregg@greggkellogg.net>, RDF WG <public-rdf-wg@w3.org>
On 20/09/2012 16:48, Pat Hayes wrote:
> On Sep 20, 2012, at 4:19 AM, Antoine Zimmermann wrote:
>> Gregg,
>> I tend to see the default graph in the same way, but there are
>> other people who don't, and we don't want to break their
>> applications. In any case, the goal of the "minimal" semantics is
>> not to have something that will solve the use cases for free, by
>> simply running a reasoner over well chosen data. Nor is it
>> something that would block your application from doing stupid
>> things. The goal, the way I see it, is to define a semantics that
>> satisfies the requirements that we all agree on, such that if a
>> conclusion is drawn from a dataset by applying the semantics, no
>> one would object to that conclusion. In my opinion, the absolute
>> minimal requirements are:
>> 1. if G graph-entails G', then { G } dataset-entails { G' }
>> 2. if G graph-entails G', then for any IRI n, <n> { G }
>>    dataset-entails <n> { G' }
>> 3. a dataset D entails any of its subsets
>> I think this is pretty much what Peter said he would agree with.
>> Given these requirements, our objective has been to formalise it in
>> terms of model theory. Unfortunately, so far each proposed
>> formalisation has implied something extra that was not in the
>> requirements above.
>> The proposal that I made in my last email to Peter [1] has only one
>> extra consequence: that the inconsistency of the default graph
>> implies the inconsistency of the dataset.
> Suppose we simply say that {G, N} ds-entails {G', N'} exactly when: G
> entails G' and for all <n, g'> in N' there is a <n, g> in N with g
> entails g'. (Same n, note.) This covers the three conditions above,
> and it does not imply ds-entailment simply from inconsistency of the
> default graph alone.  (This relies on the idea that a missing graph
> is understood to be the empty graph, but I think we all assume this,
> right?)
> This kind of finesses the task of giving a model theory for datasets,
> so it's not exactly a dataset semantics, but it might be enough for
> our purposes. It is monotonic in the usual senses, e.g. adding graphs
> to a dataset does not block any entailments. We could describe it as
> a constraint on any stronger (and genuine model-theoretic)
> semantics.
> Comments?
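
To illustrate the definition quoted above, here is a minimal Python sketch. It is an illustration under stated assumptions, not something proposed in this thread: graphs are modelled as frozensets of ground triples (no blank nodes), graph entailment defaults to subset inclusion (which is what simple entailment amounts to for ground graphs), and the names `ds_entails`, `ground_simple_entails` and `EMPTY` are hypothetical. Any stronger graph-entailment check could be passed in instead.

```python
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
# A dataset is modelled as (default graph, {graph name: named graph}).
Dataset = Tuple[Graph, Dict[str, Graph]]

EMPTY: Graph = frozenset()


def ground_simple_entails(g: Graph, g_prime: Graph) -> bool:
    """Assumption: ground graphs only, so g entails g' iff g' is a subset of g."""
    return g_prime <= g


def ds_entails(
    d: Dataset,
    d_prime: Dataset,
    graph_entails: Callable[[Graph, Graph], bool] = ground_simple_entails,
) -> bool:
    """{G, N} ds-entails {G', N'} iff G entails G' and, for every <n, g'> in N',
    the graph named n in N entails g'. A missing graph counts as the empty graph."""
    default, named = d
    default_p, named_p = d_prime
    if not graph_entails(default, default_p):
        return False
    return all(
        graph_entails(named.get(n, EMPTY), g_p) for n, g_p in named_p.items()
    )


g = frozenset({("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:q", "ex:c")})
g_sub = frozenset({("ex:a", "ex:p", "ex:b")})
d = (g, {"ex:n": g})

print(ds_entails(d, (g_sub, {"ex:n": g_sub})))  # same name, weaker graphs
print(ds_entails(d, (EMPTY, {})))               # a subset of the dataset
print(ds_entails(d, (EMPTY, {"ex:m": g_sub})))  # name ex:m is missing in d
```

Because named graphs are compared only under the same name, triples in one named graph never contribute to entailments about another.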

That would be what I would accept as a last resort, if we cannot reach
agreement on a model-theoretic semantics. At least it satisfies the

> Pat
>> This can either be considered as a bug, and we should fix it (or
>> give up), or we manage to agree that it is a feature, and it can be
>> added to the absolute minimal requirements, and we can declare
>> victory.
>> Then, when this is set up, all kinds of extensions (or
>> vocabularies, or use case-specific implementations) can be invented
>> to cover specifically certain use cases. For instance, define a
>> temporal extension where triples in named graphs corresponding to
>> disjoint time frames would not interfere (that's the minimal
>> semantics) but /additionally/, certain inferences occur between
>> overlapping time frames (that's the extension).
>> [1] Really minimal dataset semantics.
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2012Sep/0210.html
>> On 19/09/2012 22:42, Gregg Kellogg wrote:
>>> I step into this debate, not with any great understanding of the
>>> details, but with some expectations as a developer and as an
>>> implementer of RDF frameworks.
>>> Part of the problem that I see in the WG dynamics is that there
>>> are a number of different ways in which things like a default
>>> graph might be used. As a developer, the lack of guidance by this
>>> group (past and present) has led to confusion IMO. This is also
>>> true of the SPARQL WG, in that implementations are free to
>>> provide different implementations of a default graph: the union
>>> of all named graphs, a separate unrelated graph not having a
>>> name, or as the default location for meta-data about named graphs
>>> themselves. I think this situation is brought about principally
>>> because of the lack of guidance these groups have given as to
>>> what the use of these features is intended to be.
>>> To not provide guidance now, after there is some experience in
>>> implementation, is, I think, a missed opportunity.
>>> Speaking as an implementer, I expect to be able to use a dataset
>>> without advance knowledge of how the data is organized. This
>>> requires that there is some meta-data that can be used to
>>> understand things like entailment regimes and the "meaning" of
>>> graph names. The SPARQL Service Description is a natural format
>>> for describing this, but there is no default binding to a dataset
>>> itself; I think there should be. In my usage, this is typically
>>> the default graph, but it could be some other "named" graph;
>>> however, if it is named, there doesn't seem to be a way to find
>>> it unless there is some normative language for how the dataset
>>> description is named within a dataset. I think it is most natural
>>> for this to be the default dataset, or that there is a relation
>>> defined within the default dataset which names the dataset
>>> description.
>>> IMO, the default graph should be used for metadata about the
>>> dataset, including, but not limited to, the SPARQL Service
>>> Description. I also believe that I should be able to use
>>> information in that service description to reason about the named
>>> graphs themselves.
>>> As an example use case, I might load information from a
>>> particular Wiki page into a graph named with the URL of the page,
>>> along with query parameters indicating a particular version of
>>> that page (clearly, the format of these URLs is arbitrary, but
>>> the way to describe them should be normative). If the page
>>> changes, I would likely load the data into a new named graph. I'd
>>> like to be able to use information in the dataset description to
>>> identify the most current version of the named graphs for a
>>> particular page, and potentially the named graphs for a
>>> collection of these pages (all pages from a current wiki, for
>>> example) at a specific time. I can imagine a system in which
>>> these graphs could be described using a vocabulary that allowed
>>> me to construct consistent SPARQL queries across these named
>>> graphs, but only if the location and semantics of this
>>> information can be determined without built-in knowledge of the
>>> dataset semantics. Perhaps this is too ambitious, but I believe
>>> that this is where we should be going in the long run.
>>> Gregg Kellogg gregg@greggkellogg.net
>> --
>> Antoine Zimmermann
>> ISCOD / LSTI - Institut Henri Fayol
>> École Nationale Supérieure des Mines de Saint-Étienne
>> 158 cours Fauriel
>> 42023 Saint-Étienne Cedex 2
>> France
>> Tél:+33(0)4 77 42 83 36
>> Fax:+33(0)4 77 42 66 66
>> http://zimmer.aprilfoolsreview.com/
> ------------------------------------------------------------
> IHMC                          (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.          (850)202 4416   office
> Pensacola                     (850)202 4440   fax
> FL 32502                      (850)291 0667   mobile
> phayesAT-SIGNihmc.us          http://www.ihmc.us/users/phayes

Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
Received on Thursday, 20 September 2012 15:12:47 GMT
