Re: Dataset Semantics from Eric Prud'hommeaux on 2012-09-22 (public-rdf-wg@w3.org from September 2012)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Fri, 21 Sep 2012 21:06:14 -0400
To: Pat Hayes <phayes@ihmc.us>
Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, Gregg Kellogg <gregg@greggkellogg.net>, RDF WG <public-rdf-wg@w3.org>
Message-ID: <CANfjZH3cyRTMkzqwzmukwYXcJfX4VR4M+HLBrMk_O-SSqJrfZQ@mail.gmail.com>
On Thu, Sep 20, 2012 at 10:48 AM, Pat Hayes <phayes@ihmc.us> wrote:
>
> On Sep 20, 2012, at 4:19 AM, Antoine Zimmermann wrote:
>
>> Gregg,
>>
>> I tend to see the default graph in the same way, but there are other people who don't, and we don't want to break their applications.
>> In any case, the goal of the "minimal" semantics is not to have something that will solve the use cases for free, by simply running a reasoner over well-chosen data. Nor is it something that would block your application from doing stupid things.
>> The goal, the way I see it, is to define a semantics that satisfies the requirements that we all agree on, such that if a conclusion is drawn from a dataset by applying the semantics, no one would object to that conclusion.
>> In my opinion, the absolute minimal requirements are:
>>
>> 1. if G graph-entails G' then
>>   { G } dataset-entails { G' }
>> 2. if G graph-entails G' then, for any IRI n,
>>   <n> { G } dataset-entails <n> { G' }
>> 3. a dataset D entails any of its subsets
>>
>> I think this is pretty much what Peter said he would agree with.
>> Given these requirements, our objective has been to formalise them in terms of model theory. Unfortunately, so far each proposed formalisation has implied something extra that was not in the requirements above.
>>
>> The proposal that I made in my last email to Peter [1] has only one extra consequence: that the inconsistency of the default graph implies the inconsistency of the dataset.
>
> Suppose we simply say that {G, N} ds-entails {G', N'} exactly when: G entails G' and for all <n, g'> in N' there is a <n, g> in N such that g entails g'. (Same n, note.) This covers the three conditions above, and it does not imply ds-entailment simply from inconsistency of the default graph alone.  (This relies on the idea that a missing graph is understood to be the empty graph, but I think we all assume this, right?)
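To make the definition above concrete, here is a minimal Python sketch under assumptions that are not in the thread: graphs are ground (no blank nodes), so simple entailment reduces to the subset test, and a dataset is a default graph plus a dict from graph names to graphs. The names `graph_entails`, `ds_entails` and `EMPTY` are hypothetical. A name missing from N is read as naming the empty graph, as the definition assumes.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical encoding: IRIs are plain strings, a ground triple is an
# (s, p, o) tuple, and a graph is a frozenset of triples.
Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
EMPTY: Graph = frozenset()


def graph_entails(g: Graph, g2: Graph) -> bool:
    """Simple entailment between ground graphs: g entails g2 iff g2 is a subset of g."""
    return g2 <= g


def ds_entails(default: Graph, named: Dict[str, Graph],
               default2: Graph, named2: Dict[str, Graph]) -> bool:
    """{G, N} ds-entails {G', N'}: G entails G', and each <n, g'> in N'
    is matched by the graph with the same name n in N.

    A name missing from N is read as naming the empty graph.
    """
    if not graph_entails(default, default2):
        return False
    return all(graph_entails(named.get(n, EMPTY), g2)
               for n, g2 in named2.items())


g = frozenset({("ex:a", "ex:p", "ex:b"), ("ex:a", "ex:p", "ex:c")})
g_sub = frozenset({("ex:a", "ex:p", "ex:b")})
print(ds_entails(EMPTY, {"ex:n": g}, EMPTY, {"ex:n": g_sub}))  # True: same name n
print(ds_entails(EMPTY, {"ex:m": g}, EMPTY, {"ex:n": g_sub}))  # False: different name
```

With a richer entailment (blank nodes, RDFS) only `graph_entails` would change; the matching of named graphs by identical name stays the same.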

Hmm, there are two implementation strategies here, one where the
CREATE (think unix touch) matters, and one where it doesn't. The spec
doesn't push folks one way or the other, using text like

[[
Stores that do not record empty named graphs will always return
success on creation of a non-existing graph.
]] - http://www.w3.org/TR/sparql11-update/#create

Could the folks who care about empty graphs use something like:

"... there is a <n, g> in N with _g and g' being empty graphs or_ g entails g'"

Is it nuts to try to support both universes?
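A hedged sketch of how the two universes might differ, again assuming ground graphs (entailment as the subset test). The flag `records_empty_graphs` is a hypothetical store property, not something SPARQL 1.1 Update defines. The only behaviour that changes is whether a name absent from N can satisfy a pair <n, {}> in N'; with the flag set, an actual <n, g> in N is required.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def named_part_entailed(named: Dict[str, Graph], named2: Dict[str, Graph],
                        records_empty_graphs: bool) -> bool:
    """Named-graph half of {G, N} ds-entails {G', N'} for ground graphs.

    records_empty_graphs=False: CREATE is a no-op; a missing name is the empty graph.
    records_empty_graphs=True: an empty graph exists only if it was created,
    so every name in N' must actually appear in N.
    """
    for n, g2 in named2.items():
        if n in named:
            g = named[n]
        elif records_empty_graphs:
            return False  # no <n, g> in N, so <n, g'> cannot be matched
        else:
            g = frozenset()  # missing graph read as the empty graph
        if not g2 <= g:  # simple entailment between ground graphs
            return False
    return True


wanted = {"ex:n": frozenset()}  # the dataset <ex:n> { }
print(named_part_entailed({}, wanted, records_empty_graphs=False))  # True
print(named_part_entailed({}, wanted, records_empty_graphs=True))   # False
```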


> This kind of finesses the task of giving a model theory for datasets, so it's not exactly a dataset semantics, but it might be enough for our purposes. It is monotonic in the usual senses, e.g. adding graphs to a dataset does not block any entailments. We could describe it as a constraint on any stronger (and genuine model-theoretic) semantics.
>
> Comments?
>
> Pat
>
>
>> This can either be considered a bug, which we should fix (or give up on), or we can agree that it is a feature, add it to the absolute minimal requirements, and declare victory.
>>
>>
>> Then, when this is set up, all kinds of extensions (or vocabularies, or use-case-specific implementations) can be invented to cover specific use cases. For instance, define a temporal extension where triples in named graphs corresponding to disjoint time frames would not interfere (that's the minimal semantics) but /additionally/, certain inferences occur between overlapping time frames (that's the extension).
>>
>>
>> [1] Really minimal dataset semantics. http://lists.w3.org/Archives/Public/public-rdf-wg/2012Sep/0210.html
>>
>>
>> AZ
>>
>> Le 19/09/2012 22:42, Gregg Kellogg a écrit :
>>> I step into this debate, not with any great understanding of the
>>> details, but with some expectations as a developer and as an
>>> implementer of RDF frameworks.
>>>
>>> Part of the problem that I see in the WG dynamics is that there are a
>>> number of different ways in which things like a default graph might
>>> be used. As a developer, the lack of guidance by this group (past and
>>> present) has led to confusion IMO. This is also true of the SPARQL
>>> WG, in that implementations are free to treat the default graph in
>>> different ways: as the union of all named graphs, as a separate
>>> unrelated graph without a name, or as the default location for
>>> meta-data about the named graphs themselves. I think this
>>> situation arises principally from the lack of guidance these groups
>>> have given about how these features are intended to be used.
>>>
>>> To not provide guidance now, after there is some experience in
>>> implementation, is, I think, a missed opportunity.
>>>
>>> Speaking as an implementer, I expect to be able to use a dataset
>>> without advance knowledge of how the data is organized. This requires
>>> that there is some meta-data that can be used to understand things
>>> like entailment regimes and the "meaning" of graph names. The SPARQL
>>> Service Description is a natural format for describing this, but there
>>> is no default binding to a dataset itself; I think there should be. In
>>> my usage, this is typically the default graph, but it could be some
>>> other "named" graph; however, if it is named, there doesn't seem to
>>> be a way to find it unless there is some normative language for how
>>> the dataset description is named within a dataset. I think it is most
>>> natural for this to be the default graph, or that there is a
>>> relation defined within the default graph which names the dataset
>>> description.
>>>
>>> IMO, the default graph should be used for metadata about the dataset,
>>> including, but not limited to, the SPARQL Service Description. I also
>>> believe that I should be able to use information in that service
>>> description to reason about the named graphs themselves.
>>>
>>> As an example use case, I might load information from a particular
>>> Wiki page into a graph named with the URL of the page, along with
>>> query parameters indicating a particular version of that page
>>> (clearly, the format of these URLs is arbitrary, but the way to
>>> describe them should be normative). If the page changes, I would
>>> likely load the data into a new named graph. I'd like to be able to
>>> use information in the dataset description to identify the most
>>> current version of the named graphs for a particular page, and
>>> potentially the named graphs for a collection of these pages (all
>>> pages from a current wiki, for example) at a specific time. I can
>>> imagine a system in which these graphs could be described using a
>>> vocabulary that allowed me to construct consistent SPARQL queries
>>> across these named graphs, but only if the location and semantics of
>>> this information can be determined without built-in knowledge of the
>>> dataset semantics. Perhaps this is too ambitious, but I believe that
>>> this is where we should be going in the long run.
>>>
>>> Gregg Kellogg gregg@greggkellogg.net
>>
>> --
>> Antoine Zimmermann
>> ISCOD / LSTI - Institut Henri Fayol
>> École Nationale Supérieure des Mines de Saint-Étienne
>> 158 cours Fauriel
>> 42023 Saint-Étienne Cedex 2
>> France
>> Tél:+33(0)4 77 42 83 36
>> Fax:+33(0)4 77 42 66 66
>> http://zimmer.aprilfoolsreview.com/
>>
>>
>
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes



-- 
-ericP
office: +1.617.258.5741
mobile: +1.617.599.3509
Received on Saturday, 22 September 2012 01:06:43 UTC