Re: three kinds of dataset

From: Pat Hayes <phayes@ihmc.us>
Subject: three kinds of dataset
Date: Tue, 6 Mar 2012 00:55:58 -0600

> I've been trying to pull all these threads together. Seems to me that the
> use cases for quads/datasets fall into three main categories, which
> demand different semantic approaches if we are going to try to avoid
> interoperability confusion. (Now I understand Antoine's proposal I see
> how it manages to be a kind of weakest-possible-blanket-case which
> allows something like all three of these to kind of work, but I would
> argue that we can do better, because this fit-all approach doesn't
> really fit anything quite properly. More below.)
> 
> Case 1. Datasets are collections of RDF graphs distinguished from one
> another by 'labels', used essentially as a bookkeeping device to
> distinguish one graph from another, to keep entailments from one graph
> distinguished from those of another, etc..  No actual semantic
> relationship is assumed to hold between a graph and its label, and each
> graph is a normal RDF graph to which the 2004 RDF semantics
> applies. There is no difference in meaning between a labelled graph and
> the same graph outside the dataset, without the label. No particular
> meaning is given to the idea of 'asserting' a dataset.

This case appears to not have any semantic implications.  Graphs are
still graphs, entailment is still between graphs, etc., etc.  There are
no semantics for RDF datasets, which are just some data structure to be
used by applications as they see fit.

> Case 2. The graph labels in a dataset are presumed to indicate a context
> of some kind in which the labeled graph is understood to be true or
> to hold. To assert the dataset is to assert that each graph holds *in
> its context* but it may not do so outside the context, so no semantic
> relationship, eg of entailment, holds between the named graphs in a
> dataset and any unnamed graphs, even the same graph without its
> label. (Contexts might include timeperiods, locations, beliefs, sources,
> "Islands", etc..: anything which is thought of as influencing the truth
> of something expressed in RDF.)

Here there is some notion of asserting a dataset.  What, then, is the
theory involved?  Presumably the default graph triples are just asserted,
but what happens for named graphs?  The RDF semantics would have to be
expanded to include RDF graphs.  What consequences do we want in this
semantics?  How is this semantics going to interact with the various
RDFS or OWL constructs?

> Case 3. The graph labels in a dataset are understood to be actual names
> of the graph they are associated with, ie to formally denote the graph,
> so that when used in RDF these labels refer to the actual graph. (Or
> maybe, to some larger graph of which the graph indicated is a part.) The
> labelled graphs are then essentially being mentioned rather than used,
> so that the dataset can be asserted without in any way asserting the
> component named graphs it contains. These named graphs are more like
> graph literals than a graph in an RDF graph document.  (There is also
> the idea that the label actually names a graph container whose state is
> initially the graph shown, and no doubt other variations on this theme
> are possible; let me lump all these together for now.)

It appears that the essence here is that there is a new datatype,
RDFGRAPH, whose literals are RDF graphs.  It may even be that the
graph-subset relationship is exposed in RDF (to allow partial
specification of the graph).  However, there is no notion that the
elements of this datatype carry RDF semantics.

> So, take Tim Lebo's example from David's recent email:
> 
>> :account_1 {
>>     :entity a prov:Entity
>> }
>> 
>> :account_2 {
>>     :entity a prov:Activity
>> }
>> 
>> prov:Entity owl:disjointWith prov:Activity .
> 
> and presume that we are accepting OWL semantics. 

> Case 1 says: yes, these
> three are OWL-inconsistent taken together. (Of course it allows that you
> might not want to take them together, perhaps even that you should not
> take them together, but as far as what they mean, they are indeed
> mutually inconsistent.)  

Huh?  Case 1 appears to say that this is just an RDF datastructure, with
no semantics, and therefore inconsistency is not an issue.

> Case 2 says: no, these three are
> OWL-consistent, even taken together, because :entity can be a
> prov:Entity in one context and something else in a different context,
> and this is consistent. 

Yes, it does seem to me that this RDF dataset would be consistent under
a treatment of named graphs as (modal) contexts.

> Case 3 also says they are consistent, but for a
> different reason: only the last triple is being asserted; the two named
> graphs don't say *anything* about :entity, only that certain graphs are
> named ":account_1" and ":account_2". (But if these graphs were to have
> their content exposed, eg by importing them using their names into a
> graph containing the third, then there would indeed be a good old 2004
> inconsistency in that graph.)

Yes, under this case, the RDF graphs in named graphs do not participate
in the RDF semantics in the usual way.

> These really need different semantic treatments. 

Well, sure, stated this way there are three different semantic
treatments of RDF graphs.  
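
A minimal sketch, in Python, of how the three readings come apart on
Tim Lebo's example.  Everything here is an illustrative assumption: the
tuple encoding of triples, the function names, and the toy `consistent`
check, which only looks for an individual typed with two classes
declared disjoint and is in no way an OWL reasoner.  The case 1 function
models only Pat's "taken together" reading; on the view that case 1
datasets carry no semantics, no such check is called for at all.  The
case 2 function further assumes that default-graph triples hold in every
context.

```python
# Hypothetical encoding of the example as (subject, predicate, object) tuples.
DEFAULT = {("prov:Entity", "owl:disjointWith", "prov:Activity")}
NAMED = {
    ":account_1": {(":entity", "rdf:type", "prov:Entity")},
    ":account_2": {(":entity", "rdf:type", "prov:Activity")},
}


def consistent(triples):
    """Toy check: no individual is typed with two classes declared disjoint."""
    disjoint = set()
    types = {}
    for s, p, o in triples:
        if p == "owl:disjointWith":
            disjoint.add(frozenset((s, o)))
        elif p == "rdf:type":
            types.setdefault(s, set()).add(o)
    for classes in types.values():
        for a in classes:
            for b in classes:
                if a != b and frozenset((a, b)) in disjoint:
                    return False
    return True


def case1_taken_together(default, named):
    # Labels are bookkeeping only: merging everything yields a plain RDF graph.
    return consistent(default.union(*named.values()))


def case2_contexts(default, named):
    # Each named graph is evaluated only within its own context.
    # Assumption: default-graph triples hold in every context.
    return all(consistent(default | graph) for graph in named.values())


def case3_mentioned(default, named):
    # Named graphs are mentioned, not used: only the default graph is asserted.
    return consistent(default)
```

Under these assumptions the merged graph of case 1 fails the check,
while cases 2 and 3 pass it, for the two different reasons given above.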

> I maintain that the first case does not need any changes to the 2004
> semantics at all, and does not require that datastores be given any
> special semantics. In fact, it is better if they are not, as any
> semantic story beyond the 2004 account of graph meanings will be harmful
> to some application or other. Graph names here are purely an organizing
> and record-keeping device, and can be freely used in any way, and
> nothing is changed about RDF by any such use. For example, it would be
> fine to decide that a graph-label association was local to a datastore,
> on this view.

Agreed.

> The third case is closest to the original Bizer et al. named graph
> proposal, and supports the same kind of graphs-as-resources thinking, in
> which the URI of a graph document is seen as identifying the graph just
> as URIs identify Web pages and the like. Graph labels here have global
> scope, 

Well, graph labels in case 3 are RDF identifiers, i.e., IRIs.  Either
that or the RDF semantics needs a new kind of syntactic thing that looks
just like IRIs (and probably acts just like IRIs - walking and quacking
I'm not so sure about).

> and one can treat a graph label as the name of the graph in a
> very strong sense, use that URI in RDF to refer to the graph (or maybe
> to the graph container, or maybe to either, etc..: again, let me ignore
> this complication for the present.) To assert a datastore is a kind of
> graph baptism: publishing the datastore assigns a global name to the
> graph, and requires that satisfying interpretations respect this
> naming. (The semantic conditions are in the original paper, but in
> essence they are that an interpretation I satisfies
>  label {graph} 
> just when I(label) = graph.  I'm tempted to say, "duh.")

I'm not sure what the "global" is doing here.  Under just about any
reasonable semantic treatment for case 3, graph labels are RDF
identifiers whose denotation is an RDF graph literal (similar quibbles
as in Pat's section just above notwithstanding).  This can be done in
RDF fairly simply, and the simple versions do not appear to have any bad
effects elsewhere.   I'm not sure what is gained here over case 1,
however. 
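
The satisfaction condition quoted above, I(label) = graph, can be
sketched in a few lines of Python.  This is only an illustration under
stated assumptions: an interpretation is modelled as a plain dict from
labels to frozensets of triples, the names `satisfies` and `I` are
hypothetical, and blank nodes and graph isomorphism are ignored.

```python
def satisfies(interpretation, label, graph):
    """True just when the interpretation maps the label to exactly this graph."""
    return interpretation.get(label) == frozenset(graph)


# Hypothetical interpretation in which :account_1 denotes a one-triple graph.
I = {":account_1": frozenset({(":entity", "rdf:type", "prov:Entity")})}
```

Note that nothing in this check consults the triples inside the graph
for truth; the graph is only compared for identity, which is the sense
in which it is mentioned rather than used.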

> The second case is the tricky one, because it has the label actually
> changing the meaning of the triples in the graph. If we are to claim
> that graphs can be true in one context but not in another, then we have
> to change the 2004 semantics somehow in order to provide for this
> context sensitivity. This is where Antoine's approach and mine
> differ. His proposal allows the *meanings of URIs* to change with
> context (as well as being the names of the contexts themselves); mine
> only allows relations to have an extra parameter. The 'context' is then
> this extra parameter which allows truthvalues of triples to appear to
> change, by treating them as quads; but the URIs remain having a global
> meaning.  Antoine's semantics requires adding a context mapping to
> interpretations, so that every URI defines a potentially different
> interpretation context for every other URI; mine requires allowing the
> EXT mapping on RDF properties to admit triples as well as pairs. Neither
> of them change current RDF graph meanings, but they extend this to
> datasets differently. Mine is a semantic extension to RDF, while
> Antoine's is a kind of semantic un-extension: it gives a weaker meaning
> than the RDF semantics does.

If I understand what is going on here, Pat's proposal for case 2 is a
kind of situation calculus whereas Antoine's is a kind of modal logic.
These are both ways of extending RDF.  I would *not* say that Antoine's
is an un-extension of the RDF semantics.  However, it does appear to
allow some very useful things, related to what one might want to call
rigidity of meaning.

For example, consider a modification of Tim Lebo's example:

heaven:god owl:differentFrom hell:devil
us:Republican { wh:pres40 owl:sameAs heaven:god }
us:Democrat { wh:pres40 owl:sameAs hell:devil }

In Pat's case 2 semantics, it appears to me that this would be
inconsistent, whereas in a modal semantics this is often allowed.
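
A rough Python sketch of that contrast.  It rests on assumptions that
are not settled anywhere in this thread: that in the rigid reading URIs
denote the same thing in every context and owl:sameAs stays plain
identity, so all the sameAs triples can be pooled; and that in the
per-context reading the default-graph triple holds in each context while
each context is otherwise checked separately.  The function names and
the union-find check are illustrative only.

```python
DEFAULT = [("heaven:god", "owl:differentFrom", "hell:devil")]
NAMED = {
    "us:Republican": [("wh:pres40", "owl:sameAs", "heaven:god")],
    "us:Democrat": [("wh:pres40", "owl:sameAs", "hell:devil")],
}


def identity_consistent(triples):
    """Merge owl:sameAs classes, then check no owl:differentFrom pair was merged."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "owl:sameAs":
            parent[find(s)] = find(o)
    return all(
        find(s) != find(o) for s, p, o in triples if p == "owl:differentFrom"
    )


def rigid_reading(default, named):
    # Assumption: identity is global, so every context's sameAs triples pool.
    pooled = list(default)
    for graph in named.values():
        pooled.extend(graph)
    return identity_consistent(pooled)


def per_context_reading(default, named):
    # Assumption: each context gets its own denotations, checked separately.
    return all(
        identity_consistent(list(default) + list(graph))
        for graph in named.values()
    )
```

With these assumptions the rigid reading merges heaven:god and
hell:devil through wh:pres40 and so reports an inconsistency, while the
per-context reading finds each context consistent on its own.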

[...]

> Pat

peter

Received on Tuesday, 6 March 2012 17:31:23 UTC