Re: three kinds of dataset

A few quick clarifications.

On Mar 6, 2012, at 11:30 AM, Peter F. Patel-Schneider wrote:

> From: Pat Hayes <phayes@ihmc.us>
> Subject: three kinds of dataset
> Date: Tue, 6 Mar 2012 00:55:58 -0600
> 
>> I've been trying to pull all these threads together. It seems to me that
>> the use cases for quads/datasets fall into three main categories, which
>> demand different semantic approaches if we are going to try to avoid
>> interoperability confusion. (Now that I understand Antoine's proposal, I
>> see how it manages to be a kind of weakest-possible blanket case which
>> allows something like all three of these to kind of work, but I would
>> argue that we can do better, because this one-size-fits-all approach
>> doesn't really fit anything quite properly. More below.)
>> 
>> Case 1. Datasets are collections of RDF graphs distinguished from one
>> another by 'labels', used essentially as a bookkeeping device to
>> distinguish one graph from another, to keep entailments from one graph
>> distinguished from those of another, etc. No actual semantic
>> relationship is assumed to hold between a graph and its label, and each
>> graph is a normal RDF graph to which the 2004 RDF semantics
>> applies. There is no difference in meaning between a labelled graph and
>> the same graph outside the dataset, without the label. No particular
>> meaning is given to the idea of 'asserting' a dataset.
> 
> This case appears to not have any semantic implications.  Graphs are
> still graphs, entailment is still between graphs, etc., etc.  There are
> no semantics for RDF datasets, which are just some data structure to be
> used by applications as they see fit.

Yes, although the idea was that the RDF graphs in these data structures are indeed RDF graphs with RDF semantics.
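To make the case 1 reading concrete, here is a minimal Python sketch. It assumes ground graphs (no blank nodes) written as sets of IRI-string triples, and the function names are hypothetical. The labels are plain dictionary keys with no semantic role, and entailment is ordinary 2004 simple entailment between graphs, which for ground graphs reduces to a subset test.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

# Case 1 sketch: a dataset is only a bookkeeping structure.
# Labels are ordinary dictionary keys and carry no meaning of their own.
Dataset = Dict[str, Graph]


def simply_entails(g: Graph, h: Graph) -> bool:
    """Simple entailment restricted to ground graphs (no blank nodes):
    G entails H exactly when every triple of H is also in G."""
    return h <= g


def entails_within(ds: Dataset, label: str, h: Graph) -> bool:
    """Select a graph by its label, then apply ordinary graph entailment.
    The label picks out which graph to use; it adds nothing to its meaning."""
    return simply_entails(ds[label], h)


ds: Dataset = {
    ":g1": frozenset({(":a", ":p", ":b"), (":b", ":p", ":c")}),
    ":g2": frozenset({(":x", ":q", ":y")}),
}
query = frozenset({(":a", ":p", ":b")})
print(entails_within(ds, ":g1", query))  # True
print(entails_within(ds, ":g2", query))  # False
# The same graph outside the dataset has exactly the same entailments.
print(simply_entails(ds[":g1"], query))  # True
```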
> 
>> Case 2. The graph labels in a dataset are presumed to indicate a context
>> of some kind in which the labeled graph is understood to be true or
>> to hold. To assert the dataset is to assert that each graph holds *in
>> its context* but it may not do so outside the context, so no semantic
>> relationship, e.g. of entailment, holds between the named graphs in a
>> dataset and any unnamed graphs, even the same graph without its
>> label. (Contexts might include time periods, locations, beliefs, sources,
>> "Islands", etc.: anything which is thought of as influencing the truth
>> of something expressed in RDF.)
> 
> Here there is some notion of asserting a dataset.  What, then, is the
> theory involved?

Well, there are several options for this theory. Right now I am just trying to get the intuitions clear. 

>  Presumably the default graph triples are just asserted,
> but what happens for named graphs?  The RDF semantics would have to be
> expanded to include RDF graphs.

Datasets, you mean? Yes, indeed.

>  What consequences do we want in this
> semantics?

We have a whole lot of examples suggesting various views on this.

>  How is this semantics going to interact with the various
> RDFS or OWL constructs?

That last is a very good question, which we haven't really taken up with any degree of seriousness yet. My view is that whatever we do, we have to leave current RDF graphs with their current RDF meaning for compatibility with OWL and RDFS, and we can leave it to OWL to decide if it wants to extend itself to whatever new expressiveness we provide using datastores. We might want to add a few things to RDFS, though, if we decide to go in this direction.

> 
>> Case 3. The graph labels in a dataset are understood to be actual names
>> of the graph they are associated with, i.e. to formally denote the graph,
>> so that when used in RDF these labels refer to the actual graph. (Or
>> maybe, to some larger graph of which the graph indicated is a part.) The
>> labelled graphs are then essentially being mentioned rather than used,
>> so that the dataset can be asserted without in any way asserting the
>> component named graphs it contains. These named graphs are more like
>> graph literals than a graph in an RDF graph document.  (There is also
>> the idea that the label actually names a graph container whose state is
>> initially the graph shown, and no doubt other variations on this theme
>> are possible; let me lump all these together for now.)
> 
> It appears that the essence here is that there is a new datatype,
> RDFGRAPH, whose literals are RDF graphs.

Well, that is one way to do it. I like the proposal in Bizer et al. better, myself. 

>  It may even be that the
> graph-subset relationship is exposed in RDF (to allow partial
> specification of the graph).  However, there is no notion that the
> elements of this datatype carry RDF semantics.

No, that is exactly what is proposed. The RDF graphs being named here are real RDF graphs with RDF semantics. But the dataset names them rather than asserts them. 

> 
>> So, take Tim Lebo's example from David's recent email:
>> 
>>> :account_1 {
>>>    :entity a prov:Entity
>>> }
>>> 
>>> :account_2 {
>>>    :entity a prov:Activity
>>> }
>>> 
>>> prov:Entity owl:disjointWith prov:Activity .
>> 
>> and presume that we are accepting OWL semantics. 
> 
>> Case 1 says: yes, these
>> three are OWL-inconsistent taken together. (Of course it allows that you
>> might not want to take them together, perhaps even that you should not
>> take them together, but as far as what they mean, they are indeed
>> mutually inconsistent.)  
> 
> Huh?  Case 1 appears to say that this is just an RDF datastructure, with
> no semantics, and therefore inconsistency is not an issue.

Why would an RDF data structure have no semantics? The RDF semantics is normative for RDF. Maybe consistency isn't something anyone is worried about, for pragmatic reasons of some kind, but that does not alter the fact that the inconsistency is present in the data.

> 
>> Case 2 says: no, these three are
>> OWL-consistent, even taken together, because :entity can be a
>> prov:Entity in one context and something else in a different context,
>> and this is consistent. 
> 
> Yes, it does seem to me that this RDF dataset would be consistent under
> a treatment of named graphs as (modal) contexts.
> 
>> Case 3 also says they are consistent, but for a
>> different reason: only the last triple is being asserted; the two named
>> graphs don't say *anything* about :entity, only that certain graphs are
>> named ":account_1" and ":account_2". (But if these graphs were to have
>> their content exposed, e.g. by importing them using their names into a
>> graph containing the third, then there would indeed be a good old 2004
>> inconsistency in that graph.)
> 
> Yes, under this case, the RDF graphs in named graphs do not participate
> in the RDF semantics in the usual way.

Well... I don't like that way of putting it. They are in effect being quoted here rather than used, but (as I tried to emphasise) if you do manage to use them, they have their RDF semantic weight just as usual.
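The three verdicts on Tim Lebo's example can be reproduced with a toy Python sketch. It hard-codes the one OWL consequence that matters here (a resource typed with two classes declared owl:disjointWith is a clash) rather than running a real OWL reasoner, and the three functions are hypothetical stand-ins for the three readings, not anyone's proposed algorithm. For case 2 it approximates "holds in its context" by checking the asserted default graph together with each named graph separately.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"

# Tim Lebo's example: one asserted triple plus two named graphs.
default_graph: Set[Triple] = {("prov:Entity", DISJOINT, "prov:Activity")}
named: Dict[str, Set[Triple]] = {
    ":account_1": {(":entity", RDF_TYPE, "prov:Entity")},
    ":account_2": {(":entity", RDF_TYPE, "prov:Activity")},
}


def clashes(triples: Set[Triple]) -> bool:
    """True if some resource is typed with two classes declared disjoint.
    This is the only OWL consequence modelled here."""
    disjoint = {(s, o) for s, p, o in triples if p == DISJOINT}
    disjoint |= {(o, s) for s, o in disjoint}  # disjointness is symmetric
    types: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return any((a, b) in disjoint for cs in types.values() for a in cs for b in cs)


def consistent_case1(default, named) -> bool:
    # Labels are bookkeeping; every graph keeps its 2004 meaning, so taken
    # together the dataset means what the merge of all its graphs means.
    return not clashes(set(default).union(*named.values()))


def consistent_case2(default, named) -> bool:
    # Each named graph holds only in its own context; the default graph is
    # asserted outright. Approximation: check default + each context separately.
    return all(not clashes(set(default) | g) for g in named.values())


def consistent_case3(default, named) -> bool:
    # Named graphs are mentioned, not used: only the default graph is asserted.
    return not clashes(set(default))


for case in (consistent_case1, consistent_case2, consistent_case3):
    print(case.__name__, case(default_graph, named))
# consistent_case1 False
# consistent_case2 True
# consistent_case3 True
```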

> 
>> These really need different semantic treatments. 
> 
> Well, sure, stated this way there are three different semantic
> treatments of RDF graphs.  

I think that both cases 1 and 3 essentially treat RDF graphs in the same way as the 2004 specs do. Case 2 is a genuine change to the meaning of triples, and my point was to bring this out into the open, so to speak, because users of quad stores have been using them in this way, and want to go on using them in this way. 

> 
>> I maintain that the first case does not need any changes to the 2004
>> semantics at all, and does not require that datastores be given any
>> special semantics. In fact, it is better if they are not, as any
>> semantic story beyond the 2004 account of graph meanings will be harmful
>> to some application or other. Graph names here are purely an organizing
>> and record-keeping device, and can be freely used in any way, and
>> nothing is changed about RDF by any such use. For example, it would be
>> fine to decide that a graph-label association was local to a datastore,
>> on this view.
> 
> Agreed.
> 
>> The third case is closest to the original Bizer et al. named graph
>> proposal, and supports the same kind of graphs-as-resources thinking, in
>> which the URI of a graph document is seen as identifying the graph just
>> as URIs identify Web pages and the like. Graph labels here have global
>> scope, 
> 
> Well, graph labels in case 3 are RDF identifiers, i.e., IRIs.  Either
> that or the RDF semantics needs a new kind of syntactic thing that looks
> just like IRIs (and probably acts just like IRIs - walking and quacking
> I'm not so sure about).

Exactly, it might not quack in quite the same way; cf. recent emails pointing out that datastore implementations often side-step the HTTP machinery and use their own labelling to associate a URI with a graph. Should we just smile at this as an implementation of a kind of caching, a mere implementation detail beneath our standardizational gaze, or should we deplore it or endorse it?

> 
>> and one can treat a graph label as the name of the graph in a
>> very strong sense, and use that URI in RDF to refer to the graph (or maybe
>> to the graph container, or maybe to either, etc.: again, let me ignore
>> this complication for the present.) To assert a datastore is a kind of
>> graph baptism: publishing the datastore assigns a global name to the
>> graph, and requires that satisfying interpretations respect this
>> naming. (The semantic conditions are in the original paper, but in
>> essence they are that an interpretation I satisfies
>> label {graph} 
>> just when I(label) = graph.  I'm tempted to say, "duh.")
> 
> I'm not sure what the "global" is doing here.  Under just about any
> reasonable semantic treatment for case 3, graph labels are RDF
> identifiers whose denotation is an RDF graph literal (similar quibbles
> as in Pat's section just above notwithstanding).

But there are some unreasonable treatments being talked about, e.g. "local" interpretations of graph labels that sidestep their role as RDF identifiers (while still allowing them to be RDF identifiers in other places), which AFAICS amounts to a kind of punning. So that "global" was intended to explicitly sideline those ideas.

>  This can be done in
> RDF fairly simply, and the simple versions do not appear to have any bad
> effects elsewhere.   I'm not sure what is gained here over case 1,
> however. 

The ability to use the IRI graph labels inside RDF triples to refer to the graph, effectively using RDF as metadata. 
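A minimal Python sketch of the case 3 reading, assuming an interpretation is modelled as a plain mapping from IRIs to what they denote and that graph values are frozen sets of triples; all names here are hypothetical. It encodes the condition from the original named graph paper quoted above (I satisfies `label {graph}` just when I(label) = graph), and shows that the contents of the named graphs are mentioned rather than asserted, while the labels remain usable as subjects of metadata triples.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def satisfies_naming(interp: Dict[str, object], label: str, graph: Graph) -> bool:
    """I satisfies `label { graph }` just when I(label) = graph."""
    return interp.get(label) == graph


def satisfies_named_part(interp: Dict[str, object], named: Dict[str, Graph]) -> bool:
    """Asserting the named part of a dataset only constrains what the labels
    denote. Nothing here checks whether the triples inside each graph are true."""
    return all(satisfies_naming(interp, lbl, g) for lbl, g in named.items())


g1: Graph = frozenset({(":entity", "rdf:type", "prov:Entity")})
g2: Graph = frozenset({(":entity", "rdf:type", "prov:Activity")})
named = {":account_1": g1, ":account_2": g2}

# An interpretation mapping each label to its graph satisfies the naming
# conditions, even though the two graphs would clash if asserted together.
interp: Dict[str, object] = {":account_1": g1, ":account_2": g2, ":entity": "some-resource"}
print(satisfies_named_part(interp, named))  # True

# Because a label denotes its graph, it can be the subject of metadata triples.
# (dc:creator and :tim are illustrative only.)
metadata = {(":account_1", "dc:creator", ":tim")}
print({interp[s] for s, _, _ in metadata} == {g1})  # True
```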

> 
>> The second case is the tricky one, because it has the label actually
>> changing the meaning of the triples in the graph. If we are to claim
>> that graphs can be true in one context but not in another, then we have
>> to change the 2004 semantics somehow in order to provide for this
>> context sensitivity. This is where Antoine's approach and mine
>> differ. His proposal allows the *meanings of URIs* to change with
>> context (as well as being the names of the contexts themselves); mine
>> only allows relations to have an extra parameter. The 'context' is then
>> this extra parameter which allows truth values of triples to appear to
>> change, by treating them as quads; but the URIs retain a global
>> meaning.  Antoine's semantics requires adding a context mapping to
>> interpretations, so that every URI defines a potentially different
>> interpretation context for every other URI; mine requires allowing the
>> EXT mapping on RDF properties to admit triples as well as pairs. Neither
>> of them changes current RDF graph meanings, but they extend this to
>> datasets differently. Mine is a semantic extension to RDF, while
>> Antoine's is a kind of semantic un-extension: it gives a weaker meaning
>> than the RDF semantics does.
> 
> If I understand what is going on here, Pat's proposal for case 2 is a
> kind of situation calculus whereas Antoine's is a kind of modal logic.

Nicely put. Actually I think Antoine's is a very simple hybrid logic which has labels for the possible worlds, rather than a modality. But whatever. 

> These are both ways of extending RDF.  I would *not* say that Antoine's
> is an un-extension of the RDF semantics.  

My point was just that under Antoine's construction, a dataset has a *weaker* meaning than it would if we applied the straightforward 2004 semantics to all its graphs.
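The structural difference between the two case 2 proposals can be sketched in Python. This is a toy model under stated assumptions, not either author's formal semantics. In the hypothetical `ParamInterp`, IRIs denote the same thing everywhere and only a property's extension may contain (subject, object, context) triples as well as pairs. In the hypothetical `ContextInterp`, each context selects its own interpretation, so what an IRI denotes can itself vary by context.

```python
from typing import Dict, Optional, Set, Tuple


class ParamInterp:
    """Extra-parameter reading: one global denotation for IRIs; a property's
    extension may hold (s, o) pairs and also (s, o, context) triples."""

    def __init__(self, denote: Dict[str, str], ext: Dict[str, Set[Tuple[str, ...]]]):
        self.denote = denote
        self.ext = ext

    def holds(self, s: str, p: str, o: str, ctx: Optional[str] = None) -> bool:
        d = self.denote
        key = (d[s], d[o]) if ctx is None else (d[s], d[o], d[ctx])
        return key in self.ext.get(d[p], set())


class ContextInterp:
    """Context-mapping reading: every context IRI picks out its own
    interpretation, so what :entity denotes may differ between contexts."""

    def __init__(self, per_context: Dict[str, ParamInterp]):
        self.per_context = per_context

    def holds(self, s: str, p: str, o: str, ctx: str) -> bool:
        return self.per_context[ctx].holds(s, p, o)


vocab = {":entity": "e", "rdf:type": "type", "prov:Entity": "E",
         "prov:Activity": "A", ":account_1": "c1", ":account_2": "c2"}

# Extra-parameter style: the same resource "e" in both contexts; the context
# is simply a third argument of the rdf:type relation.
pat = ParamInterp(vocab, {"type": {("e", "E", "c1"), ("e", "A", "c2")}})
print(pat.holds(":entity", "rdf:type", "prov:Entity", ":account_1"))  # True
print(pat.holds(":entity", "rdf:type", "prov:Entity", ":account_2"))  # False
print(pat.holds(":entity", "rdf:type", "prov:Entity"))  # False: no context-free pair

# Context-mapping style: each context reinterprets :entity itself.
antoine = ContextInterp({
    ":account_1": ParamInterp({**vocab, ":entity": "e1"}, {"type": {("e1", "E")}}),
    ":account_2": ParamInterp({**vocab, ":entity": "e2"}, {"type": {("e2", "A")}}),
})
print(antoine.holds(":entity", "rdf:type", "prov:Entity", ":account_1"))  # True
print(antoine.per_context[":account_1"].denote[":entity"])  # e1
```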

> However, it does appear to
> allow some very useful things, related to what one might want to call
> rigidity of meaning.
> 
> For example, consider a modification of Tim Lebo's example:
> 
> heaven:god owl:differentFrom hell:devil
> us:Republican { wh:pres40 owl:sameAs heaven:god }
> us:Democrat { wh:pres40 owl:sameAs hell:devil }
> 
> In Pat's case 2 semantics, it appears to me that this would be
> inconsistent, whereas in a modal semantics this is often allowed.

It depends on whether we could assert owl:sameAs in a context-sensitive way. The issue arises in both semantics. Equality and modality are always a dangerous combination. For Antoine, the question would be: can owl:sameAs be reinterpreted when we change an RDF interpretation? As I understand the idea, the answer would be no, since to get OWL entailments you have to specify a change of OWL interpretation by context, but not a change from OWL to something non-OWL. For me, the question is: can owl:sameAs be given an extra 'context' argument? And again, my own suggestion would be no, and require a different vocabulary for any kind of contextual-identity relation, but that issue would have to be thrashed out at the OWL level.
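Whether Peter's heaven/hell example comes out inconsistent hinges on exactly this choice, which a small Python sketch can illustrate. It models only equality and difference with a union-find structure (an assumption; no OWL reasoner is involved), treats the owl:differentFrom triple as holding in every context, and compares pooling every context's owl:sameAs claims (a global, non-contextual owl:sameAs) against checking each context on its own (a hypothetical context-sensitive identity relation).

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def consistent(same: Iterable[Pair], different: Iterable[Pair]) -> bool:
    """Union-find over owl:sameAs pairs; fails if any owl:differentFrom pair
    ends up in the same equivalence class."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in same:
        parent[find(a)] = find(b)
    return all(find(a) != find(b) for a, b in different)


different: List[Pair] = [("heaven:god", "hell:devil")]  # asserted globally
contexts: Dict[str, List[Pair]] = {
    "us:Republican": [("wh:pres40", "heaven:god")],
    "us:Democrat": [("wh:pres40", "hell:devil")],
}

# Non-contextual owl:sameAs: identities asserted in any context hold everywhere.
global_ok = consistent([p for ps in contexts.values() for p in ps], different)
# Context-sensitive identity: each context is checked on its own.
per_context_ok = all(consistent(ps, different) for ps in contexts.values())
print(global_ok, per_context_ok)  # False True
```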

There is no doubt that allowing truth in RDF to be contextualized, however we do it, raises all kinds of complicated possibilities which we will not have the time or the energy to settle in this WG. But my hope is that we can at least provide something that allows for the commonest uses to be handled uniformly, most useful among them being the ability to record time-sensitive information, which we might be able to usefully relate to the whole graphs/graph container issue. And, I guess, I want to allow case 1 without having to agree to relativise or contextualize or modalize all the graphs in the dataset: I want that to be an option, and to have some bite for that option when it is chosen. 

Pat

> 
> [...]
> 
>> Pat
> 
> peter
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Tuesday, 6 March 2012 18:30:47 UTC