Re: three kinds of dataset from Pat Hayes on 2012-03-12 (public-rdf-wg@w3.org from March 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 12 Mar 2012 14:53:40 -0500
To: Guus Schreiber <guus.schreiber@vu.nl>
Cc: RDF-WG WG <public-rdf-wg@w3.org>
Message-Id: <007F38BD-D9CC-46C3-B934-31E379CCE515@ihmc.us>
On Mar 12, 2012, at 9:17 AM, Guus Schreiber wrote:

> Pat,
> 
> I'm trying to reformulate your cases as types of label-graph relationships. Sorry if I'm misinterpreting your proposal.

I forgive you in advance :-)

> 
> Your case 1 is the one we have if we simply adopt a syntax like TriG without further semantics. Let's call this the "LabeledGraph" relationship.

Right.

> 
> Your case 2 assumes there are some context conditions (the precise nature of which we are unlikely to standardize) that need to be taken into account when interpreting this graph. Let's call this the "ContextualGraph" relationship,

OK, but...

> which I suggest we can view as a specialization of "LabeledGraph".

... ? I don't see this, since the contextual case changes the semantics and the labeled one doesn't. 
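
To make the difference concrete, here is a minimal Python sketch using Tim Lebo's example quoted further down. It is a toy under stated assumptions, not an implementation of RDF or OWL semantics: triples are plain tuples of strings, the only "reasoning" is a check for one subject typed by two disjoint classes, and the function names (case1_inconsistent, case2_inconsistent) are invented for illustration. The case-2 check also assumes the unlabelled disjointness axiom is visible inside every context, which is a simplification.

```python
# Toy model only: names and checks are illustrative assumptions,
# not an implementation of RDF or OWL semantics.
TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"

default_graph = {("prov:Entity", DISJOINT, "prov:Activity")}
named_graphs = {
    ":account_1": {(":entity", TYPE, "prov:Entity")},
    ":account_2": {(":entity", TYPE, "prov:Activity")},
}

def disjoint_pairs(triples):
    """Collect (classA, classB) pairs declared disjoint in the triples."""
    return {(s, o) for s, p, o in triples if p == DISJOINT}

def has_clash(triples, pairs):
    """True if some subject is typed by both classes of a disjoint pair."""
    types = {}
    for s, p, o in triples:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    return any(a in ts and b in ts for ts in types.values() for a, b in pairs)

def case1_inconsistent(default, named):
    # Case 1: labels are bookkeeping, so every graph means exactly what it
    # means unlabelled; taking them together is a plain merge.
    merged = set(default).union(*named.values())
    return has_clash(merged, disjoint_pairs(merged))

def case2_inconsistent(default, named):
    # Case 2: each graph holds only in its own context, so each labelled
    # graph is checked separately (assumption: the default graph's
    # disjointness axioms apply within every context).
    pairs = disjoint_pairs(default)
    return any(has_clash(graph, pairs) for graph in named.values())

print(case1_inconsistent(default_graph, named_graphs))  # True
print(case2_inconsistent(default_graph, named_graphs))  # False
```

The same data gives different answers only because the second function treats the label as changing what the triples are taken to say, which is why I don't see the contextual case as a specialization of the labeled one.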

> 
> Case 3 is the "DenotedGraph" relationship, for which we have one special case, namely denotation of a static graph container.

Well, actually that case was one I was putting off for now, until we get the kinds-of-structures-that-are-in-containers sorted out. A quad store is a candidate for a ContextualGraph container. A ContextualGraph won't fit into an ordinary graph container.

> Pls correct me if I'm wrong, but this case looks to me to be orthogonal to case 2: one can have a denotation relationship with or without context.

Um. Not sure I follow you. (I guess we could have a contextual denotation of a graph, but I wasn't suggesting that idea and I don't think (?) anyone has proposed it, so I'd rather not even be talking about it. Although it would happen kind of automatically if we had URIs acting as genuine graph names, as in case 3, and then used those names in another graph that happened itself to be Contextual, i.e. case 2. Hmmm.)

Case 3 was supposed to be like the original Bizer et al. proposal for named graphs, in which the name URI simply denotes/refers-to/identifies the named graph, end of story. No need to mention quads or SPARQL or any of this other stuff for case 3.
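
As a rough illustration of how little case 3 needs, here is a small Python sketch of the naming condition (an interpretation satisfies a label/graph pairing just when the label denotes that graph). The representation is an assumption made for the example: a graph is a frozenset of string triples, an interpretation is a dict from names to whatever they denote, blank nodes are ignored so that set equality can stand in for graph identity, and the helper names are hypothetical.

```python
# Sketch only: graphs as frozensets of triples, interpretations as dicts.
# Blank nodes are ignored, so set equality stands in for graph identity.

def graph(*triples):
    """Build an immutable toy graph from (subject, predicate, object) tuples."""
    return frozenset(triples)

def satisfies_naming(interpretation, dataset):
    """True when every label in the dataset denotes exactly its graph."""
    return all(interpretation.get(label) == g for label, g in dataset.items())

g1 = graph((":entity", "rdf:type", "prov:Entity"))
dataset = {":account_1": g1}

naming_respected = {":account_1": g1}
naming_violated = {":account_1": graph()}  # the label denotes some other graph

print(satisfies_naming(naming_respected, dataset))  # True
print(satisfies_naming(naming_violated, dataset))   # False
```

Note that nothing here inspects the triples inside the graph: the graph is mentioned, not asserted.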

> 
> This gives the following type hierarchy for label-graph relationships:
> 
>  LabeledGraph
>    ContextualGraph
>    DenotedGraphContainer
>      DenotedStaticGraphContainer
> 
> where the two siblings are not disjoint.

I don't see how this hierarchy works, I have to admit. It isn't a class hierarchy.

Also I guess I'm not very happy with this sounding like a classification of kinds of graph. That's not the point: the very same graph could have a label in a case-1 structure and be given a context in another and be case-3 named in yet another, and it's the same graph in all three cases.

> [One could add an explicit "DenotedDynamicGraphContainer" as sibling at the lowest level, but that seems superfluous].
> 
> The reason I'm trying to name these types is that we can use these in a design like the one Andy proposed, ie. the labels above could be seen as proposals for predefined "typed graph labels" (design 6 [1]).

I think it works in Andy's case for distinguishing graphs from containers, but not so well here. See above for the idea of using rdf:type, but something like this could be done, if we are OK with having it done on a graph-by-graph basis rather than a triple-by-triple basis. But I am worried that this might be too coarse-grained for many applications, e.g. wanting to 'mark' some part of a namespace as non-contextual even though it gets used in contextual graphs. If we don't allow this, there will be lots of trouble down the road.

> 
> Note: compared to the other proposals one label-graph relationship seems to be absent in your proposal: namely "label owl:sameAs graph", but I must admit I never really fully understood when you want to have this kind of semantics.

Me neither, but I think the idea is that the URI denotes a thing which itself names the graph. So the graph name is a kind of information resource itself (?) 

Pat


> 
> Guus
> 
> [1] http://www.w3.org/2011/rdf-wg/wiki/TF-Graphs-Designs#Typed_Labels
> 
> On 06-03-2012 07:55, Pat Hayes wrote:
>> I've been trying to pull all these threads together. Seems to me that the use cases for quads/datasets fall into three main categories, which demand different semantic approaches if we are going to try to avoid interoperability confusion. (Now I understand Antoine's proposal I see how it manages to be a kind of weakest-possible-blanket-case which allows something like all three of these to kind of work, but I would argue that we can do better, because this fit-all approach doesn't really fit anything quite properly. More below.)
>> 
>> Case 1. Datasets are collections of RDF graphs distinguished from one another by 'labels', used essentially as a bookkeeping device to distinguish one graph from another, to keep entailments from one graph distinguished from those of another, etc. No actual semantic relationship is assumed to hold between a graph and its label, and each graph is a normal RDF graph to which the 2004 RDF semantics applies. There is no difference in meaning between a labelled graph and the same graph outside the dataset, without the label. No particular meaning is given to the idea of 'asserting' a dataset.
>> 
>> Case 2. The graph labels in a dataset are presumed to indicate a context of some kind in which the labeled graph is understood to be true or to hold. To assert the dataset is to assert that each graph holds *in its context* but it may not do so outside the context, so no semantic relationship, e.g. of entailment, holds between the named graphs in a dataset and any unnamed graphs, even the same graph without its label. (Contexts might include time periods, locations, beliefs, sources, "Islands", etc.: anything which is thought of as influencing the truth of something expressed in RDF.)
>> 
>> Case 3. The graph labels in a dataset are understood to be actual names of the graph they are associated with, ie to formally denote the graph, so that when used in RDF these labels refer to the actual graph. (Or maybe, to some larger graph of which the graph indicated is a part.) The labelled graphs are then essentially being mentioned rather than used, so that the dataset can be asserted without in any way asserting the component named graphs it contains. These named graphs are more like graph literals than a graph in an RDF graph document.  (There is also the idea that the label actually names a graph container whose state is initially the graph shown, and no doubt other variations on this theme are possible; let me lump all these together for now.)
>> 
>> So, take Tim Lebo's example from David's recent email:
>> 
>>> :account_1 {
>>>     :entity a prov:Entity
>>> }
>>> 
>>> :account_2 {
>>>     :entity a prov:Activity
>>> }
>>> 
>>> prov:Entity owl:disjointWith prov:Activity .
>> 
>> and presume that we are accepting OWL semantics. Case 1 says: yes, these three are OWL-inconsistent taken together. (Of course it allows that you might not want to take them together, perhaps even that you should not take them together, but as far as what they mean, they are indeed mutually inconsistent.)  Case 2 says: no, these three are OWL-consistent, even taken together, because :entity can be a prov:Entity in one context and something else in a different context, and this is consistent. Case 3 also says they are consistent, but for a different reason: only the last triple is being asserted; the two named graphs don't say *anything* about :entity, only that certain graphs are named ":account_1" and ":account_2". (But if these graphs were to have their content exposed, eg by importing them using their names into a graph containing the third, then there would indeed be a good old 2004 inconsistency in that graph.)
>> 
>> These really need different semantic treatments.
>> 
>> I maintain that the first case does not need any changes to the 2004 semantics at all, and does not require that datastores be given any special semantics. In fact, it is better if they are not, as any semantic story beyond the 2004 account of graph meanings will be harmful to some application or other. Graph names here are purely an organizing and record-keeping device, and can be freely used in any way, and nothing is changed about RDF by any such use. For example, it would be fine to decide that a graph-label association was local to a datastore, on this view.
>> 
>> The third case is closest to the original Bizer et al. named graph proposal, and supports the same kind of graphs-as-resources thinking, in which the URI of a graph document is seen as identifying the graph just as URIs identify Web pages and the like. Graph labels here have global scope, and one can treat a graph label as the name of the graph in a very strong sense, use that URI in RDF to refer to the graph (or maybe to the graph container, or maybe to either, etc.: again, let me ignore this complication for the present.) To assert a datastore is a kind of graph baptism: publishing the datastore assigns a global name to the graph, and requires that satisfying interpretations respect this naming. (The semantic conditions are in the original paper, but in essence they are that an interpretation I satisfies
>>  label {graph}
>> just when I(label) = graph.  I'm tempted to say, "duh.")
>> 
>> The second case is the tricky one, because it has the label actually changing the meaning of the triples in the graph. If we are to claim that graphs can be true in one context but not in another, then we have to change the 2004 semantics somehow in order to provide for this context sensitivity. This is where Antoine's approach and mine differ. His proposal allows the *meanings of URIs* to change with context (as well as being the names of the contexts themselves); mine only allows relations to have an extra parameter. The 'context' is then this extra parameter which allows truthvalues of triples to appear to change, by treating them as quads; but the URIs remain having a global meaning. Antoine's semantics requires adding a context mapping to interpretations, so that every URI defines a potentially different interpretation context for every other URI; mine requires allowing the EXT mapping on RDF properties to admit triples as well as pairs. Neither of them change current RDF graph meanings, but they extend this to datasets differently. Mine is a semantic extension to RDF, while Antoine's is a kind of semantic un-extension: it gives a weaker meaning than the RDF semantics does.
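
A hedged sketch of the "extra parameter" idea, in Python: the extension of a property is modelled as a set that may hold pairs (for plain triples) as well as triples (for contextual ones). This is only a toy reading of the paragraph above. The ordering (subject, object, context), the dict-of-sets representation, and the function name holds are all assumptions made for the example and are not taken from any specification.

```python
# Toy extension mapping: EXT(property) may contain pairs and triples.
# Assumed tuple shapes: (subject, object) or (subject, object, context).
ext = {
    "rdf:type": {
        (":doc", "prov:Entity"),                    # plain, context-free fact
        (":entity", "prov:Entity", ":account_1"),   # holds in :account_1
        (":entity", "prov:Activity", ":account_2"), # holds in :account_2
    }
}

def holds(ext, s, p, o, context=None):
    """Check a plain triple (no context) or a contextual one (a quad)."""
    args = (s, o) if context is None else (s, o, context)
    return args in ext.get(p, set())

print(holds(ext, ":entity", "rdf:type", "prov:Entity", ":account_1"))  # True
print(holds(ext, ":entity", "rdf:type", "prov:Entity"))                # False
print(holds(ext, ":doc", "rdf:type", "prov:Entity"))                   # True
```

The URIs keep a single global meaning throughout; only the arity of what the property relates changes, so the contextual fact does not yield the plain triple.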
>> 
>> OK, more later. I just wanted to get these distinctions out into the open. My main point is that these are *different*, and to suggest that we should provide ways to distinguish them. One way, for example, might be to give TriG Antoine's semantics (which does not overly interfere with the first case) and to give N-Quads my semantics, and think of (or choose) another syntax for the third case. (BTW, in my earlier email I suggested the use of the + instead of dot to distinguish the 'contextual' case from the plain RDF triple case. This makes sense in my proposal, but AFAIKS not in Antoine's. It allows case 2 to be mixed with case 1 as two kinds of data in a single dataset. Maybe this much flexibility is overkill, however.)
>> 
>> There are many other issues, like how to distinguish graphs from graph containers; whether we are naming/labeling the graph shown or some other, larger, graph; whether it is good to use RDF to itself describe the pragmatic or semantic alternatives;  and how to combine these various senses if we need to. But I think at least keeping them separate is a useful way to move forward and avoid some of the, er, philosophical disputes.
>> 
>> Pat
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 12 March 2012 19:54:18 UTC