Re: three kinds of dataset


I'm trying to reformulate your cases as types of label-graph 
relationships. Sorry if I'm misinterpreting your proposal.

Your case 1 is the one we have if we simply adopt a syntax like TriG 
without further semantics. Let's call this the "LabeledGraph" relationship.

Your case 2 assumes there are some context conditions (the precise 
nature of which we are unlikely to standardize) that need to be taken
into account when interpreting this graph. Let's call this the 
"ContextualGraph" relationship, which I suggest we can view as a 
specialization of "LabeledGraph".

Case 3 is the "DenotedGraph" relationship, for which we have one special 
case, namely denotation of a static graph container. Please correct me if
I'm wrong, but this case looks to me to be orthogonal to case 2: one can 
have a denotation relationship with or without context.

This gives the following type hierarchy for label-graph relationships:

    LabeledGraph
        ContextualGraph
        DenotedGraph
            DenotedStaticGraphContainer

where the two siblings are not disjoint.
[One could add an explicit "DenotedDynamicGraphContainer" as sibling at 
the lowest level, but that seems superfluous].

The reason I'm trying to name these types is that we can use these in a 
design like the one Andy proposed, i.e. the labels above could be seen as
proposals for predefined "typed graph labels" (design 6 [1]).
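To make the idea of typed labels a bit more concrete, here is a rough Python sketch. It is illustrative only: the class names follow the relationship names above, while `ContextualDenotedGraph`, `TypedLabel`, `is_contextual` and `is_denoting` are hypothetical helpers, not part of any proposal. The fact that the two siblings are not disjoint is modelled with multiple inheritance.

```python
from dataclasses import dataclass

# Illustrative only: the relationship names from this message, modelled as
# Python classes so that subclassing mirrors the proposed type hierarchy.
class LabeledGraph:
    """Case 1: the label is a bookkeeping device with no semantic link."""

class ContextualGraph(LabeledGraph):
    """Case 2: the graph holds in the context indicated by the label."""

class DenotedGraph(LabeledGraph):
    """Case 3: the label formally denotes the graph."""

# The two siblings are not disjoint, so one label-graph pair may be both.
class ContextualDenotedGraph(ContextualGraph, DenotedGraph):
    """Hypothetical combined type: a denoted graph that also has a context."""

@dataclass(frozen=True)
class TypedLabel:
    """Hypothetical record pairing a graph label with its relationship type."""
    iri: str
    relationship: type

def is_contextual(label: TypedLabel) -> bool:
    return issubclass(label.relationship, ContextualGraph)

def is_denoting(label: TypedLabel) -> bool:
    return issubclass(label.relationship, DenotedGraph)

# Example IRIs are made up for illustration.
acc = TypedLabel("http://example.org/account_1", ContextualDenotedGraph)
plain = TypedLabel("http://example.org/g", LabeledGraph)
print(is_contextual(acc), is_denoting(acc))      # True True
print(is_contextual(plain), is_denoting(plain))  # False False
```

A store that sees a label typed only as `LabeledGraph` would then apply no extra semantics at all, which matches case 1.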

Note: compared to the other proposals, one label-graph relationship seems
to be absent from your proposal, namely "label owl:sameAs graph", but I
must admit I never fully understood when you would want this
kind of semantics.



On 06-03-2012 07:55, Pat Hayes wrote:
> I've been trying to pull all these threads together. It seems to me that the use cases for quads/datasets fall into three main categories, which demand different semantic approaches if we are going to try to avoid interoperability confusion. (Now that I understand Antoine's proposal, I see how it manages to be a kind of weakest-possible blanket case which allows something like all three of these to kind of work, but I would argue that we can do better, because this one-size-fits-all approach doesn't really fit anything quite properly. More below.)
> Case 1. Datasets are collections of RDF graphs distinguished from one another by 'labels', used essentially as a bookkeeping device to distinguish one graph from another, to keep entailments from one graph distinguished from those of another, etc. No actual semantic relationship is assumed to hold between a graph and its label, and each graph is a normal RDF graph to which the 2004 RDF semantics applies. There is no difference in meaning between a labelled graph and the same graph outside the dataset, without the label. No particular meaning is given to the idea of 'asserting' a dataset.
> Case 2. The graph labels in a dataset are presumed to indicate a context of some kind in which the labeled graph is understood to be true or to hold. To assert the dataset is to assert that each graph holds *in its context*, but it may not do so outside the context, so no semantic relationship (e.g. of entailment) holds between the named graphs in a dataset and any unnamed graphs, even the same graph without its label. (Contexts might include time periods, locations, beliefs, sources, "Islands", etc.: anything which is thought of as influencing the truth of something expressed in RDF.)
> Case 3. The graph labels in a dataset are understood to be actual names of the graph they are associated with, i.e. to formally denote the graph, so that when used in RDF these labels refer to the actual graph. (Or maybe to some larger graph of which the graph indicated is a part.) The labelled graphs are then essentially being mentioned rather than used, so that the dataset can be asserted without in any way asserting the component named graphs it contains. These named graphs are more like graph literals than a graph in an RDF graph document. (There is also the idea that the label actually names a graph container whose state is initially the graph shown, and no doubt other variations on this theme are possible; let me lump all these together for now.)
> So, take Tim Lebo's example from David's recent email:
>> :account_1 {
>>      :entity a prov:Entity
>> }
>> :account_2 {
>>      :entity a prov:Activity
>> }
>> prov:Entity owl:disjointWith prov:Activity .
> and presume that we are accepting OWL semantics. Case 1 says: yes, these three are OWL-inconsistent taken together. (Of course it allows that you might not want to take them together, perhaps even that you should not take them together, but as far as what they mean, they are indeed mutually inconsistent.) Case 2 says: no, these three are OWL-consistent, even taken together, because :entity can be a prov:Entity in one context and something else in a different context, and this is consistent. Case 3 also says they are consistent, but for a different reason: only the last triple is being asserted; the two named graphs don't say *anything* about :entity, only that certain graphs are named ":account_1" and ":account_2". (But if these graphs were to have their content exposed, e.g. by importing them using their names into a graph containing the third, then there would indeed be a good old 2004 inconsistency in that graph.)
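A rough Python sketch of how the three readings treat this example. It is a toy check only: `clashes` and `consistent` are hypothetical helpers that understand nothing but rdf:type and owl:disjointWith, which is all the example needs. TriG's `a` is written out as rdf:type, the third triple is taken to sit in the default graph, and for case 2 the default-graph triples are assumed to hold in every context (an assumption, not something settled in the thread).

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

RDF_TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"

def clashes(g: Set[Triple]) -> bool:
    """Toy OWL check: is some resource typed with two classes declared disjoint?"""
    disjoint = {(s, o) for s, p, o in g if p == DISJOINT}
    disjoint |= {(o, s) for s, o in disjoint}  # disjointness is symmetric
    types: Dict[str, Set[str]] = {}
    for s, p, o in g:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return any((a, b) in disjoint for cs in types.values() for a in cs for b in cs)

def consistent(default: Graph, named: Dict[str, Graph], case: int) -> bool:
    if case == 1:
        # Labels are bookkeeping only: every graph is asserted, so check the union.
        return not clashes(set(default).union(*named.values()))
    if case == 2:
        # Each named graph holds only in its own context; default-graph triples
        # (here the disjointness axiom) are assumed to hold in every context.
        return not clashes(set(default)) and all(
            not clashes(set(default) | g) for g in named.values())
    if case == 3:
        # Named graphs are mentioned, not used: only the default graph is asserted.
        return not clashes(set(default))
    raise ValueError("case must be 1, 2 or 3")

default = frozenset({("prov:Entity", DISJOINT, "prov:Activity")})
named = {
    ":account_1": frozenset({(":entity", RDF_TYPE, "prov:Entity")}),
    ":account_2": frozenset({(":entity", RDF_TYPE, "prov:Activity")}),
}
results = [consistent(default, named, case) for case in (1, 2, 3)]
print(results)  # [False, True, True]
```

Importing the case-3 graphs by name into the default graph amounts to the case-1 union, which is where the 2004 inconsistency reappears.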
> These really need different semantic treatments.
> I maintain that the first case does not need any changes to the 2004 semantics at all, and does not require that datastores be given any special semantics. In fact, it is better if they are not, as any semantic story beyond the 2004 account of graph meanings will be harmful to some application or other. Graph names here are purely an organizing and record-keeping device, and can be freely used in any way, and nothing is changed about RDF by any such use. For example, it would be fine to decide that a graph-label association was local to a datastore, on this view.
> The third case is closest to the original Bizer et al. named graph proposal, and supports the same kind of graphs-as-resources thinking, in which the URI of a graph document is seen as identifying the graph just as URIs identify Web pages and the like. Graph labels here have global scope, and one can treat a graph label as the name of the graph in a very strong sense and use that URI in RDF to refer to the graph (or maybe to the graph container, or maybe to either, etc.: again, let me ignore this complication for the present). To assert a datastore is a kind of graph baptism: publishing the datastore assigns a global name to the graph, and requires that satisfying interpretations respect this naming. (The semantic conditions are in the original paper, but in essence they are that an interpretation I satisfies
>   label {graph}
> just when I(label) = graph.  I'm tempted to say, "duh.")
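The naming condition can be written as a one-line check. This is a minimal sketch under simplifying assumptions: `interp` is a hypothetical stand-in for I restricted to graph labels, graphs are frozen sets of triples, and the separate requirement that I satisfy the default graph under the 2004 semantics is left out.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

def satisfies_naming(interp: Dict[str, Graph], named: Dict[str, Graph]) -> bool:
    """Case 3 condition: I satisfies `label {graph}` just when I(label) == graph.

    `interp` is a hypothetical toy stand-in for I, restricted to graph labels.
    """
    return all(label in interp and interp[label] == g for label, g in named.items())

g1 = frozenset({(":entity", "rdf:type", "prov:Entity")})
dataset = {":account_1": g1}
print(satisfies_naming({":account_1": g1}, dataset))           # True
print(satisfies_naming({":account_1": frozenset()}, dataset))  # False
```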
> The second case is the tricky one, because it has the label actually changing the meaning of the triples in the graph. If we are to claim that graphs can be true in one context but not in another, then we have to change the 2004 semantics somehow in order to provide for this context sensitivity. This is where Antoine's approach and mine differ. His proposal allows the *meanings of URIs* to change with context (as well as being the names of the contexts themselves); mine only allows relations to have an extra parameter. The 'context' is then this extra parameter, which allows truth values of triples to appear to change by treating them as quads; but the URIs keep a global meaning. Antoine's semantics requires adding a context mapping to interpretations, so that every URI defines a potentially different interpretation context for every other URI; mine requires allowing the EXT mapping on RDF properties to admit triples as well as pairs. Neither of them changes current RDF graph meanings, but they extend them to datasets differently. Mine is a semantic extension to RDF, while Antoine's is a kind of semantic un-extension: it gives a weaker meaning than the RDF semantics does.
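A minimal Python sketch of the "extra parameter" reading, assuming a toy interpretation in which every URI has one global denotation and only IEXT is extended so that a property's extension may hold triples (subject, object, context) as well as pairs. The class and method names and the resource names "e", "Ent", "ctx1" etc. are hypothetical; Antoine's context-mapping semantics is not modelled here.

```python
from typing import Dict, Set, Tuple

class ContextualInterpretation:
    """Toy interpretation for the 'extra parameter' reading (hypothetical names).

    Every URI gets a single global denotation; only IEXT is extended, so a
    property's extension may contain pairs (plain triples) and also triples
    whose third member is the denoted context (quads).
    """

    def __init__(self, denot: Dict[str, str], ext: Dict[str, Set[Tuple[str, ...]]]):
        self.denot = denot  # URI -> resource, the same in every context
        self.ext = ext      # property resource -> set of 2- or 3-tuples

    def holds(self, s: str, p: str, o: str) -> bool:
        """Plain triple: <I(s), I(o)> is in IEXT(I(p)), as in the 2004 semantics."""
        d = self.denot
        return (d[s], d[o]) in self.ext.get(d[p], set())

    def holds_in(self, s: str, p: str, o: str, c: str) -> bool:
        """Quad: <I(s), I(o), I(c)> is in IEXT(I(p)); the context is just a parameter."""
        d = self.denot
        return (d[s], d[o], d[c]) in self.ext.get(d[p], set())

I = ContextualInterpretation(
    denot={":entity": "e", "rdf:type": "type", "prov:Entity": "Ent",
           "prov:Activity": "Act", ":account_1": "ctx1", ":account_2": "ctx2"},
    ext={"type": {("e", "Ent", "ctx1"), ("e", "Act", "ctx2")}},
)
print(I.holds_in(":entity", "rdf:type", "prov:Entity", ":account_1"))  # True
print(I.holds_in(":entity", "rdf:type", "prov:Entity", ":account_2"))  # False
print(I.holds(":entity", "rdf:type", "prov:Entity"))                   # False
```

The design point the sketch tries to show: :entity denotes the same resource throughout, and only the truth of the typing statement varies with the context argument.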
> OK, more later. I just wanted to get these distinctions out into the open. My main point is that these are *different*, and I suggest that we should provide ways to distinguish them. One way, for example, might be to give TriG Antoine's semantics (which does not overly interfere with the first case), to give N-Quads my semantics, and to think of (or choose) another syntax for the third case. (BTW, in my earlier email I suggested using + instead of the dot to distinguish the 'contextual' case from the plain RDF triple case. This makes sense in my proposal, but AFAICS not in Antoine's. It allows case 2 to be mixed with case 1 as two kinds of data in a single dataset. Maybe this much flexibility is overkill, however.)
> There are many other issues, like how to distinguish graphs from graph containers; whether we are naming/labeling the graph shown or some other, larger, graph; whether it is good to use RDF itself to describe the pragmatic or semantic alternatives; and how to combine these various senses if we need to. But I think at least keeping them separate is a useful way to move forward and avoid some of the, er, philosophical disputes.
> Pat
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile

Received on Monday, 12 March 2012 14:17:53 UTC