Re: dataset semantics from Pat Hayes on 2011-12-17 (public-rdf-wg@w3.org from December 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Sat, 17 Dec 2011 09:58:27 -0600
To: Sandro Hawke <sandro@w3.org>
Cc: David Wood <david@3roundstones.com>, RDF WG <public-rdf-wg@w3.org>
Message-Id: <136CBF28-96F7-4F92-B6E8-39E7490D06D9@ihmc.us>

On Dec 16, 2011, at 11:43 PM, Sandro Hawke wrote:

> On Fri, 2011-12-16 at 22:47 -0600, Pat Hayes wrote:
>> On Dec 16, 2011, at 10:21 PM, Sandro Hawke wrote:
>> 
>>> ... maybe I can figure out some TriG
>>> entailment tests....    Like, does this TriG document / dataset:
>>> 
>>>       { <a> <b> <c> }
>>> 
>>> entail this RDF graph:
>>> 
>>>   <a> <b> <c>.
>>> 
>>> I think it should, so we can have metadata in TriG, but other people
>>> have disagreed.   How should we be gathering test cases like this?
>> 
>> 
>> FWIW, 'entailment' has a fairly precise meaning. A entails B when B is true whenever A is, or, more precisely, when for every possible interpretation I, if A is true in I then B is true in I. So it only makes sense to speak of entailment when there is some notion of truth-in-an-interpretation to base it on.
> 
> Yes, I know.

OK :-)
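
As a rough illustration of the definition of entailment quoted above, here is a minimal sketch in Python. It is only a toy: it assumes a finite set of propositional atoms so that every interpretation can be enumerated, which is not the case for RDF interpretations. The names `entails`, `atoms`, `p_and_q` and `p_only` are made up for the example.

```python
from itertools import product


def entails(a, b, atoms):
    """Return True if b is true in every interpretation in which a is true.

    `a` and `b` are functions from an interpretation (a dict mapping each
    atom to True/False) to a truth value. This is a toy, finite stand-in
    for truth-in-an-interpretation.
    """
    for values in product([False, True], repeat=len(atoms)):
        interpretation = dict(zip(atoms, values))
        # A counterexample is an interpretation where a holds but b does not.
        if a(interpretation) and not b(interpretation):
            return False
    return True


atoms = ["p", "q"]
p_and_q = lambda i: i["p"] and i["q"]
p_only = lambda i: i["p"]

print(entails(p_and_q, p_only, atoms))
print(entails(p_only, p_and_q, atoms))
```

The point of the sketch is that `entails` cannot even be written until there is a function that says when something is true in an interpretation, which is exactly what is missing for datasets.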
> 
>> So, what are the truth conditions for datasets? 
> 
> We haven't quite figured that out yet.   I'm proposing one part of that
> is that a dataset being true implies its default graph is true.

Why just the default graph? Aren't queries also directed against the other graphs? Seems to me that the only thing that marks the default graph as being special is that it has no name, which has nothing to do with its truth or falsity.

BTW, what was the rationale for having a nameless graph in a dataset in the first place? Seems to me that the SPARQL design would be improved if all graphs were required to have some kind of name, and the query was obliged to use the name. After all, this is how the rest of the Web works. 

> 
> The other part of the truth conditions has to do with the relationship
> between the things named by the label URIs and the graphs they label.   
> 
> Unfortunately, I think we need to allow for several possible
> relationships there, MAYBE even in the same dataset, which makes things
> rather complicated.

Blech. Why do we NEED to do this? 
> 
> One example of the relationship is what I called graphState in a
> different thread.  In that case, the dataset being true would imply that
> for each <U,G> in the dataset, the state of the resource U is the graph
> G.   (Here, I mean "state" and "resource" in exactly the REST sense.)

And that this graph is true? I.e., is the graph itself asserted when the dataset is asserted?

> Another example is an out of date version of graphState, maybe call it
> graphStateWas.  In this case, the dataset being true would imply that
> for each <U,G> in the dataset, the state of the resource U is, or used
> to be, graph G.

Why would we need this? Surely when something is changed, it is no longer asserting what it did before the change. That is kind of the point of allowing change, seems to me.

> 
> Another example of the relationship is something I gather Cambridge
> Semantics uses, which I'll call subjectOf.   (In one of their deployment
> modes, triples are divided into two types, which I'll call A and B, based
> on which predicate they use.  The dataset is constructed such that for
> each <U, G> in the dataset, every type-A triple in G is of the form
> { <U> ?P ?O }.  The type-B triples are a little more complicated.)  In
> this case, the dataset being true would imply the dataset being
> segmented in this complicated but useful way.   

With all respect to Cambridge Semantics, if they are the only user of this odd convention, then I really don't think we as a WG should even be considering standardizing it. Unless someone can make a case for why it is going to be generally useful.

And in any case, this sounds like a syntactic restriction rather than a semantic condition. Having the dataset be segmented is not going to alter the interpretations of any of the triples (is it?). So the semantics (and hence the entailments) can ignore this.
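
To make the "syntactic restriction" reading concrete, here is a minimal sketch of the segmentation check as Sandro describes it for type-A triples. It assumes a dataset is represented as a dict from label IRI to a set of (subject, predicate, object) tuples, and that type-A triples are identified by a given set of predicates; the function name, the representation and the example IRIs are all hypothetical, and the more complicated type-B triples are ignored.

```python
def is_segmented(named_graphs, type_a_predicates):
    """Check that every type-A triple in graph G labelled U has subject U.

    named_graphs: dict mapping a label IRI to a set of (s, p, o) tuples.
    type_a_predicates: set of predicate IRIs that mark a triple as type A.
    """
    for label, graph in named_graphs.items():
        for subject, predicate, _obj in graph:
            if predicate in type_a_predicates and subject != label:
                return False
    return True


type_a = {"http://example.org/name"}

good = {
    "http://example.org/u1": {
        ("http://example.org/u1", "http://example.org/name", "Alice"),
    },
}
bad = {
    "http://example.org/u1": {
        ("http://example.org/u2", "http://example.org/name", "Bob"),
    },
}

print(is_segmented(good, type_a))
print(is_segmented(bad, type_a))
```

The check only inspects the shape of the triples and never consults an interpretation, which is why it looks like a well-formedness condition rather than a truth condition.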

> 
> It's *rather* tempting to just use triples for this, making graphState,
> graphStateWas, subjectOf, etc, be predicates.   That way the semantics
> of datasets would be much simpler, with the complications bundled into
> the semantics of those particular predicates. 
> 
> I guess I'm suggesting extending the definition of dataset to be a
> default graph and, rather than a set of pairs <U,G>, a set of triples
> <U, R, G>, where R is optional.  If R is omitted, you have the kind of
> dataset we're used to now, where we have no idea what that relation is
> supposed to be (unless the author tells us humans).

So I should interpret <U, R, G> to mean that the relation R holds between the resource U and the graph G, and U is *never* simply a name of the graph, is that right? That is, we never have the graph simply being the resource identified by the IRI?
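
For concreteness, the proposed structure (a default graph plus a set of <U, R, G> triples with R optional) might be sketched as follows. This is only an illustration of the shape of the proposal; the type names, the helper `relations_for` and the example IRIs are invented for this sketch, and nothing here settles what R means.

```python
from typing import FrozenSet, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]


class LabeledGraph(NamedTuple):
    label: str               # U: the label IRI
    relation: Optional[str]  # R: e.g. "graphState"; None when omitted
    graph: FrozenSet[Triple]  # G: the graph itself


class Dataset(NamedTuple):
    default_graph: FrozenSet[Triple]
    labeled_graphs: FrozenSet[LabeledGraph]


def relations_for(dataset, label):
    """Return the set of relations R asserted between `label` and its graphs."""
    return {lg.relation for lg in dataset.labeled_graphs if lg.label == label}


g = frozenset({("http://example.org/a", "http://example.org/b", "http://example.org/c")})
ds = Dataset(
    default_graph=frozenset(),
    labeled_graphs=frozenset({
        LabeledGraph("http://example.org/u1", "graphState", g),
        LabeledGraph("http://example.org/u2", None, g),
    }),
)

print(relations_for(ds, "http://example.org/u1"))
print(relations_for(ds, "http://example.org/u2"))
```

With `relation=None` the structure collapses to the familiar <U,G> pair, where the relation between U and G is left unstated.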

> 
>> Can one assert a dataset (ie claim it to be true)? 
> 
> Yes.
> 
>> How does one do that? 
> 
> The same way you do with RDF.  It kind of depends on your application.
> Maybe you publish it on the web; maybe you send it to some agent; maybe
> you publish it and send the URL somewhere, etc.

And is this in fact done? Do people transmit SPARQL datasets around the Web? What would be a typical transaction involving a dataset? When it is done, what typically happens to the RDF triples in the graphs in the dataset? Do other applications extract them and mash them up with other RDF? Or are they always kept in their dataset 'context'? 

Pat


> 
>   -- Sandro

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Saturday, 17 December 2011 15:59:13 UTC