Re: dataset semantics from Pat Hayes on 2011-12-19 (public-rdf-wg@w3.org from December 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Mon, 19 Dec 2011 04:34:24 -0600
To: Sandro Hawke <sandro@w3.org>
Cc: William Waites <wwaites@tardis.ed.ac.uk>, david@3roundstones.com, public-rdf-wg@w3.org
Message-Id: <4E02931E-E8FC-461E-8A71-E9142BC8769C@ihmc.us>

On Dec 18, 2011, at 10:58 PM, Sandro Hawke wrote:

> On Sat, 2011-12-17 at 10:02 -0600, Pat Hayes wrote:
>> On Dec 17, 2011, at 7:09 AM, Sandro Hawke wrote:
>> 
>>> On Sat, 2011-12-17 at 10:29 +0000, William Waites wrote:
>>>> On Sat, 17 Dec 2011 00:43:38 -0500, Sandro Hawke <sandro@w3.org> said:
>>>> 
>>>>   sandro> We haven't quite figured that out yet.  I'm proposing one
>>>>   sandro> part of that is that a dataset being true implies its
>>>>   sandro> default graph is true.
>> 
>>> In terms of an entailment test:
>>> 
>>>   <a>  { <b> <c> <d> }
>>> 
>>> does NOT entail
>>> 
>>>  { <b> <c> <d> }
>>> 
>> 
>> 
>> Really?? Is this generally accepted, or is it your own conclusion?
> 
> This is just my strawman proposal, to try to get us moving along.
> 
>> Because this has the (to me surprising) consequence that publishing a dataset does not assert ANY of the named graphs in it. Which leaves me wondering what the point of having datasets can possibly be in the first place. Does the Semantic Web consist mostly of unasserted fiction? 
> 
> Well, my earlier entailment test was about how the default graph *is*
> asserted.  But, right, I'm suggesting that the others are not
> automatically asserted.
> 
> Your surprise here makes me think there are two *very* different kinds
> of use cases for transmitted datasets.
> 
> 1.  Someone wants to publish some triples, but they want to annotate
> various subgraphs among the triples.  I think that's the case you're
> thinking of.  So all the triples are asserted, but also, they can say
> things about subgraphs.

Well, you can say things about the subgraphs provided that the annotating names really are names, i.e. if they denote the subgraph. If they are just 'associated' with the subgraph but denote something else, then when you use them in some RDF, they mean that other thing, not the graph.

But OK, I take your point. If this really is a common situation, then I think we should seriously consider extending the basic RDF model to quads rather than triples. A system consisting of a single RDF graph, with bnodes, but with subgraphs identified by URIs, is a genuine extension of the current RDF model. It is not a dataset as currently defined. 
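
To make the quad idea concrete, here is a minimal sketch in Python, not a proposal for any syntax or API. It assumes quads are plain 4-tuples whose fourth element labels a subgraph (or is None); the ex: names, the "_:x" blank node and both helper functions are invented for illustration. All triples are asserted as one graph, and the labels merely pick out subgraphs of it.

```python
from collections import defaultdict

# Hypothetical quads: (subject, predicate, object, subgraph label).
# "_:x" stands for a blank node shared by two labelled subgraphs.
quads = [
    ("_:x", "ex:name", '"Alice"', "ex:g1"),
    ("_:x", "ex:knows", "ex:bob", "ex:g2"),
    ("ex:g1", "ex:source", "ex:survey2011", None),  # annotation, unlabelled
]


def asserted_graph(quads):
    """Everything is asserted: drop the labels and take the union of triples."""
    return {(s, p, o) for s, p, o, _ in quads}


def subgraphs(quads):
    """Group triples by label; unlabelled triples belong to no subgraph."""
    groups = defaultdict(set)
    for s, p, o, label in quads:
        if label is not None:
            groups[label].add((s, p, o))
    return dict(groups)
```

Here subgraphs(quads) has the keys "ex:g1" and "ex:g2", while asserted_graph(quads) contains all three triples, including the one that annotates ex:g1.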

> 
> 2.  Someone wants to say things about graphs/subgraphs that other people
> have asserted, without necessarily buying into them.  That's the case I
> was thinking of, where people can talk about, say, statements made in
> the past, or statements of unknown veracity.   It wouldn't work for them
> to have to assert those graphs just to say they are no longer true.

Right. It seems to me that this is exactly what named graphs were intended for. Once you give a graph a name, you can simply refer to it, using that name, and say anything you like about it, without thereby also asserting it. You can say it is false if you like. But this does not require anything like a dataset: it only requires the ability to associate a name with a graph. And since the Web has a pretty robust technique for attaching names to things, called URIs, why don't we just use that? Especially as the entire idea has been worked out in full detail. 
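
As a rough illustration of referring to a graph without asserting it, the following Python sketch assumes a simple name-to-graph mapping. The registry, the ex: names and the is_asserted helper are all hypothetical; nothing here is taken from an existing specification.

```python
# Hypothetical association of a name with a graph (a frozenset of triples).
named = {
    "ex:oldClaims": frozenset({("ex:pluto", "rdf:type", "ex:Planet")}),
}

# What the publisher actually asserts: statements *about* the named graph,
# made by using its name. The graph's own triples are not included.
asserted = {
    ("ex:oldClaims", "ex:status", "ex:NoLongerTrue"),
}


def is_asserted(triple):
    """True only for triples the publisher has asserted directly."""
    return triple in asserted
```

With this setup, the triple inside ex:oldClaims is not asserted, while the statement that ex:oldClaims is no longer true is.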

> 
> So, how can we support both kinds?

You are presuming that there really are two kinds of dataset. But I think neither of these is really a dataset. The second is just named graphs; the first is something more intricate than a dataset, one that takes full advantage of the extended expressivity of the quad model.

> 
> - we could have some kind of flag outside the document, like a different
> media-type for each semantics
> 
> - we could have some kind of global flag inside the document, like a
> triple like { <> a rdf:Type_A_Dataset }
> 
> - we could have some kind of flag on each labeled subgraph, saying
> whether it is also asserted; maybe a keyword, like INCLUDED or
> EXCLUDED.
> 
> - in the default graph, we could explicitly assert some of the subgraphs
> (which would require being somehow able to refer to them).
> 
> At this point, the second of these strikes me as the least painful, with
> possibly the third overriding it, on a per-subgraph basis.

All of these sound to me like weird hacks to make a badly designed tool serve a purpose for which a properly designed tool already exists. 

> 
> BTW, I keep saying "subgraph", because several people have said they
> need bnodes to be shared between these things, and as I understand the
> RDF semantics, that means they shouldn't be considered graphs in their
> own right, just subgraphs of some larger (not-necessarily-asserted)
> graph.

Yes, if they share bnodes then they are part of a single graph. They are graphs in their own right, of course, but the larger graph must exist. 
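
The point about shared bnodes can be sketched in Python as follows. This is a simplification under stated assumptions: graphs are sets of triples of strings, a blank node is any term starting with "_:", and equal labels are taken to be the same node, which presumes they come from a common scope. The function names are made up for this example.

```python
def bnodes(graph):
    """Blank nodes in subject or object position (terms starting with "_:")."""
    return {term for s, _, o in graph for term in (s, o) if term.startswith("_:")}


def enclosing_graph(g1, g2):
    """Return the larger graph both belong to if they share a bnode, else None."""
    if bnodes(g1) & bnodes(g2):
        return g1 | g2
    return None


g1 = {("_:x", "ex:name", '"Alice"')}
g2 = {("_:x", "ex:knows", "ex:bob")}
g3 = {("ex:bob", "ex:age", '"42"')}
```

Here g1 and g2 share "_:x", so enclosing_graph(g1, g2) is their union; g1 and g3 share no blank node, so the function returns None for them.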

Pat


> 
>    -- Sandro
> 
> 
> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 19 December 2011 10:35:07 UTC