dataset semantics being connected to the state of the web from Sandro Hawke on 2012-06-08 (public-rdf-wg@w3.org from June 2012)

From: Sandro Hawke <sandro@w3.org>
Date: Fri, 08 Jun 2012 08:28:22 -0400
To: Richard Cyganiak <richard@cyganiak.de>
Cc: public-rdf-wg@w3.org
Message-ID: <1339158502.18605.334.camel@waldron>
On Fri, 2012-06-08 at 10:51 +0200, Richard Cyganiak wrote:
> Hi Sandro,
> 
> >> I've heard you say two mutually incompatible things:
> >> 
> >> 1. A Turtle file published at <i> containing graph G is an RDF dataset with only named graph <i,G>
> >> 
> >> 2. A Turtle file published at <i> containing graph G is an RDF dataset with only a default graph
> >> 
> >> Which one is it? It can't be both.
> > 
> > If I said (1), it was a mistake.
> > 
> > I would rephrase (1) as a conditional:
> > 
> >   A.  If it is true that a turtle file serializing G is what is
> > published at <i>,
> >   B.  Then the dataset consisting of the named graph <i,G> is true.
> 
> -1. 
> 
> We can postulate the existence of a *specific* dataset, let's call it
> the “web dataset”, and can say that under the condition above the
> g-pair <i,G> is true in the web dataset. 

Yes.  I'm not sure that's the most useful framing, but it's quite
reasonable.
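
To make that reading concrete, here is a minimal sketch in Python.  It
is only an illustration under stated assumptions: graphs are modelled as
frozensets of string triples, "published at <i>" is modelled by a lookup
function passed in by the caller, and the names g_pair_is_true and
toy_web are hypothetical, not taken from any spec or library.  Graph
equality here is plain set equality, so blank-node isomorphism is
ignored.

```python
from typing import Callable, FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def g_pair_is_true(
    iri: str,
    graph: Graph,
    published_at: Callable[[str], Optional[Graph]],
) -> bool:
    """Return True iff the graph published at `iri` is exactly `graph`.

    `published_at` stands in for dereferencing; it returns None when
    nothing is published at the IRI (an assumption of this sketch).
    """
    return published_at(iri) == graph


# A toy stand-in for the state of the web: one IRI, one graph.
G: Graph = frozenset({("ex:a", "ex:knows", "ex:b")})
toy_web = {"http://example.org/i": G}

print(g_pair_is_true("http://example.org/i", G, toy_web.get))
print(g_pair_is_true("http://example.org/j", G, toy_web.get))
```

The point of passing the lookup in as a parameter is that the truth of
the g-pair depends on the state of the (toy) web, not on the g-pair
alone.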

> (Formally, this could be done
> as a semantic extension, let's call it W-entailment (for web). So if A
> is true then *every* dataset W-entails the g-pair <i,G>.)

The logicians can correct me, but that seems to me like a non-standard
way to use entailment.  Whether one statement entails another is
something that can be determined purely by looking at the two statements
and understanding the logic of the language they are written in.
Entailment isn't about what statements happen to be true of the domain
of discourse.
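
As a rough illustration of what I mean, here is a sketch in Python of
simple entailment restricted to ground graphs (no blank nodes), where
one graph entails another exactly when the second is a subset of the
first.  The restriction to ground graphs and the name simply_entails
are assumptions of this sketch.  Note that the function takes only the
two graphs as input; nothing about the state of the web is consulted.

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def simply_entails(premise: Graph, conclusion: Graph) -> bool:
    """Ground simple entailment: every triple of the conclusion
    must already appear in the premise."""
    return conclusion <= premise


g1: Graph = frozenset({
    ("ex:a", "ex:knows", "ex:b"),
    ("ex:b", "ex:knows", "ex:c"),
})
g2: Graph = frozenset({("ex:a", "ex:knows", "ex:b")})

print(simply_entails(g1, g2))
print(simply_entails(g2, g1))
```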

> But I will formally object to anything that defines truth *in general*
> in terms of dereferencing. This is not a negotiable position.

Consensus is not about negotiation, it's about shared understanding.

I don't understand your position.   

In particular, I think I can understand two possible positions:

1.  You might be saying that logically it makes no sense to have the
truth of a dataset have any connection whatsoever to the state of the
Web.   If you said that, I'm pretty sure you'd be wrong.  I'm confident
we could define things this way such that people could generally
understand it and build working interoperable systems.

2.  You might be saying that if we define dataset semantics like this,
we'll break valuable deployed systems or make it impossible to build
important new systems.   I don't think this is the case, but there could
easily be something I don't know about in this space, so please just
give me an example. 

Or maybe you're thinking of something else entirely.

The rest of the email seems separate enough that I'll answer it in a
different thread, probably later.

     -- Sandro

> > Statement (2) is close to correct, but I'd change it slightly; it's not
> > that it "is" a dataset, but that it can reasonably be read as a dataset.
> > It's a type-conversion thing.  A triple can be seen as a (trivial)
> > graph; a character can be seen as a (trivial) string; a graph can be
> > seen as a (trivial) dataset.    
> 
> This is sloppy thinking. They are not the same, and by pretending that they are, you are just confusing matters.
> 
> > In practice, I see this manifesting in the kinds of APIs one uses for
> > loading and manipulating datasets.  Can one give the API a graph when
> > it is expecting a dataset and have it silently promote the graph to
> > being a dataset with that graph as its default graph?
> 
> The much more interesting case is the opposite situation: What happens when you give a dataset to an API that expects a graph? That's after all the status quo; anyone who goes to the web to load a Turtle file expects a graph, and that's how it's been implemented for the last eight years. If we now define that a Turtle parser must also be able to handle datasets, we've deeply broken every existing implementation.
> 
> > Alternatively, we could define a class of things that is the union of
> > the class of graphs and the class of datasets -- that would be more
> > crisp and might be as convenient.  But I expect people will be fine
> > just using datasets as those things.
> 
> I don't see the point of this. If we define truth for datasets, and consider the default graph as asserted, then an RDF graph is semantically equivalent to an RDF dataset with just a default graph and no empty graphs. That's all we need. But they are not the same.
> 
> Best,
> Richard
> 
> 
> 
> > 
> > To be clear: this is speculative.   My point is not to say we should
> > standardize this, but I don't think we should rule it out.
> > 
> >   -- Sandro
> > 
> >> Best,
> >> Richard
> > 
Received on Friday, 8 June 2012 12:28:37 UTC