Re: Bound and Unbound Datasets

On Jun 3, 2013, at 10:34 AM, Sandro Hawke wrote:

> It looks to me like we have two very different camps concerning datasets.  ISSUE-131 has brought this to light again, but the camps long predate that issue.  The division is between the people who have been using datasets with application-dependent semantics for a long time and the people who want to build things which require standard interoperable semantics for datasets.  I'm in the latter camp, and was arguing for it for a long time, but I decided some months ago I could live without standard semantics via a very convoluted mechanism.  I agreed to document that mechanism, but as I have contemplated doing so, I've been dragging my feet because it's pretty weird and I think the group won't like it.  (Talking off-list to Pat about it yesterday, I think it's safe to say he hated it.)

You betcha.

> So I have an alternative proposal.  Let's have two kinds of datasets:
> 
> * "Unbound" datasets are what's been in SPARQL and rdf-concepts so far.   According to the standard they are just structure, with no semantics.  In practice, their semantics are determined by the application in which they are used.
> 
> * "Bound" datasets have the following semantics:
>      (1) for the dataset to be true, the default graph must be true;

But with a slight tweak, see below. 

>      (2) graph names denote the graphs they are paired with.
> 
> I suggest we indicate a dataset is bound by putting the magic triple { <> a rdf:BoundDataset } in its default graph.  (This triple would be treated specially in the RDF semantics for any system which implements/recognizes bound datasets; to other systems (e.g., SPARQL) it's just another triple.)

We could treat this in the following "context" way: IRIs (and bnodes?) which occur as graph labels in the dataset are interpreted **in the default graph** as denoting the graphs they label. That is, *just* in the default graph. That then allows IRIs which 'globally' denote something else to still be used as graph labels, without breaking the semantics. It would permit a (very limited and special-purpose) kind of punning in default graphs for the purposes of graph identification. I think this would be the most useful way to handle this.

Technically, the semantics of a bound dataset is: define the binding map B to be the function from IRIs (and bnodes?) used as graph labels to the graphs they label. Then the dataset is true in I just when the default graph D is true in I/B, i.e., the interpretation which is just like I except that it maps IRIs (and bnodes?) in the domain of B to their B value.
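
To make the I/B construction concrete, here is a rough Python sketch. It is only a toy model, not anything from the RDF Semantics documents: an interpretation is assumed to be a plain dict from IRIs to denotations, property extensions are a dict of sets of pairs, and all the names (binding_map, rebind, graph_true, bound_dataset_true, the ex: IRIs) are made up for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def binding_map(named: Dict[str, Graph]) -> Dict[str, Graph]:
    """B: each graph label mapped to the graph it labels."""
    return dict(named)


def rebind(interp: Dict[str, object], b: Dict[str, Graph]) -> Dict[str, object]:
    """I/B: just like I, except on the domain of B, where it takes B's value."""
    out = dict(interp)
    out.update(b)
    return out


def graph_true(graph: Graph, interp: Dict[str, object],
               ext: Dict[object, Set[tuple]]) -> bool:
    """Toy truth condition: (s p o) holds when (I(s), I(o)) is in ext[I(p)]."""
    return all((interp[s], interp[o]) in ext.get(interp[p], set())
               for s, p, o in graph)


def bound_dataset_true(default: Graph, named: Dict[str, Graph],
                       interp: Dict[str, object],
                       ext: Dict[object, Set[tuple]]) -> bool:
    """A bound dataset is true in I just when its default graph is true in I/B."""
    return graph_true(default, rebind(interp, binding_map(named)), ext)


# Hypothetical example: ex:g 'globally' denotes something other than a graph.
g1: Graph = frozenset({("ex:a", "ex:p", "ex:b")})
named = {"ex:g": g1}
default: Graph = frozenset({("ex:g", "ex:assertedBy", "ex:alice")})
interp = {"ex:g": "some-other-thing",
          "ex:assertedBy": "assertedBy",
          "ex:alice": "Alice"}
ext = {"assertedBy": {(g1, "Alice")}}

true_in_I = graph_true(default, interp, ext)                    # uses I as given
true_as_bound = bound_dataset_true(default, named, interp, ext)  # uses I/B
```

In this toy case the default graph is false under I itself but true under I/B, which is the punning described above: ex:g keeps its 'global' denotation everywhere except inside the default graph.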

>  If a dataset does not have this flag, it's unbound.   Of course, being unbound, it has application-specific semantics and so an application may choose to treat it as bound. 
> 
> I think this would solve a lot of problems, and not raise too many.

I agree.  It does mean, cf. Gregg's comment, that merging datasets requires paying attention to this flag and treating it seriously, as its presence/absence can change IRI denotations. But I don't see any way to have complete freedom to merge datasets in any case if they might be using application-dependent semantics.
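
A small Python sketch of why the flag matters when merging. Everything here is an assumption made for illustration: rdf:BoundDataset is only the class proposed in this thread, the string "<>" stands in for the dataset's own IRI, and naive_merge (a plain union) is a hypothetical merge policy, not a recommended one.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical IRI: rdf:BoundDataset is a proposal, not part of any RDF spec.
BOUND_DATASET = "http://www.w3.org/1999/02/22-rdf-syntax-ns#BoundDataset"
# Assumption: "<>" stands in for the dataset's own IRI.
FLAG: Triple = ("<>", RDF_TYPE, BOUND_DATASET)


@dataclass(frozen=True)
class Dataset:
    default: Graph
    named: Dict[str, Graph] = field(default_factory=dict)


def is_bound(ds: Dataset) -> bool:
    """A dataset counts as bound when the flag triple is in its default graph."""
    return FLAG in ds.default


def denotation(ds: Dataset, label: str, interp: Dict[str, object]) -> object:
    """What a label denotes inside the default graph of ds."""
    if is_bound(ds) and label in ds.named:
        return ds.named[label]   # bound: the label denotes its graph
    return interp.get(label)     # unbound: whatever I says


def naive_merge(a: Dataset, b: Dataset) -> Dataset:
    """Plain union that ignores the flag (shown to illustrate the hazard)."""
    named = dict(a.named)
    for label, graph in b.named.items():
        named[label] = named.get(label, frozenset()) | graph
    return Dataset(a.default | b.default, named)


g: Graph = frozenset({("ex:a", "ex:p", "ex:b")})
unbound = Dataset(frozenset({("ex:g", "ex:author", "ex:alice")}), {"ex:g": g})
bound = Dataset(frozenset({FLAG}))
interp = {"ex:g": "a person called G"}

merged = naive_merge(unbound, bound)
before = denotation(unbound, "ex:g", interp)
after = denotation(merged, "ex:g", interp)
```

Here the union silently carries the flag into the merged default graph, so ex:g stops denoting whatever I assigned it and starts denoting its graph. A merge that takes the flag seriously would have to check is_bound on both inputs before combining them.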

>     I expect many of the folks who wanted us to standardize named graphs, fix reification, etc, when this group was chartered, would much prefer having this option to having only the half-solution that's in our specs now.

I wholeheartedly agree. 

Pat

> 
>      -- Sandro
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Monday, 3 June 2013 20:56:21 UTC