Another way to do it (was: Re: Three alternative approaches for fixing the blank node scope problem.) from Pat Hayes on 2013-03-17 (public-rdf-wg@w3.org from March 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 17 Mar 2013 09:28:20 -0500
To: Ivan Herman <ivan@w3.org>
Message-Id: <A820E6AF-4D48-4E2F-BDA6-A83085A2886D@ihmc.us>

On Mar 16, 2013, at 7:35 AM, Ivan Herman wrote:

> I am not claiming I understand all the details, I will have to think about it. However, as a first reaction, I do not like #2. As you say for option (3), a Dataset (and a TriG) file defines a, ehem, single scope for bnodes; we have indeed agreed that graphs in a dataset may share bnodes (and I remember when this first came up I could not really match this with the intuition I had based on the 2004 Semantics). But if we accepted your option (2), this would mean something like having a containing graph containing all graphs in the dataset; I believe that would become very very confusing. So I would prefer not to go down that route.

Hmm. But we DO have the graph consisting of all the triples in the dataset. I mean, it is the set of all the triples in the graphs in the dataset. And that set exists, right? 

And we still need to refer to that grap... ahem, that set of triples, in all three cases, in order to define the notion of a complete graph, which we need in order to state the conditions under which the merge lemma holds. Although I guess we could say all this by referring to the set of triples without actually calling it a graph. Sigh. 

Hmm. Later thought gives another way to approach this whole issue, which avoids the scope talk altogether and doesn't mention scoping graphs or containing graphs. So this is version 4.

4. Say that an RDF graph is *complete* if, for every blank node, it either contains all triples that contain that blank node, or none of them.  

Every graph is a subgraph of a complete graph, every graph is graph-equivalent to a complete graph (replace the blank nodes with new ones), and the union of a set of complete graphs is equivalent to the set. Semantics of bnodes is unchanged, but there is a remark that it only gives sensible results for complete graphs. (Or, we simply restrict the semantics of bnodes to complete graphs, so an incomplete graph with blank nodes doesn't have a meaning (but a graph-equivalent version of it might, of course.) I like this second approach, but I don't hold out much hope of it getting past Antoine.)
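
A minimal Python sketch of this definition may make it easier to poke at. Everything in it is an assumption made for illustration: triples are plain tuples, BNode is a hypothetical class whose identity is object identity, and the `universe` argument is a finite stand-in for "all triples that contain that blank node", which no real system can enumerate.

```python
class BNode:
    """Hypothetical blank node: identity is object identity, not the label."""

    def __init__(self, label=""):
        self.label = label  # cosmetic only, never used for comparison

    def __repr__(self):
        return f"_:{self.label}"


def blank_nodes(triples):
    """All blank nodes occurring in a set of (s, p, o) tuples."""
    return {term for triple in triples for term in triple
            if isinstance(term, BNode)}


def is_complete(graph, universe):
    """True if graph holds every triple of universe that mentions
    any blank node occurring in graph (the 'all or none' condition)."""
    for b in blank_nodes(graph):
        if any(b in triple and triple not in graph for triple in universe):
            return False
    return True


def fresh_copy(graph):
    """Graph-equivalent copy: each blank node is replaced 1:1 by a new one."""
    renamed = {}

    def rename(term):
        if isinstance(term, BNode):
            if term not in renamed:
                renamed[term] = BNode(term.label)
            return renamed[term]
        return term

    return {tuple(rename(term) for term in triple) for triple in graph}


x = BNode("x")
mine = {("ex:a", "ex:p", x)}
theirs = {(x, "ex:q", "ex:b")}      # somebody else's triple using my blank node
universe = mine | theirs
```

In this toy universe `mine` is not complete, `mine | theirs` is, and `fresh_copy(mine)` is a graph-equivalent graph that is complete because nobody else can have used its new blank nodes.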

The key here is that phrase "all triples". It now has to be interpreted literally: it means all the triples *anywhere in the universe*, not just all the triples in some containing graph. So if someone in Australia (or on a planet circling Alpha Centauri) uses a blank node from one of my graphs, then my graph isn't complete, even though I might not know about it. Of course this is nonsensical, but we can now make the point about scopes, using a different language. What we can say about surface syntaxes is that they must specify when graphs described by surface documents or structures are complete. And we specify that RDF/XML documents and N-Triples documents all describe complete graphs. 

I'm no longer sure what to say about datasets.

> At first glance, my preference is (3) but I should go back and (try to) understand Antoine's objections...

When you do, please explain them to me :-)

Pat

> 
> Thanks
> 
> Ivan
> 
> 
> On Mar 15, 2013, at 23:39 , Pat Hayes <phayes@ihmc.us> wrote:
> 
>> As this debate has gotten very intense and active, and it is just barely possible that some folk might not be following every little detail, and there have now been several proposals, let me summarize the proposals here. These are *alternatives*, note. We don't need to do more than one of them. They are all formally equivalent.  They all involve defining at least one new notion, indicated **thus** which probably should be in Concepts. Please keep in mind the distinction between blank nodes and bnodeIDs when reading this. 
>> 
>> 1. bscopes. (Idea developed in my ISWC 2009 talk, the simplest and most 'abstract' of the three.)
>> 
>> We introduce the idea of a **bscope**, and a relation **in** between blank nodes and bscopes. Every blank node is in exactly one bscope. (Or, we define a **function** sc( ) from blank nodes to bscopes.) A subgraph is *complete* when, if it contains a blank node, it contains all the triples which contain that blank node. 
>> 
>> An RDF graph is a set of triples **such that every blank node in the set is in a single bscope**. Every set of RDF triples is graph-equivalent to an RDF graph as defined here, so this is no limitation upon how RDF graphs can be formed. 
>> 
>> Surface RDF syntaxes are required to define how their bnodeIDs map into bscopes, i.e. exactly when two occurrences of a bnodeID identify a single blank node, and exactly when two identified blank nodes are in the same bscope. 
>> 
>> The merge of a set of graphs is a graph comprising the union of all the triples in a set of graph-equivalent graphs, with blank nodes from a new bscope. The merge lemma holds for sets of complete graphs. 
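
The bscope proposal (1) above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: BScope, BNode, sc, is_rdf_graph and merge are hypothetical names, triples are plain tuples, and merge re-maps blank nodes with a single 1:1 mapping across all the input graphs, which is one reading of "graph-equivalent graphs, with blank nodes from a new bscope".

```python
class BScope:
    """Hypothetical bscope: just an opaque token."""


class BNode:
    """Hypothetical blank node that is in exactly one bscope."""

    def __init__(self, scope):
        self.scope = scope


def sc(bnode):
    """The function sc( ) from blank nodes to bscopes."""
    return bnode.scope


def is_rdf_graph(triples):
    """Proposal (1): every blank node in the set is in a single bscope."""
    scopes = {sc(term) for triple in triples for term in triple
              if isinstance(term, BNode)}
    return len(scopes) <= 1


def merge(graphs):
    """Union of the triples, with blank nodes re-mapped 1:1 into a new bscope."""
    new_scope = BScope()
    renamed = {}

    def rename(term):
        if isinstance(term, BNode):
            if term not in renamed:
                renamed[term] = BNode(new_scope)
            return renamed[term]
        return term

    return {tuple(rename(term) for term in triple)
            for graph in graphs for triple in graph}


s1, s2 = BScope(), BScope()
a, b = BNode(s1), BNode(s2)
g1 = {("ex:a", "ex:p", a)}
g2 = {(b, "ex:q", "ex:c")}
merged = merge([g1, g2])
```

Here the plain union `g1 | g2` crosses two bscopes and so is not an RDF graph in the sense of (1), while `merged` is one, living entirely in the new bscope.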
>> 
>> -----------
>> 
>> 2. containing graphs. (Idea in recent email.) This does not mention scopes explicitly. 
>> 
>> An RDF graph is a set of triples.  Some RDF graphs are designated to be **containing graphs**. Two different containing graphs cannot contain (triples which contain) the same blank node. (But a subgraph of a containing graph may share a blank node with its containing graph. So if two graphs share a blank node, they must be both subgraphs of the same containing graph.)  Intuitively, a containing graph is all the triples which contain a given collection of blank nodes. *Complete* graphs are defined as above.
>> 
>> **Every RDF graph is a subgraph of a unique containing graph** (which might be the graph itself, of course.) Any set of triples, even ones which cross containing-graph boundaries, is graph-equivalent to a containing graph (just provide brand new blank nodes, not used anywhere else) so, again, this is not a limitation on how RDF graphs can be formed.
>> 
>> Surface RDF syntaxes are required to define how they specify containing graphs. For example, with our current decisions, the containing graph of a dataset is the union of the graphs in the dataset. 
>> 
>> The merge of a set of graphs is a containing graph comprising the union of all the triples in a set of graph-equivalent graphs, using new blank nodes. (Basically the union, but we allow blank nodes to be re-mapped in a 1:1 fashion in order to fit into a new containing graph.) The merge lemma holds for sets of complete graphs. 
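
The containing-graph conditions in (2) can be checked mechanically. The following Python sketch assumes triples are tuples and uses hypothetical helper names throughout (BNode, are_valid_containing_graphs, containing_graph_of, dataset_containing_graph); the dataset helper follows the "union of the graphs in the dataset" example given above.

```python
from itertools import combinations


class BNode:
    """Hypothetical blank node: identity is object identity."""


def blank_nodes(triples):
    """All blank nodes occurring in a set of (s, p, o) tuples."""
    return {term for triple in triples for term in triple
            if isinstance(term, BNode)}


def are_valid_containing_graphs(containers):
    """Two different containing graphs cannot contain the same blank node."""
    return all(blank_nodes(c1).isdisjoint(blank_nodes(c2))
               for c1, c2 in combinations(containers, 2))


def containing_graph_of(graph, containers):
    """The unique designated containing graph that graph is a subgraph of,
    or None if there is not exactly one."""
    matches = [c for c in containers if graph <= c]
    return matches[0] if len(matches) == 1 else None


def dataset_containing_graph(default_graph, named_graphs):
    """Containing graph of a dataset: the union of all its graphs."""
    result = set(default_graph)
    for graph in named_graphs.values():
        result |= graph
    return result


x, y = BNode(), BNode()
default = {("ex:a", "ex:p", x)}
named = {"ex:g1": {(x, "ex:q", "ex:b")}}   # shares x with the default graph
dataset_graph = dataset_containing_graph(default, named)
other = {(y, "ex:p", "ex:c")}              # a second containing graph
containers = [dataset_graph, other]
```

A set of triples that crosses the boundary, such as `default | other`, has no containing graph among the designated ones; as the text says, it would first need brand new blank nodes.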
>> 
>> ------------
>> 
>> 3. bnodeID syntactic scopes.  (This is in the current draft of Semantics, text reproduced here.  Scoped graphs as described here are the same as containing graphs as described in (2), and bscopes in (1) are what all the blank nodes identified by bnodeIDs in a single bnodeID scope are in. They all say the same thing, in different ways.)
>> 
>> Blank nodes may be identified in a surface (document) syntax for RDF using blank node identifiers. Each surface syntax must specify an unambiguous notion of the **scope** of such identifiers, such that any graph defined by this syntax will be inside a single scope. Two graphs not in the same scope do not share any blank nodes. Each combination of a blank node identifier and a surrounding scope is understood to define a unique blank node, local to the graphs described by the surface syntax. The same blank node identifier used in different scopes identifies a different blank node in each scope in which it occurs. 
>> 
>> Scope boundaries are defined by the surface syntax used to encode RDF. For example, in RDF/XML [RDF-SYNTAX-GRAMMAR] and NTriples [RDF-TESTCASES], the scope is defined by the document. In TriG, a syntax for RDF datastores, the scope is the entire datastore.
>> 
>> The set of all triples in a given scope is called a **scoped graph**. **Every RDF graph described by a surface syntax for RDF must be a subgraph of a scoped graph**.
>> 
>> An RDF graph is *complete* when, for every blank node in the graph, the graph contains all triples in the scoped graph which contain that blank node. 
>> 
>> Merging is taking the union. 
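
To make the bnodeID scoping in (3) concrete, here is a small Python sketch. It is not taken from any draft: Scope, BNode, add and is_complete are hypothetical names, and treating strings that start with "_:" as blank node identifiers is an assumption of the sketch standing in for a real surface syntax.

```python
class BNode:
    """Hypothetical blank node: identity is object identity."""


def blank_nodes(triples):
    """All blank nodes occurring in a set of (s, p, o) tuples."""
    return {term for triple in triples for term in triple
            if isinstance(term, BNode)}


class Scope:
    """One bnodeID scope, e.g. one document or one dataset."""

    def __init__(self):
        self._by_identifier = {}   # bnodeID -> blank node, local to this scope
        self.triples = set()       # the scoped graph

    def _term(self, text):
        # Each (identifier, scope) combination defines a unique blank node.
        if text.startswith("_:"):
            return self._by_identifier.setdefault(text, BNode())
        return text

    def add(self, s, p, o):
        triple = (self._term(s), self._term(p), self._term(o))
        self.triples.add(triple)
        return triple


def is_complete(graph, scoped_graph):
    """graph contains all triples of scoped_graph that mention its blank nodes."""
    return not any(b in triple and triple not in graph
                   for b in blank_nodes(graph) for triple in scoped_graph)


doc1, doc2 = Scope(), Scope()
t1 = doc1.add("ex:a", "ex:p", "_:x")
t2 = doc1.add("_:x", "ex:q", "ex:b")
t3 = doc2.add("_:x", "ex:q", "ex:c")   # same identifier, different scope
```

The identifier `_:x` names one blank node within `doc1` and a different one in `doc2`, so taking the plain union of the two scoped graphs never conflates them.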
>> 
>> ------------
>> 
>> I hope this helps the WG in its deliberations. I suggest that the second version might be the least painful one for people to swallow, as it introduces the least extra formal machinery, and it allows neat phrasings such as "a subgraph considered as a containing graph" to indicate how graph boundaries are being treated in examples. 
>> 
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.              (850)202 4416   office
>> Pensacola                             (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>> 
> 
> 
> ----
> Ivan Herman, W3C Semantic Web Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> FOAF: http://www.ivan-herman.net/foaf.rdf
> 

Received on Sunday, 17 March 2013 14:33:05 UTC