Three alternative approaches for fixing the blank node scope problem. from Pat Hayes on 2013-03-15 (public-rdf-wg@w3.org from March 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 15 Mar 2013 17:39:30 -0500
To: RDF WG <public-rdf-wg@w3.org>
Message-Id: <DD38E70E-7EDC-4775-9D91-D7EC1007A1E3@ihmc.us>

As this debate has gotten very intense and active, and it is just barely possible that some folk might not be following every little detail, and there have now been several proposals, let me summarize the proposals here. Note that these are *alternatives*: we don't need to do more than one of them. They are all formally equivalent. They all involve defining at least one new notion, indicated **thus**, which probably should be in Concepts. Please keep in mind the distinction between blank nodes and bnodeIDs when reading this.

1. bscopes. (Idea developed in my ISWC 2009 talk, the simplest and most 'abstract' of the three.)

We introduce the idea of a **bscope**, and a relation **in** between blank nodes and bscopes. Every blank node is in exactly one bscope. (Or, we define a **function** sc( ) from blank nodes to bscopes.) A subgraph is *complete* when, if it contains a blank node, it contains all the triples which contain that blank node.

An RDF graph is a set of triples **such that every blank node in the set is in a single bscope**. Every set of RDF triples is graph-equivalent to an RDF graph as defined here, so this is no limitation upon how RDF graphs can be formed.

Surface RDF syntaxes are required to define how their bnodeIDs map into bscopes, i.e., exactly when two occurrences of a bnodeID identify a single blank node, and exactly when two identified blank nodes are in the same bscope.

The merge of a set of graphs is a graph comprising the union of all the triples in a set of graph-equivalent graphs, with blank nodes from a new bscope. The merge lemma holds for sets of complete graphs.
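To make the bscope idea concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the proposal: the names `BScope`, `BNode`, `is_rdf_graph` and `merge` are hypothetical, IRIs are plain strings, and the merge is read as a union with a single 1:1 renaming of blank nodes into a fresh bscope.

```python
from dataclasses import dataclass, field
from itertools import count

_scope_ids = count()  # hypothetical source of distinct bscope identities


@dataclass(frozen=True)
class BScope:
    """A bscope; two bscopes are equal only if they have the same uid."""
    uid: int = field(default_factory=lambda: next(_scope_ids))


@dataclass(frozen=True)
class BNode:
    """A blank node, which is in exactly one bscope (the sc() function)."""
    label: str
    scope: BScope


def blank_nodes(triples):
    """Collect every blank node occurring in a set of triples."""
    return {term for triple in triples for term in triple
            if isinstance(term, BNode)}


def is_rdf_graph(triples):
    """True when all blank nodes in the set are in a single bscope."""
    return len({b.scope for b in blank_nodes(triples)}) <= 1


def merge(graphs):
    """Union of the triples, with blank nodes renamed 1:1 into a new bscope."""
    new_scope = BScope()
    fresh = count()
    mapping = {}  # old blank node -> new blank node, shared across inputs
    merged = set()
    for graph in graphs:
        for triple in graph:
            renamed = []
            for term in triple:
                if isinstance(term, BNode):
                    if term not in mapping:
                        mapping[term] = BNode(f"b{next(fresh)}", new_scope)
                    term = mapping[term]
                renamed.append(term)
            merged.add(tuple(renamed))
    return frozenset(merged)
```

Under these assumptions, two graphs that each use the label `x` in different bscopes stay apart after merging, because a blank node is the pair of label and bscope, not the label alone.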

-----------

2. containing graphs. (Idea in recent email.) This does not mention scopes explicitly.

An RDF graph is a set of triples. Some RDF graphs are designated to be **containing graphs**. Two different containing graphs cannot contain (triples which contain) the same blank node. (But a subgraph of a containing graph may share a blank node with its containing graph. So if two graphs share a blank node, they must be both subgraphs of the same containing graph.) Intuitively, a containing graph is all the triples which contain a given collection of blank nodes. *Complete* graphs are defined as above.

**Every RDF graph is a subgraph of a unique containing graph** (which might be the graph itself, of course). Any set of triples, even ones which cross containing-graph boundaries, is graph-equivalent to a containing graph (just provide brand new blank nodes, not used anywhere else), so, again, this is not a limitation on how RDF graphs can be formed.

Surface RDF syntaxes are required to define how they specify containing graphs. For example, with our current decisions, the containing graph of a dataset is the union of the graphs in the dataset.

The merge of a set of graphs is a containing graph comprising the union of all the triples in a set of graph-equivalent graphs, using new blank nodes. (Basically the union, but we allow blank nodes to be re-mapped in a 1:1 fashion in order to fit into a new containing graph.) The merge lemma holds for sets of complete graphs.
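The containing-graph version can be sketched without any scope object at all. The Python below is only an illustration under stated assumptions: blank nodes are opaque objects compared by identity, IRIs are plain strings, and the function names `containers_are_disjoint` and `merge` are hypothetical. The merge follows the "1:1 re-mapping" reading given above, so a blank node shared by two inputs is still a single node in the result.

```python
class BNode:
    """Opaque blank node; two nodes are the same only if they are the same object."""


def blank_nodes(graph):
    """Collect every blank node occurring in a set of triples."""
    return {term for triple in graph for term in triple
            if isinstance(term, BNode)}


def containers_are_disjoint(containers):
    """True when no blank node occurs in two different containing graphs."""
    seen = set()
    for graph in containers:
        nodes = blank_nodes(graph)
        if seen & nodes:
            return False
        seen |= nodes
    return True


def merge(graphs):
    """Union of the triples, re-mapped 1:1 onto brand new blank nodes."""
    fresh = {}  # old blank node -> new blank node, shared across all inputs
    merged = set()
    for graph in graphs:
        for triple in graph:
            merged.add(tuple(
                fresh.setdefault(term, BNode()) if isinstance(term, BNode) else term
                for term in triple
            ))
    return frozenset(merged)
```

Because the result uses only new blank nodes, it can be designated a containing graph without conflicting with any containing graph the inputs came from.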

------------

3. bnodeID syntactic scopes. (This is in the current draft of Semantics, text reproduced here. Scoped graphs as described here are the same as containing graphs as described in (2), and bscopes in (1) are what all the blank nodes identified by bnodeIDs in a single bnodeID scope are in. They all say the same thing, in different ways.)

Blank nodes may be identified in a surface (document) syntax for RDF using blank node identifiers. Each surface syntax must specify an unambiguous notion of the **scope** of such identifiers, such that any graph defined by this syntax will be inside a single scope. Two graphs not in the same scope do not share any blank nodes. Each combination of a blank node identifier and a surrounding scope is understood to define a unique blank node, local to the graphs described by the surface syntax. The same blank node identifier used in different scopes identifies a different blank node in each scope in which it occurs.

Scope boundaries are defined by the surface syntax used to encode RDF. For example, in RDF/XML [RDF-SYNTAX-GRAMMAR] and NTriples [RDF-TESTCASES], the scope is defined by the document. In TriG, a syntax for RDF datastores, the scope is the entire datastore.

The set of all triples in a given scope is called a **scoped graph**. **Every RDF graph described by a surface syntax for RDF must be a subgraph of a scoped graph**.

An RDF graph is *complete* when, for every blank node in the graph, the graph contains all triples in the scoped graph which contain that blank node.

Merging is taking the union.
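A small Python sketch may help show how these definitions fit together. It is an illustration only, with assumptions labeled: a blank node is modelled as a tuple of a scope name and a bnodeID, scope names such as a document name are plain strings, and `resolve`, `is_complete` and `merge` are hypothetical helper names.

```python
def resolve(bnode_id, scope):
    """Each (bnodeID, scope) combination identifies one blank node."""
    return ("bnode", scope, bnode_id)


def is_bnode(term):
    """Recognise the tuple encoding of blank nodes assumed in this sketch."""
    return isinstance(term, tuple) and len(term) == 3 and term[0] == "bnode"


def is_complete(graph, scoped_graph):
    """True when graph holds every scoped-graph triple that mentions its blank nodes."""
    nodes = {term for triple in graph for term in triple if is_bnode(term)}
    return all(
        triple in graph
        for triple in scoped_graph
        if any(term in nodes for term in triple)
    )


def merge(graphs):
    """In this version, merging is simply the union of the triples."""
    return set().union(*graphs)
```

With this encoding, `resolve("_:x", "doc1")` and `resolve("_:x", "doc2")` are different blank nodes, while two occurrences of `_:x` in the same scope resolve to the same one, which is why a plain union suffices for merging.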

------------

I hope this helps the WG in its deliberations. I suggest that the second version might be the least painful one for people to swallow, as it introduces the least extra formal machinery, and it allows neat phrasings such as "a subgraph considered as a containing graph" to indicate how graph boundaries are being treated in examples.

------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 mobile
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes

Received on Friday, 15 March 2013 22:39:56 UTC