A new, hopefully more consensual, proposal for bscopes (Was: Three alternative approaches for fixing the blank node scope problem.) from Antoine Zimmermann on 2013-03-19 (public-rdf-wg@w3.org from March 2013)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Tue, 19 Mar 2013 18:01:52 +0100
To: public-rdf-wg@w3.org, Pat Hayes <phayes@ihmc.us>
Message-ID: <51489A00.3040308@emse.fr>
Pat,


This was helpful.

After revisiting all this, I think I can propose something that can be 
satisfying to me and that use a good part of what you propose. I 
pondered this idea of surface and was trying to provide a formalisation 
that I find appropriate.

The key to my proposal is that at any given point in time, there are a 
number of RDF triples that have a concrete realisation in the world (in 
files, in digital memory, on sheats of paper, on surfaces). We start by 
asserting that:
  1. RDF graphs are sets of triples;
  2. there is a distinguished subset of all triples that comprises the 
"concrete triples";
  3. each concrete triple exists in a single scope (i.e., equivalently, 
there is a partition of the concrete triples, where each set of the 
partition delimits a scope);
  4. it is possible to distinguish the bnodes in different scopes as 
they are physically placed differently, so we can assume that set of 
bnodes in a scope are disjoint from the ones in any other scope (this 
may not be completely necessary, but it's probably simpler this way);
  5. some bnodes are recognisable by an identifier (that can be an in 
memory structure or a handwritten symbol) that is used at multiple 
places: it is assumed that the same identifier in the same scope always 
indicates the same bnode, but always indicate different bnodes if placed 
in different scopes. This means that two physically distant occurrences 
of a symbol may refer to the same bnode.

Formally, we have the set of RDF triples T, a subset CT of concrete 
triples and a partition P(CT) of CT such that for each pairs of sets G1, 
G2 in the partition, G1 ≠ G2 => bnodes(G1) disjoint from bnodes(G2). The 
RDF graphs that form the partition are called the scopes of the concrete 
triples, and all subsets of them are called scoped graphs.

For each scope s, there is a labelling function l(s) that assigns 
identifiers to the bnodes appearing in s, such that l(s)(b) = l(s)(b') 
iff b = b'. Obviously, labelling functions of different scopes operate 
on disjoint sets of bnodes, so the same identifier on bnodes in 
different scopes does not indicate the same bnode.

An application, or a person, must be able to decide whether two 
occurrences of an identifier belong to the same scope, such that, if 
desired, they can unify them. This situation is conveniently addressed 
by the convention that the representation format (serialisation syntax, 
in-memory model, drawing convention, etc) provides on bnode identifiers.

These conventions allow agents to recognise the identity of bnodes, so 
they know what union is, for instance.
The problem is, the union of two graphs does not necessary correspond to 
a scope. In fact, with the formalisation above, it never corresponds to 
a scope. This is because concrete triples and their partition correspond 
to a /state/. Union of all sets of graphs exist at the abstract level. 
But we need a notion that actually builds a new graph and makes it 
"real" (makes the new graph a set of concrete triples).

A state comprises a set of concrete triples and a partition of that set 
in scopes. In a given state, the merge of two scoped graphs in different 
does not exist (yet). But there is always a merge of a set of scoped 
graphs made of triples that are not concrete. So it suffices to say that 
a "concrete merge" (as opposed to a merge of any abstract set of 
triples) of a set of scoped graphs {G1, ..., Gn} is a set of 
non-concrete triples that is equivalent to the union of the scoped 
graphs. So the merges of all possible sets of scoped graphs is formally 
defined, but these merges do not exist concretely.

This leads us to introduce operations that make transitions from states 
to states:

A /copy/ of a scoped graph G in state S produces a state S' such that 
all scopes of S are in S' and there is a single additional scope G' 
equivalent to G with only triples outside the concrete triples of S.

A "realisation" of a (concrete) merge of a set of scoped graphs in S 
produces a state S' which contains the scopes of S plus a single scope 
that is a "concrete merge" of the set.

We could as well defined delete, update, etc. but it may not be necessary.


AZ.


Le 15/03/2013 23:39, Pat Hayes a écrit :
> As this debate has gotten very intense and active, and it is just
> barely possible that some folk might not be following every little
> detail, and there have now been several proposals, let me summarize
> the proposals here. These are *alternatives*, note. We don't need to
> do more than one of them. They are all formally equivalent.  They all
> involve defining at least one new notion, indicated **thus** which
> probably should be in Concepts. Please keep in mind the distinction
> between blank nodes and bnodeIDs when reading this.
>
> 1. bscopes. (Idea developed in my ISWC 2009 talk, the simplest and
> most 'abstract' of the three.)
>
> We introduce the idea of a **bscope**, and a relation **in** between
> blank nodes and bscopes. Every blank node is in exactly one bscope.
> (Or, we define a **function** sc( ) from blank nodes to bscopes.) A
> subgraph is *complete* when, if it contains a blank node, it contains
> all the triples which contain that blank node.
>
> An RDF graph is a set of triples **such that every blank node in the
> set is in a single bscope**. Every set of RDF triples is
> graph-equivalent to an RDF graph as defined here, so this is no
> limitation upon how RDF graphs can be formed.
>
> Surface RDF syntaxes are required to define how their bnodeIDs map
> into bscopes, ie exactly when two occurrences of a bnodeID identify a
> single blank node, and exactly when two identified blank nodes are in
> the same bscope.
>
> The merge of a set of graphs is a graph comprising the union of all
> the triples in a set of graph-equivalent graphs, with blank nodes
> from a new bscope. The merge lemma holds for sets of complete
> graphs.

Let us consider a blank node b in a scope s (i.e., sc(b) = s). The 
triple (b, <p>, "a"^^xsd:integer) is in the scope s. This triple must be 
in the complete graph that contains the bnode b, by your definition.
So the complete graph having scope s is {xsd:integer}-inconsistent. As 
the choice of bnode is arbitrary, this is the case for all complete graph.


> -----------
>
> 2. containing graphs. (Idea in recent email.) This does not mention
> scopes explictly.
>
> An RDF graph is a set of triples.  Some RDF graphs are designated to
> be **containing graphs**. Two different containing graphs cannot
> contain (triples which contain) the same blank node. (But a subgraph
> of a containing graph may share a blank node with its containing
> graph. So if two graphs share a blank node, they must be both
> subgraphs of the same containing graph.)  Intuitively, a containing
> graph is all the triples which contain a given collection of blank
> nodes. *Complete* graphs are defined as above.
>
> **Every RDF graph is a subgraph of a unique containing graph** (which
> might be the graph itself, of course.) Any set of triples, even ones
> which cross containing-graph boundaries, is graph-equivalent to a
> containing graph (just provide brand new blank nodes, not used
> anywhere else) so, again, this is not a limitation on how RDF graphs
> can be formed.
>
> Surface RDF syntaxes are required to define how they specify
> containing graphs. For example, with our current decisions, the
> containing graph of a dataset is the union of the graphs in the
> dataset.
>
> The merge of a set of graphs is a containing graph comprising the
> union of all the triples in a set of graph-equivalent graphs, using
> new blank nodes. (Basically the union, but we allow blank nodes to be
> re-mapped in a 1:1 fashion in order to fit into a new containing
> graph.) The merge lemma holds for sets of complete graphs.

I don't see how this is an improvement compared to 2004-merge. In both 
cases, you have to change the bnodes you have in the input graphs. In 
both cases, the merge is not unique. In this case, and not in 
2004-merge, the merge lemma does not apply to all graphs.

... <some thought> ...

I may have found an idea that could make us both happier, I'll send a 
proposal in another email.


> ------------
>
> 3. bnodeID syntactic scopes.  (This is in the current draft of
> Semantics, text reproduced here.  Scoped graphs as described here are
> the same as containing graphs as described in (2), and bscopes in (1)
> are what all the blank nodes identified by bnodeIDs in a single
> bnodeID scope are in. They all say the same thing, in different
> ways.)
>
> Blank nodes may be identified in a surface (document) syntax for RDF
> using blank node identifiers. Each surface syntax must specify an
> unambiguous notion of the **scope** of such identifiers, such that
> any graph defined by this syntax will be inside a single scope. Two
> graphs not in the same scope do not share any blank nodes. Each
> combination of a blank node identifier and a surrounding scope is
> understood to define a unique blank node, local to the graphs
> described by the surface syntax. The same blank node identifier used
> in different scopes identifies a different blank node in each scope
> in which it occurs.
>
> Scope boundaries are defined by the surface syntax used to encode
> RDF. For example, in RDF/XML [RDF-SYNTAX-GRAMMAR] and NTriples
> [RDF-TESTCASES], the scope is defined by the document. In TriG, a
> syntax for RDF datastores, the scope is the entire datastore.
>
> The set of all triples in a given scope is called a **scoped graph**.
> **Every RDF graph described by a surface syntax for RDF must be a
> subgraph of a scoped graph**.
>
> An RDF graph is *complete* when, for every blank node in the graph,
> the graph contains all triples in the scoped graph which contain that
> blank node.
>
> Merging is taking the union.
>
> ------------
>
> I hope this helps the WG in its deliberations. I suggest that the
> second version might be the least painful one for people to swallow,
> as it introduces the least extra formal machinery, and it allows neat
> phrasings such as "a subgraph considered as a containing graph" to
> indicate how graph boundaries are being treated in examples.
>
>
> ------------------------------------------------------------ IHMC
> (850)434 8903 or (650)494 3973 40 South Alcaniz St.
> (850)202 4416   office Pensacola                             (850)202
> 4440   fax FL 32502                              (850)291 0667
> mobile phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>
>
>
>
>
>
>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Tuesday, 19 March 2013 17:02:16 UTC