Re: blank node scope - ISSUE-107 - resolve as in Semantics - hopefully on 20 March from Pat Hayes on 2013-03-14 (public-rdf-wg@w3.org from March 2013)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 13 Mar 2013 22:29:41 -0500
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: public-rdf-wg@w3.org, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Message-Id: <42A852CF-46F8-4797-B1C1-662AC81476BB@ihmc.us>
On Mar 13, 2013, at 11:32 AM, Antoine Zimmermann wrote:

> Please, could you show the mathematical definitions of all this? I do not understand what a scope is from the text of Semantics.

Make sure you have the latest version, as the text was tweaked last night to improve its clarity. In the form given there, it uses the idea of a syntactic scope for bnodeIDs. The notion of syntactic scope (the scope of a local variable or a local identifier, or of a bound variable in logic) is surely familiar to any logician or computer scientist. (Which is why I thought it might be easier for most readers to define it this way.)

I will try to review the current proposal as succinctly as I can, but it does require some care to say it correctly, keeping the two levels distinct.

1. An RDF graph is a set of triples. (2004)

Implicit in this is that *any* set of triples can be viewed as being a graph. This includes 'silly' sets, such as a set of triples containing just one triple chosen at random from every RDF/XML document ever published, or the set of triples containing a URI which rhymes with "bong" when spoken in Icelandic. As this illustrates, not all *sets* of triples are RDF graphs that correspond to any actual RDF source or RDF document. 

2. RDF graphs can be expressed using an RDF surface syntax. (2004) 

3. In such a surface syntax, blank nodes may be represented by blank node identifiers (bnodeIDs) (2004)

4. Any RDF surface syntax MUST define the scope of bnodeIDs in that syntax. (New, but in fact almost universally assumed in practice since 2004.)

5. We require that (all the bnodeIDs used in defining the triples in) any graph described by such a surface syntax MUST be contained within a single scope. (New, but in fact capturing how RDF graphs are treated since 2004.) (But scopes may extend beyond a single graph, as they do in datasets.)

6. Two graphs described by documents with different scopes, or from sources defining different scopes, CANNOT share a blank node. (New, but often assumed since 2004, even if using a different terminology.)

7. The set of all triples in a given scope is called a scoped graph. (New definition)

8. Observation. In actual usage, what people mean when they say "RDF graph" is almost always a scoped graph, that is, a graph whose triples are described fully in a document or datastructure or source which defines its own bnodeID scope, so that bnodeIDs are local names in that document or datastructure or source. Any graph described by an RDF/XML or NTriples document, for example, is a scoped graph. In some cases, people refer to graphs which are subgraphs of a scoped graph. I do not know of any examples of anyone needing to consider a graph that is not a subset of a scoped graph. 
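Points 3 to 7 can be sketched as a toy reader that treats each document as one bnodeID scope. This is a minimal sketch under stated assumptions: the one-triple-per-line, whitespace-separated format and the names `BNode`, `read_document` and `blank_nodes` are hypothetical, and Python object identity stands in for "being the same blank node".

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: two BNode objects are equal only if they are the same object
class BNode:
    label: str  # the bnodeID it was read from; kept only for display


def read_document(lines):
    """Read a toy document with one 'subject predicate object' triple per line.

    Terms starting with '_:' are bnodeIDs. The whole document is a single
    bnodeID scope, so the returned set of triples is its scoped graph.
    """
    scope = {}  # bnodeID -> BNode; local to this call, i.e. to this document

    def term(token):
        if token.startswith("_:"):
            if token not in scope:
                scope[token] = BNode(token)
            return scope[token]
        return token

    graph = set()
    for line in lines:
        if line.strip():
            s, p, o = line.split()
            graph.add((term(s), term(p), term(o)))
    return frozenset(graph)


def blank_nodes(graph):
    return {t for triple in graph for t in triple if isinstance(t, BNode)}


doc = ["_:b <http://ex/p> <http://ex/a>"]
g1, g2 = read_document(doc), read_document(doc)
# Same text, read in two different scopes: the graphs share no blank node (point 6).
print(blank_nodes(g1).isdisjoint(blank_nodes(g2)))  # True
print(len(g1 | g2))  # 2
```

Within one call the same bnodeID always yields the same blank node, so every triple of the document lands in one scope, as point 5 requires.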

Now, you asked for a "mathematical" account of bnode scoping, and what I have given you above is an account which refers to syntactic matters in a surface syntax. Perhaps you don't feel this is sufficiently mathematical. OK, I can do it purely mathematically, entirely at the abstract level, if you prefer. (This is taken from http://www.slideshare.net/PatHayes/blogic-iswc-2009-invited-talk, starting around slide 16.)

We introduce a set of things called bscopes, and a relation called "in" between bnodes and bscopes. (This is a different notion of scope than the one I have been using until now, though they are very closely related. In the ISWC talk, I called them 'surfaces'.) Every bnode is in exactly one bscope (this is the first axiom). An RDF graph is a set of triples **such that every bnode in the set is in a single bscope** (that is the second axiom), and we can then say that the graph is in the bscope. (This sounds like it is an extra condition on the 2004 graph model, but it's not, since the 2004 version simply does not mention bscopes.) We allow more than one graph to be in a bscope, but not for one graph to be split across bscopes. Two graphs in the same bscope might share a bnode, of course. The truth conditions refer to mappings on the bnodes in a bscope, as you would expect.
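The two axioms can be written down directly. A minimal sketch, assuming hypothetical classes `BScope` and `BNode`: the first axiom holds by construction because each `BNode` records exactly one bscope, and `is_rdf_graph` checks the second axiom.

```python
class BScope:
    """A bscope. Distinct objects are distinct bscopes; nothing else is assumed."""


class BNode:
    def __init__(self, bscope):
        self.bscope = bscope  # axiom 1: every bnode is in exactly one bscope


def is_in(b, c):
    """The 'in' relation between bnodes and bscopes."""
    return b.bscope is c


def bscopes_of(triples):
    return {t.bscope for triple in triples for t in triple if isinstance(t, BNode)}


def is_rdf_graph(triples):
    """Axiom 2: every bnode in the set is in a single bscope.

    A set with no bnodes passes trivially.
    """
    return len(bscopes_of(triples)) <= 1


c1, c2 = BScope(), BScope()
b1, b2, b3 = BNode(c1), BNode(c1), BNode(c2)
print(is_rdf_graph({(b1, "ex:p", b2)}))  # True: both bnodes are in c1
print(is_rdf_graph({(b1, "ex:p", "ex:a"), (b3, "ex:p", "ex:a")}))  # False: split across bscopes
```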

The definition of merge in this model is simple: we make copies. A *copy* of a graph G is an equivalent graph G' in a different bscope. The merge of a set S of graphs is a graph comprising copies of all the graphs in S, all in a single bscope, with a 1:1 mapping between bnodes in S and bnodes in the merge. That's it. We can define scoped graphs and complete graphs just as before, in the obvious ways.
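The merge-by-copying definition can be sketched the same way. This assumes the same hypothetical `BScope`/`BNode` encoding (redefined so the block stands alone); `merge` builds copies in one fresh bscope and keeps a single one-to-one mapping for all bnodes occurring in S, so a bnode shared by two graphs of S gets one copy, not two.

```python
class BScope:
    """A bscope. Distinct objects are distinct bscopes."""


class BNode:
    def __init__(self, bscope):
        self.bscope = bscope  # each bnode is in exactly one bscope


def merge(graphs):
    """Return (merged_graph, mapping) for a collection S of graphs.

    All copies live in one new bscope, and `mapping` is the one-to-one map
    from the bnodes occurring in S to the bnodes of the merge.
    """
    target = BScope()  # a bscope distinct from every existing one
    mapping = {}

    def copy_term(t):
        if isinstance(t, BNode):
            if t not in mapping:
                mapping[t] = BNode(target)
            return mapping[t]
        return t  # IRIs and literals are copied unchanged

    merged = frozenset(tuple(copy_term(t) for t in triple)
                       for g in graphs for triple in g)
    return merged, mapping
```

Because the mapping is injective and leaves other terms alone, in this sketch the merge is always an isomorphic copy of the plain union of S, placed in a single bscope.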

The connection between syntactic scopes and bscopes is that every syntactic scope for bnodeIDs is required to define a single bscope for the blank nodes identified by the bnodeIDs in the scope. In fact, you could (Richard's idea) *define* bnodes to be pairs of a bnodeID and a bscope, and then show that this satisfies the axioms; but that's not actually necessary, and it might be confusing. (Though it does show that the axioms can be satisfied, if that needs showing.) 
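Richard's construction can also be checked mechanically. A minimal sketch, assuming a hypothetical `BNode` named tuple: a blank node is literally the pair of a bnodeID and a bscope, and the scope of a bnode is just the second component.

```python
from typing import NamedTuple


class BNode(NamedTuple):
    """Hypothetical encoding: a blank node is a (bnodeID, bscope) pair."""
    bnode_id: str
    bscope: str  # bscopes are just names here, e.g. one per document


def scope_of(b: BNode) -> str:
    """Axiom 1 holds by construction: each pair has exactly one bscope."""
    return b.bscope


def read_bnode(bnode_id: str, bscope: str) -> BNode:
    """Reading the bnodeID `bnode_id` inside the syntactic scope `bscope`."""
    return BNode(bnode_id, bscope)


# Same bnodeID in two documents: two different blank nodes.
print(read_bnode("b", "doc1") == read_bnode("b", "doc2"))  # False
# Same bnodeID twice in one document: the same blank node.
print(read_bnode("b", "doc1") == read_bnode("b", "doc1"))  # True
```

As the paragraph above says, this is one model showing the axioms can be satisfied, not a representation anyone is required to use.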

(I actually like the bscope idea better, but it would require us to slightly tweak the definition of RDF graph, which I suspect will be too large a pill for the WG to swallow, which is why I haven't tried to get them to swallow it.)

Detailed responses to your email below, in-line.


> I can see several interpretations:
> 
> 
> 1) there is a mapping s from the set of all blank nodes to the set of scopes (and nothing is specified about scopes beyond the fact that there is a set of them). So, given a bnode b, its scope is s(b).

Yes. That is another way to express the bscope idea, above. (b in c) iff s(b)=c

> I am very much against this design

Can you say why? As it (1) requires no changes to the 2004 semantics (extensions, but no changes) (2) completely solves the issue we have with merging vs. unioning (3) does not change any entailments or truth-conditions (4) is very easy to describe and (5) apparently conforms better to the way RDF is actually used in practice, I would not be willing to give it up without seeing a very convincing argument against it. Your being against it does not, in itself, constitute such an argument.

> , but it's not clear that the ED of RDF 1.1 Semantics is rejecting this one (especially given the remarks that Pat made during previous discussions on the topic).
> I would object formally to such a design.

Do you have any technical objections? Can you say why you would object formally to this design?

> 2) scopes form a partition of the RDF triples, so a triple belongs to a single scope. A set in the partition is a complete graph.
> The problem is that the union of two different complete graphs is not a complete graph.

> I don't like this design at all, although it is already much better than the first one.

Again, can you say why?

> 3) a scope corresponds to an RDF graph, and scopes can overlap

No, that completely throws away the entire point of having a scope. In any case, scopes *don't* overlap. If they did, there would be no way to know how to interpret a local variable.

> (mathematically, there is a mapping M from the set of scopes to the set of graphs). A graph in a scope is a scoped graph (or mathematically, there exists a scope s such that M(s) contains the graph). The set of triples in the graph of a scope form a complete graph (M(s) is a complete graph). Possibly, the set of complete graphs is closed under set union (so that the union of two complete graphs is still a complete graph).
> This would be much better, yet not completely up to my expectation

Can you say what it is that you expect here?

> , but there are indications that the chosen design in the current ED is not this one.

Indeed not. 

> There are probably other ways to interpret the current text.

Have you got the newest version? I find it hard to see how this text can be understood in any other than the intended way. 

> I would be curious to know what your respective formalisations would be, Peter and Pat, if you had to write them independently of one another. I had the impression, reading some of your emails, that your understandings of scope were different.
> 
> 
> In any case, I fail to understand why scope should have any consequences for the truth of a set of triples.

It doesn't. But it does provide a natural extent over which to define the existential bnode mapping. A bnodeID is now exactly like an existential variable bound by a quantifier which extends over the scope (or, if you prefer, the bnode is the quantified variable, extending over the bscope; although this is a bit problematic, since bnodes don't have any lexical form to bind. The best way to map abstract bnode syntax to logic is by using Peircean graphical syntax.) This is exactly the intent of the original RDF design, in fact, but we couldn't state it with this degree of precision at the time.
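The reading of bnodeIDs as existentially quantified variables whose quantifier extends over the whole scope can be written out for a small case. This is a hedged sketch: the ternary predicate T(s, p, o) for "the triple s p o holds" and the example IRIs are illustrative notation, not taken from the Semantics text.

```latex
% One scope (one document) containing:   _:x ex:p _:y .   _:y ex:q ex:c .
\exists x\,\exists y\;\bigl(\mathsf{T}(x, \mathit{ex{:}p}, y) \wedge \mathsf{T}(y, \mathit{ex{:}q}, \mathit{ex{:}c})\bigr)

% Two documents, i.e. two scopes, each containing:   _:x ex:p ex:a .
\exists x\,\mathsf{T}(x, \mathit{ex{:}p}, \mathit{ex{:}a}) \;\wedge\; \exists x'\,\mathsf{T}(x', \mathit{ex{:}p}, \mathit{ex{:}a})
```

Each scope contributes its own quantifier, so the same bnodeID written in two scopes becomes two independently bound variables.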

> Thus my plea to revert to the semantics of bnodes as in Semantics 2004.

There is no change to the truth-conditions of a set of triples. But we do require, in order to apply the bnode semantic rules, that the set be described by a document all of whose bnodeIDs are inside a single scope (or, in the bscope formulation, that it be in a single bscope).

Pat


> 
> If scope impacts the semantics at all, then there should be a separate definition of the truth of a scoped graph, as opposed to the truth of a set of triples. Something like:
> 
> "A scoped graph G in scope s is true in interpretation I iff there exists a mapping A from the bnodes in s to resources of I such that [I+A](M(s)) is true, otherwise it's false."
> 
> Note that A is independent of the graph G, it only depends on the complete graph M(s).
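The truth condition quoted above can be sketched as a brute-force check over a finite interpretation. Assumptions, all hypothetical: blank nodes are strings beginning with "_:", the interpretation is given by a dict `denote` from IRIs to resources and a dict `iext` from property resources to sets of pairs, and literals are ignored. The function receives only M(s), matching the remark that A depends on the complete graph and not on G.

```python
from itertools import product


def is_bnode(term):
    return term.startswith("_:")  # convention of this sketch only


def scoped_graph_true(complete_graph, resources, denote, iext):
    """True iff some mapping A from the bnodes of M(s) to resources makes every
    triple of M(s) true, i.e. [I+A](M(s)) is true."""
    bnodes = sorted({t for triple in complete_graph for t in triple if is_bnode(t)})
    for choice in product(sorted(resources), repeat=len(bnodes)):
        a = dict(zip(bnodes, choice))  # one candidate mapping A

        def value(term):
            return a[term] if is_bnode(term) else denote[term]

        if all((value(s), value(o)) in iext.get(value(p), set())
               for s, p, o in complete_graph):
            return True
    return False
```

The search is exponential in the number of blank nodes; it is only meant to make the quantifier structure explicit.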
> 
> 
> 
> AZ
> 
> 
> PS: this is a bit redundant with my complete review that will follow (tomorrow I hope).
> 
> 
> 
> On 13/03/2013 16:57, Peter Patel-Schneider wrote:
>> ISSUE-107 concerns what to do with blank nodes.  This includes cross-graph
>> blank node scopes.
>> 
>> The current draft of Semantics
>> https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-mt/index.html includes a
>> solution to blank node scoping.  I propose that this solution be adopted by
>> the WG as the result of the issue.
>> 
>> The basic idea is to introduce the notion of a blank node scope.  RDF
>> graphs within a single scope can share blank nodes, graphs not in the same
>> scope cannot!  This makes blank-node-renaming unnecessary during graph
>> merging.  (Of course, in a surface syntax, different blank nodes may have
>> the same b-node name, so these names may have to be changed when merging in
>> a particular syntax.)
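The parenthetical point about surface syntax can be sketched for line-based documents in the style of N-Triples. This is a simplified illustration with a hypothetical `concatenate_documents` helper: the regular expression accepts only a subset of the real bnodeID syntax and would also rewrite a "_:" sequence occurring inside an IRI or literal.

```python
import re

# Simplified bnodeID pattern (assumption): '_:' followed by letters, digits or '_'.
BNODE_ID = re.compile(r"_:([A-Za-z0-9_]+)")


def concatenate_documents(docs):
    """Write several documents out as one document (hence one bnodeID scope),
    prefixing each bnodeID with 'd<i>_' so labels from different source
    documents cannot collide."""
    out = []
    for i, doc in enumerate(docs):
        for line in doc:
            out.append(BNODE_ID.sub(lambda m: f"_:d{i}_{m.group(1)}", line))
    return out


d1 = ["_:b <http://ex/p> <http://ex/a> ."]
d2 = ["_:b <http://ex/q> <http://ex/c> ."]
for line in concatenate_documents([d1, d2]):
    print(line)
# _:d0_b <http://ex/p> <http://ex/a> .
# _:d1_b <http://ex/q> <http://ex/c> .
```

Only the bnodeIDs are rewritten; at the abstract level no blank node is renamed, which is the distinction the paragraph above draws.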
>> 
>> For graphs not in the same scope, nothing changes.  For graphs in the same
>> scope not sharing blank nodes, nothing changes.
>> For graphs in the same scope sharing blank nodes, these blank nodes are
>> interpreted uniformly.
>> 
>> This last breaks a feature of RDF, that a set of graphs entails their
>> merge.  There is a new definition in Semantics (complete graphs) that shows
>> when this feature is retained.
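A small worked example shows where uniform interpretation departs from the 2004 behaviour. The graphs, IRIs and interpretation are hypothetical: G1 and G2 are in the same scope and share the blank node b, and I relates only x to a by ex:p and only y to c by ex:q.

```latex
% G_1 = { _:b ex:p ex:a },  G_2 = { _:b ex:q ex:c },  same scope, shared blank node b
\begin{aligned}
&\mathrm{IEXT}(I(\mathit{ex{:}p})) = \{(x, I(\mathit{ex{:}a}))\}, \quad
 \mathrm{IEXT}(I(\mathit{ex{:}q})) = \{(y, I(\mathit{ex{:}c}))\}, \quad x \neq y \\
&\text{Separately: } A_1(b) = x \text{ satisfies } G_1,\ A_2(b) = y \text{ satisfies } G_2,
 \text{ so the standardized-apart merge is true in } I \\
&\text{Uniformly: } A(b) = x \text{ falsifies } G_2,\ A(b) = y \text{ falsifies } G_1,
 \text{ any other value falsifies both, so } \{G_1, G_2\} \text{ is false in } I
\end{aligned}
```

So once shared blank nodes are interpreted uniformly, a set of graphs sharing blank nodes is no longer equivalent to its merge, which is why the draft needs the complete-graph definition to say when the 2004 behaviour is retained.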
>> 
>> 
>> 
>> This solution needs changes in Concepts, minimally introducing the notion
>> of a blank node scope, but maybe also talking about how blank node scope
>> can be determined by different surface syntaxes.
>> 
>> I suppose that there is also the issue of whether all the RDF graphs in a
>> dataset are always in the same blank node scope.  It may be that it is not
>> reasonable to say that this is the case, because datasets are already
>> sometimes used as if they do not share blank nodes.
>> 
>> 
>> 
>> peter
>> 
> 
> -- 
> Antoine Zimmermann
> ISCOD / LSTI - Institut Henri Fayol
> École Nationale Supérieure des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 66 03
> Fax:+33(0)4 77 42 66 66
> http://zimmer.aprilfoolsreview.com/
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 14 March 2013 03:30:24 UTC