Re: B-scopes from Antoine Zimmermann on 2012-11-20 (public-rdf-wg@w3.org from November 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Tue, 20 Nov 2012 11:27:01 +0100
To: Richard Cyganiak <richard@cyganiak.de>
CC: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <50AB5AF5.7020004@emse.fr>
<I may try to answer some of your claims later, but it takes so much time>

The notion of scope is important as a concept in RDF, but it should not
be inherent to the abstract syntax. Everything you say shows that you
tie this notion closely to RDF documents, not to the abstract syntax.
And yes, you're right that those notions belong in RDF Concepts rather
than RDF Semantics.

A bnode should just be an element of a set that is disjoint from the IRIs
and the literals. It would be better to make clearer what the set is, but
even when it's unspecified, it works at the level of abstract syntax and
semantics. I simply propose to make the set more explicit by saying that
each bnode of this set can be uniquely identified (which is a way of
explaining how "it is possible to determine whether or not they are the
same").
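
To make this concrete, here is a minimal Python sketch of that view. The
class names (IRI, Literal, BNode) and the helper same_bnode are purely
illustrative assumptions, not anything taken from the specs: the only point
is that bnodes live in their own set and that equality between two of them
is decidable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical: str


@dataclass(frozen=True)
class BNode:
    # Hypothetical: the label only tells bnodes apart, it denotes nothing.
    label: str


def same_bnode(a: BNode, b: BNode) -> bool:
    """Decide whether two blank nodes are the same element of the set."""
    return a == b


b1 = BNode("x")
b2 = BNode("x")
b3 = BNode("y")
```

With these definitions, same_bnode(b1, b2) holds and same_bnode(b1, b3)
does not, while BNode("x") never equals IRI("x"), because the three
classes are distinct types.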

The notion of scope comes into play with the notion of RDF documents. I
would rather say that:

"An RDF document is an RDF graph inside a scope"

which formally would mean that an RDF document is a pair (s,g) where:
  - s is a resource that we call the /scope/ of the document
  - g is an RDF graph

It is the fact that an RDF graph is placed inside a document that
determines how to delimit the existential scope of a bnode; the bnode
does not have, by itself, a notion of scope. The same bnode put in
different graphs will have different existential scopes. That's what RDF
semantics formalises.
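
As a rough illustration of the pair (s,g), here is a small Python sketch.
The names RDFDocument, scope1 and scope2 are assumptions made up for the
example; IRIs are written as plain strings to keep it short.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class BNode:
    label: str


Triple = Tuple[object, object, object]


@dataclass(frozen=True)
class RDFDocument:
    scope: str                # the resource called the scope of the document
    graph: FrozenSet[Triple]  # an ordinary RDF graph, unaware of any scope


def bnodes(graph):
    """Collect the blank nodes occurring anywhere in a graph."""
    return {term for triple in graph for term in triple
            if isinstance(term, BNode)}


b = BNode("b")
doc1 = RDFDocument("scope1", frozenset({(b, "<p>", "<o>")}))
doc2 = RDFDocument("scope2", frozenset({("<s>", "<p>", b)}))

# The very same bnode occurs in both graphs; only the documents differ.
shared = bnodes(doc1.graph) & bnodes(doc2.graph)
```

Here shared contains b even though the two documents have different
scopes: the scope is attached to the document, not to the bnode.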

If I consider a bnode b, and the following graphs:
  G1 = {(b,<p>,<o>)}
  G2 = {(<s>,<p>,b)}

G1 and G2 share a bnode. There's really no reason to talk about scope at
that level. But if I consider the RDF documents (in Turtle):

document1:
_:x  <p>  <o> .

document2:
<s>  <p>  _:x .

Then it really does not matter whether or not _:x is the same bnode in
document1 and document2, because each document defines a scope. The
bnodes really can be the same. But even in this case, the scope is not
necessarily limited to a file: two files together could form one scope.
It's an implementation decision.
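
A small Python sketch of that implementation decision, under loud
assumptions: Scope and parse are hypothetical helpers, and the "parser"
only handles the one-triple-per-line fragments shown above, not real
Turtle.

```python
class BNode:
    """An opaque blank node; identity is plain object identity."""


class Scope:
    """Hypothetical label table: one label maps to one bnode per scope."""

    def __init__(self):
        self._table = {}

    def bnode(self, label):
        if label not in self._table:
            self._table[label] = BNode()
        return self._table[label]


def parse(lines, scope):
    """Parse 's p o .' lines, resolving _:labels through the given scope."""
    graph = set()
    for line in lines:
        terms = line.rstrip(" .").split()
        graph.add(tuple(scope.bnode(t[2:]) if t.startswith("_:") else t
                        for t in terms))
    return graph


document1 = ["_:x  <p>  <o> ."]
document2 = ["<s>  <p>  _:x ."]

# Decision 1: each file is its own scope.
g1 = parse(document1, Scope())
g2 = parse(document2, Scope())

# Decision 2: the two files together form one scope.
both = Scope()
h1 = parse(document1, both)
h2 = parse(document2, both)
```

Under the first decision the subject of g1 and the object of g2 are two
different bnodes; under the second they are the same one. Nothing in the
graphs themselves changes, only how the labels were resolved.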

The semantics doesn't care about this notion of scope, and the abstract 
syntax shouldn't, in my opinion.

By the way, same bnode or not, the set {G1,G2} is logically equivalent
to the merge of G1 and G2, which contains two different bnodes. The set
{G1,G2} is also logically equivalent to some set {G1',G2'} where the
bnodes of G1' are disjoint from the bnodes of G2'. So having a global
identifier for bnodes does not make any semantic difference whatsoever:
the identifier doesn't have a semantic value.
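
The difference between union and merge can be sketched in Python as
follows. This only shows the syntactic operation (renaming bnodes apart
before taking the union); it does not check the logical equivalence. The
function names and the "freshN" labels are assumptions for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    label: str


_fresh = itertools.count()


def standardise_apart(graph):
    """Copy a graph, replacing each of its bnodes by a fresh one.

    Assumes no input graph already uses a label of the form 'freshN'.
    """
    renaming = {}

    def rename(term):
        if isinstance(term, BNode):
            if term not in renaming:
                renaming[term] = BNode(f"fresh{next(_fresh)}")
            return renaming[term]
        return term

    return {tuple(rename(t) for t in triple) for triple in graph}


def merge(*graphs):
    """Union of the graphs after renaming their bnodes apart."""
    result = set()
    for g in graphs:
        result |= standardise_apart(g)
    return result


def bnodes(graph):
    return {t for triple in graph for t in triple if isinstance(t, BNode)}


b = BNode("b")
G1 = {(b, "<p>", "<o>")}
G2 = {("<s>", "<p>", b)}

union = G1 | G2          # keeps the shared bnode b
merged = merge(G1, G2)   # b is renamed separately in each graph
```

The union has a single bnode and the merge has two, yet both contain two
triples. Which label each bnode carries plays no role in either result.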


AZ

On 19/11/2012 13:39, Richard Cyganiak wrote:
> Hi Antoine,
>
> Summary:
>
> 1. Either you don't understand my proposal, or you're wilfully
> ignoring parts of it.
>
> 2. You need to actually read the parts of the Semantics document that
> I explicitly pointed out to you in order to show that you are wrong.
>
> 3. You need to explain how representing different existentially
> quantified variables by the same abstract syntax construct isn't a
> horrible kludge.
>
> Details inline.
>
> On 18 Nov 2012, at 06:20, Antoine Zimmermann wrote:
>> On 17/11/2012 17:01, Richard Cyganiak wrote:
>>> Hi Antoine,
>>>
>>> To be honest, I think your proposal makes the problem worse by
>>> deepening the disconnect between abstract syntax and semantics.
>>> See, the problem is this. Let's assume we have two Turtle files:
>>>
>>> _:x :name "Alice".
>>>
>>> And another one:
>>>
>>> _:x :name "Bob".
>>>
>>> They use the same token _:x. But we know that according to the
>>> semantics, they don't necessarily label the same thing; both
>>> files can be true even if there's nothing in the universe that
>>> has both of the names "Alice" and "Bob".
>>
>> The formal semantics never refers to bnode identifiers. What the
>> token _:x labels is defined by the Turtle spec.
>>
>> Let us avoid the concrete RDF syntaxes for now and stick to maths.
>> Your two files serialise two graphs but it's not possible to know
>> what bnodes are serialised.
>
> That is not true in my proposal.
>
> [[ Every RDF document forms its own, self-contained scope for blank
> nodes. ]]
>
> Two documents -- two scopes -- two different blank nodes.
>
>> It is not known whether b1 = b2 or not. Yet, RDF Concepts says:
>>
>> "Given two blank nodes, it is possible to determine whether or not
>> they are the same."
>>
>> In the current situation, it is not possible to do that.
>
> Yes it *is* possible, even in RDF 2004; the specs just don't specify
> how to do it. The missing bit of specification would have to say:
> “Within a scope (that is, within a file or system), blank nodes are
> the same if they have the same identifier. Between scopes (that is,
> between files or systems), blank nodes are different.”
>
> My proposal adds that missing bit of spec text.
>
>> Now, replacing a bnode in a graph by another bnode does not change
>> anything to what's asserted, so let us replace b1 and b2 by b. Then
>> consider:
>>
>> G1' = {(b,:name,"Alice")} G2' = {(b,:name,"Bob")}
>>
>> There you have exactly the same bnode in both graphs. Yet, nothing
>> has changed.
>
> “Nothing has changed” is not true. The meaning of the individual
> graphs hasn't changed, but the meaning of their union has changed.
> Before, it didn't matter where the quantifiers are located. Now
> you've moved the quantifiers to just outside each graph.
>
>> The problem is, in RDF 2004, it is not possible to convey this
>> situation in any syntaxes.
>
> How exactly is this a problem?
>
>> Here comes your design, and mine, into the picture. In your design,
>> the situation would be:
>>
>> G1 = {((x,scope1),:name,"Alice")} G2 = {((x,scope2),:name,"Bob")}
>>
>> in the case the nodes are not the same, and:
>>
>> G1' = {((x,scope),:name,"Alice")} G2' = {((x,scope),:name,"Bob")}
>>
>> in the case the nodes are the same. In my design, instead of being
>> unable to determine whether the graphs serialised are {G1,G2} or
>> {G1',G2'},
>
> [[ Every RDF document forms its own, self-contained scope for blank
> nodes. ]]
>
> Therefore, the serialised graphs in my proposal are G1 and G2.
> Different documents, different scopes. The G1',G2' situation is
> logically impossible in my proposal. If you want global scope for
> your blank nodes, skolemize them.
>
>> or even other pairs, it would be known that the Turtle documents
>> serialise the following graphs:
>>
>> H1 = {(bx,:name,"Alice")} H2 = {(bx,:name,"Bob")}
>>
>> where bx is *the* bnode with label "x".
>>
>> I do not see how it can be worse to better know the situation.
>
> In your proposal, you don't know the situation any better -- you only
> think that because you misunderstand what my proposal says.
>
> Also, this *still* doesn't say explicitly whether the *quantifier* is
> just around each graph or global.
>
> Also, you keep ignoring the big flaw of your proposal -- that things
> which are different in the semantics shouldn't be treated as the same
> in the abstract syntax. The bx in your H1 and H2 are different
> existential variables in the semantics, therefore they should be
> different blank nodes in the abstract syntax.
>
>>> So the Turtle syntax uses the same token to indicate two
>>> possibly different things. How do you explain that?
>>
>> Indeed, that's bad. The token may or may not indicate different
>> bnodes and no one can know (until the next design).
>
> Yeah, and that's what my proposal fixes. Your proposal “fixes” that
> problem by introducing a new one -- different variables being
> represented by the same abstract syntax construct.
>
>>> The 2004 account handwaves around the issue by saying that _:x
>>> is just a local label, and leaving the question open whether
>>> they label the same or different things in the abstract syntax.
>>> So these two files may or may not serialize the same blank node.
>>> Then the semantics explains that even if it's the same blank
>>> node, if it comes from different places then we need to do a
>>> merge, and that creates different blank nodes.
>>
>> We actually do not need to do a merge, any more than I need to add two
>> integers, say the populations of states, just because they come from
>> different places. Also, merge does not "create" different bnodes, any
>> more than addition creates different integers.
>
> I have no idea what you are trying to say here.
>
>> There is no handwaving in my design, and it only refers to the
>> concepts and abstract syntax, not to "files"
>
> Your design necessarily inherits the handwaving that RDF 2004
> Semantics is doing on the issue:
>
> [[ This effectively treats all blank nodes as having the same meaning
> as existentially quantified variables in the RDF graph in which they
> occur, and which have the scope of the entire graph. In terms of the
> N-Triples syntax, this amounts to the convention that would place the
> quantifiers just outside, or at the outer edge of, the N-Triples
> document corresponding to the graph. This in turn means that there is
> a subtle but important distinction in meaning between the operation
> of forming the union of two graphs and that of forming the merge. The
> simple union of two graphs corresponds to the conjunction ( 'and' )
> of all the triples in the graphs, maintaining the identity of any
> blank nodes which occur in both graphs. This is appropriate when the
> information in the graphs comes from a single source, or where one is
> derived from the other by means of some valid inference process, as
> for example when applying an inference rule to add a triple to a
> graph. Merging two graphs treats the blank nodes in each graph as
> being existentially quantified in that graph, so that no blank node
> from one graph is allowed to stray into the scope of the other
> graph's surrounding quantifier. This is appropriate when the graphs
> come from different sources and there is no justification for
> assuming that a blank node in one refers to the same entity as any
> blank node in the other. ]] http://www.w3.org/TR/rdf-mt/#unlabel
>
> How is that not handwaving? Introducing the notion of “documents” and
> “sources” in Semantics strikes me as the completely wrong place. This
> belongs in Concepts, and that's what my proposal achieves.
>
>> and a notion of "scope" that I still don't really see a
>> formalisation of. To get the notion of scope into the abstract
>> syntax, you need a strict formalisation of it, IMO.
>
> Why?
>
>>> You *cannot* produce a coherent account of the simple two-file
>>> situation shown above without talking about scopes or sources
>>> *somewhere*.
>>
>> RDF 2004 does not do that
>
> You've got to be kidding me.
>
>>> Currently, that talk is hidden away in Semantics where most
>>> people won't see it (last paragraph of [2]).
>
> You see that [2] there? You didn't click it, right? Because you've
> read the entire document years ago, right? And you don't remember it
> mentioning document scopes or sources anywhere, right? Therefore, you
> didn't need to click on that link, right?
>
>>>> (id,scope) is just a complicated way of defining a globally
>>>> unique identifier for a bnode.
>>>
>>> That is nonsense. It's an explicit way of saying that id is not
>>> globally unique.
>>
>> By your definition, a bnode is a pair (id,scope).
>
> It is not.
> http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes#Specification_Changes
>
>
>> This pair belongs to the unique (so that it is "shared" by
>> everyone, or "global") set of pairs {(i,s)|i is a UNICODE string, s
>> a scope}.
>
> Well, those pairs may be unique, but that still doesn't make them
> identifiers, because identifiers are names, and scopes are not
> names.
>
> I use the word “scope” in its standard computer science usage: The
> scope of an identifier is the context in which it has its meaning.
>
> [[ Every RDF document forms its own, self-contained scope for blank
> nodes. The handling of scopes outside of RDF documents (for example,
> in RDF stores) is implementation-dependent. Other specifications MAY
> impose additional scoping rules. ]]
>
> Best, Richard
>
>
>
>>
>>
>> AZ
>>
>>>
>>> Best, Richard
>>>
>>>
>>> [1]
>>> http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes#Specification_Changes
>>>
>>> [2] http://www.w3.org/TR/rdf-mt/#unlabel
>>>
>>>
>>>
>>> On 17 Nov 2012, at 10:27, Antoine Zimmermann wrote:
>>>
>>>> I don't find this really useful; I even find it confusing. Like
>>>> Andy, I see this as an implementation approach.
>>>>
>>>> (id,scope) is just a complicated way of defining a globally
>>>> unique identifier for a bnode.
>>>>
>>>> What I would say instead is the following:
>>>>
>>>> """ Bnodes are drawn from an infinite set. Each bnode has a
>>>> label being a UNICODE string, different from all other bnode
>>>> labels. So when one draws a bnode, one can tell which bnode it
>>>> is. Serialisation syntaxes that rely on bnode identifiers are
>>>> in fact identifying the exact bnode they use. """
>>>>
>>>> And everything stays the same. No other changes are required.
>>>>
>>>> It especially clarifies what this sentence means:
>>>>
>>>> """ Given two blank nodes, it is possible to determine whether
>>>> or not they are the same. """
>>>>
>>>> In RDF 2004, this sentence was never really implemented
>>>> anywhere. If you got a bunch of triples, then another bunch of
>>>> triples, you could not say which bnodes of the first bunch were
>>>> the same as, or different from, the bnodes of the second bunch.
>>>>
>>>> There are cases when you want to split a graph into subgraphs,
>>>> in which case you must know what bnodes actually appear in
>>>> each subgraph. To get back the full graph from the subgraphs,
>>>> it is required that you use set union, not merge. This requires
>>>> that the bnodes are all identified in the same way across the
>>>> subgraphs.
>>>>
>>>> Notice that a bnode label does not denote anything in terms of
>>>> the formal semantics, so it has nothing to do with an IRI, and
>>>> nothing to do with a skolem IRI. The label is only there to
>>>> tell which bnode is used. It's an existential variable name and
>>>> it can be replaced by any other variable name without changing
>>>> the meaning of a graph.
>>>>
>>>> A fresh bnode may be defined formally as follows:
>>>>
>>>> """ Given a set of RDF graphs Sg whose triples contain a set Sb
>>>> of bnodes, a fresh bnode with respect to Sg is a bnode b not in
>>>> Sb. """
>>>>
>>>> Of course, when we say "new", it has to be new wrt something
>>>> predefined, thus the notion of "fresh bnode with respect to a
>>>> set of RDF graphs".
>>>>
>>>>
>>>> --AZ
>>>>
>>>>
>>>>
>>>> On 14/11/2012 12:02, Richard Cyganiak wrote:
>>>>> Following recent discussions, I've written up a proposal to
>>>>> change the design of blank nodes in RDF by explicitly
>>>>> introducing scoped blank node identifiers into the abstract
>>>>> syntax.
>>>>>
>>>>> http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes
>>>>>
>>>>> Requirements:
>>>>>
>>>>> • Consistency with all resolutions the WG has made so far
>>>>> • No changes to other specs beyond Concepts and Semantics
>>>>>   required
>>>>> • No changes to conforming implementations required
>>>>>
>>>>> All further details are in the wiki.
>>>>>
>>>>> Comments welcome.
>>>>>
>>>>> Best, Richard
>>>>>
>>>>
>>>>
>>>> -- Antoine Zimmermann ISCOD / LSTI - Institut Henri Fayol
>>>> École Nationale Supérieure des Mines de Saint-Étienne 158 cours
>>>> Fauriel 42023 Saint-Étienne Cedex 2 France Tél:+33(0)4 77 42 66
>>>> 03 Fax:+33(0)4 77 42 66 66 http://zimmer.aprilfoolsreview.com/
>>>>
>>>
>>>
>>>
>>
>>
>>
>
>

-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Tuesday, 20 November 2012 10:27:39 UTC