Re: B-scopes from Richard Cyganiak on 2012-11-19 (public-rdf-wg@w3.org from November 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 19 Nov 2012 12:39:58 +0000
To: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <53E5523C-47D2-4494-AC0D-2CD6204C2694@cyganiak.de>

Hi Antoine,

Summary:

1. Either you don't understand my proposal, or you're wilfully ignoring parts of it.

2. You need to actually read the parts of the Semantics document that I explicitly pointed out to you in order to show that you are wrong.

3. You need to explain how representing different existentially quantified variables by the same abstract syntax construct isn't a horrible kludge.

Details inline.

On 18 Nov 2012, at 06:20, Antoine Zimmermann wrote:
> On 17/11/2012 17:01, Richard Cyganiak wrote:
>> Hi Antoine,
>> 
>> To be honest, I think your proposal makes the problem worse by
>> deepening the disconnect between abstract syntax and semantics. See,
>> the problem is this. Let's assume we have two Turtle files:
>> 
>> _:x :name "Alice".
>> 
>> And another one:
>> 
>> _:x :name "Bob".
>> 
>> They use the same token _:x. But we know that according to the
>> semantics, they don't necessarily label the same thing; both files
>> can be true even if there's nothing in the universe that has both of
>> the names "Alice" and "Bob".
> 
> The formal semantics never refers to bnode identifiers. What the token _:x labels is defined by the Turtle spec.
> 
> Let us avoid the concrete RDF syntaxes for now and stick to maths.
> Your two files serialise two graphs but it's not possible to know what bnodes are serialised.

That is not true in my proposal.

[[
Every RDF document forms its own, self-contained scope for blank nodes.
]]

Two documents -- two scopes -- two different blank nodes.

> It is not known whether b1 = b2 or not. Yet, RDF Concepts says:
> 
> "Given two blank nodes, it is possible to determine whether or not they are the same."
> 
> In the current situation, it is not possible to do that.

Yes, it *is* possible, even in RDF 2004; the specs just don't specify how to do it. The missing bit of specification would have to say: “Within a scope (that is, within a file or system), blank nodes are the same if they have the same identifier. Between scopes (that is, between files or systems), blank nodes are different.”

My proposal adds that missing bit of spec text.
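
A minimal sketch of that rule in Python, for illustration only. The names `Scope` and `BlankNode` are hypothetical and not part of the proposal or of any RDF library; the sketch assumes a scope simply maps local identifiers to blank nodes that have no name of their own.

```python
class BlankNode:
    """A blank node with no intrinsic name; equality is object identity."""
    __slots__ = ()


class Scope:
    """Hypothetical model of one blank node scope, e.g. one RDF document."""

    def __init__(self):
        self._nodes = {}  # identifier -> BlankNode, local to this scope

    def bnode(self, identifier):
        # Same identifier within the same scope: same blank node.
        if identifier not in self._nodes:
            self._nodes[identifier] = BlankNode()
        return self._nodes[identifier]


doc1, doc2 = Scope(), Scope()  # two documents, two scopes

same_within = doc1.bnode("x") is doc1.bnode("x")    # same scope, same identifier
same_between = doc1.bnode("x") is doc2.bnode("x")   # same identifier, different scopes
```

Here `same_within` is True and `same_between` is False: the identifier "x" only has meaning relative to the scope it is used in.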

> Now, replacing a bnode in a graph by another bnode does not change anything about what's asserted, so let us replace b1 and b2 by b. Then consider:
> 
> G1' = {(b,:name,"Alice")}
> G2' = {(b,:name,"Bob")}
> 
> There you have exactly the same bnode in both graphs. Yet, nothing has changed.

“Nothing has changed” is not true. The meaning of the individual graphs hasn't changed, but the meaning of their union has changed. Before, it didn't matter where the quantifiers are located. Now you've moved the quantifiers to just outside each graph.
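
To make the difference concrete, here is a small sketch that checks graphs against a toy universe in which one entity is named "Alice" and a different one is named "Bob". A graph counts as true if some assignment of its blank nodes to entities makes every triple a fact. The `Var` class, the `satisfiable` helper and the universe itself are assumptions made up for this illustration; they are not taken from the specs.

```python
from itertools import product


class Var:
    """Stands in for a blank node (an existential variable)."""


# Toy universe: entity 1 is named "Alice", entity 2 is named "Bob".
UNIVERSE = [1, 2]
FACTS = {(1, ":name", "Alice"), (2, ":name", "Bob")}


def satisfiable(graph):
    """True if some mapping of the graph's blank nodes to entities
    turns every triple of the graph into a fact."""
    variables = list({s for (s, _, _) in graph})
    for values in product(UNIVERSE, repeat=len(variables)):
        mapping = dict(zip(variables, values))
        if all((mapping[s], p, o) in FACTS for (s, p, o) in graph):
            return True
    return False


b1, b2, b = Var(), Var(), Var()

# Union of G1 and G2 (different blank nodes) and of G1' and G2' (same blank node).
distinct_union = {(b1, ":name", "Alice")} | {(b2, ":name", "Bob")}
shared_union = {(b, ":name", "Alice")} | {(b, ":name", "Bob")}

each_alone = satisfiable({(b, ":name", "Alice")}) and satisfiable({(b, ":name", "Bob")})
```

In this universe `each_alone` is True, `satisfiable(distinct_union)` is True, and `satisfiable(shared_union)` is False, because nothing has both names. Each graph on its own means the same as before; only the union is affected by sharing the blank node.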

> The problem is, in RDF 2004, it is not possible to convey this situation in any syntax.

How exactly is this a problem?

> Here comes your design, and mine, into the picture. In your design, the situation would be:
> 
> G1 = {((x,scope1),:name,"Alice")}
> G2 = {((x,scope2),:name,"Bob")}
> 
> in the case the nodes are not the same, and:
> 
> G1' = {((x,scope),:name,"Alice")}
> G2' = {((x,scope),:name,"Bob")}
> 
> in the case the nodes are the same. In my design, instead of being unable to determine whether the graphs serialised are {G1,G2} or {G1',G2'},

[[
Every RDF document forms its own, self-contained scope for blank nodes.
]]

Therefore, the serialised graphs in my proposal are G1 and G2. Different documents, different scopes. The G1',G2' situation is logically impossible in my proposal. If you want global scope for your blank nodes, skolemize them.
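
As a rough sketch of what skolemizing could look like: each blank node in a graph is replaced by a freshly minted IRI, which is then a global name. The `skolemize` function, the `BlankNode` class and the `http://example.org` authority are placeholders for this illustration; the `/.well-known/genid/` path follows the convention suggested for Skolem IRIs in RDF 1.1 Concepts.

```python
import uuid


class BlankNode:
    """Blank node with identity-based equality (illustrative only)."""


def skolemize(graph, authority="http://example.org"):
    """Return a copy of a set of triples in which every blank node
    is replaced by a fresh, globally unique IRI string."""
    iris = {}  # BlankNode -> minted IRI, so one node always maps to one IRI

    def term(t):
        if isinstance(t, BlankNode):
            if t not in iris:
                iris[t] = f"{authority}/.well-known/genid/{uuid.uuid4().hex}"
            return iris[t]
        return t

    return {(term(s), p, term(o)) for (s, p, o) in graph}


x = BlankNode()
graph = {(x, ":name", "Alice"), (x, ":knows", BlankNode())}
skolemized = skolemize(graph)
```

After this, both triples about `x` share one IRI as their subject, and that IRI can be used across documents without any scoping rule.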

> or even other pairs, it would be known that the Turtle documents serialise the following graphs:
> 
> H1 = {(bx,:name,"Alice")}
> H2 = {(bx,:name,"Bob")}
> 
> where bx is *the* bnode with label "x".
> 
> I do not see how it can be worse to better know the situation.

In your proposal, you don't know the situation any better -- you only think that because you misunderstand what my proposal says.

Also, this *still* doesn't say explicitly whether the *quantifier* is just around each graph or global.

Also, you keep ignoring the big flaw of your proposal -- that things which are different in the semantics shouldn't be treated as the same in the abstract syntax. The bx in your H1 and H2 are different existential variables in the semantics, therefore they should be different blank nodes in the abstract syntax.

>> So the Turtle syntax uses the same token to indicate two possibly
>> different things. How do you explain that?
> 
> Indeed, that's bad. The token may or may not indicate different bnodes and no one can know (until the next design).

Yeah, and that's what my proposal fixes. Your proposal “fixes” that problem by introducing a new one -- different variables being represented by the same abstract syntax construct.

>> The 2004 account handwaves around the issue by saying that _:x is
>> just a local label, and leaving the question open whether they label
>> the same or different things in the abstract syntax. So these two
>> files may or may not serialize the same blank node. Then the
>> semantics explains that even if it's the same blank node, if it comes
>> from different places then we need to do a merge, and that creates
>> different blank nodes.
> 
> We actually do not need to do a merge any more than if I have two integers, say the populations of states, from different places, I do not need to add them. Also, merge does not "create" different bnodes, any more than addition creates different integers.

I have no idea what you are trying to say here.

> There is no handwaving in my design, and it only refers to the concepts and abstract syntax, not to "files"

Your design necessarily inherits the handwaving that RDF 2004 Semantics is doing on the issue:

[[
This effectively treats all blank nodes as having the same meaning as existentially quantified variables in the RDF graph in which they occur, and which have the scope of the entire graph. In terms of the N-Triples syntax, this amounts to the convention that would place the quantifiers just outside, or at the outer edge of, the N-Triples document corresponding to the graph. This in turn means that there is a subtle but important distinction in meaning between the operation of forming the union of two graphs and that of forming the merge. The simple union of two graphs corresponds to the conjunction ( 'and' ) of all the triples in the graphs, maintaining the identity of any blank nodes which occur in both graphs. This is appropriate when the information in the graphs comes from a single source, or where one is derived from the other by means of some valid inference process, as for example when applying an inference rule to add a triple to a graph. Merging two graphs treats the blank nodes in each graph as being existentially quantified in that graph, so that no blank node from one graph is allowed to stray into the scope of the other graph's surrounding quantifier. This is appropriate when the graphs come from different sources and there is no justification for assuming that a blank node in one refers to the same entity as any blank node in the other.
]]
http://www.w3.org/TR/rdf-mt/#unlabel

How is that not handwaving? Introducing the notion of “documents” and “sources” in Semantics strikes me as the completely wrong place. This belongs in Concepts, and that's what my proposal achieves.
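
For readers who want the union/merge distinction from the quoted paragraph spelled out, here is a small sketch over graphs represented as Python sets of triples. The `BlankNode` class and the `union`, `merge` and `blank_nodes` helpers are hypothetical names for this illustration, not an API of any RDF library.

```python
class BlankNode:
    """Blank node with identity-based equality (illustrative only)."""


def union(g1, g2):
    """Plain set union: a blank node occurring in both graphs stays shared."""
    return g1 | g2


def merge(g1, g2):
    """Rename the blank nodes of g2 to fresh ones, then take the union,
    so no blank node of g2 ends up shared with g1."""
    fresh = {}  # old BlankNode of g2 -> its fresh replacement

    def rename(t):
        if isinstance(t, BlankNode):
            return fresh.setdefault(t, BlankNode())
        return t

    return g1 | {(rename(s), p, rename(o)) for (s, p, o) in g2}


def blank_nodes(g):
    """All blank nodes occurring anywhere in the graph."""
    return {t for triple in g for t in triple if isinstance(t, BlankNode)}


b = BlankNode()
g1 = {(b, ":name", "Alice")}
g2 = {(b, ":name", "Bob")}

shared = len(blank_nodes(union(g1, g2)))  # one blank node carrying both names
apart = len(blank_nodes(merge(g1, g2)))   # two blank nodes, one name each
```

Which of the two operations is appropriate depends on whether the graphs come from one source or from different sources, and that is exactly the information the code cannot know unless something outside the graphs supplies it.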

> and a notion of "scope" that I still don't really see a formalisation of. To get the notion of scope into the abstract syntax, you need a strict formalisation of it, IMO.

Why?

>> You *cannot* produce a coherent account of the simple two-file
>> situation shown above without talking about scopes or sources
>> *somewhere*.
> 
> RDF 2004 does not do that

You've got to be kidding me.

>> Currently, that talk is hidden away in Semantics where
>> most people won't see it (last paragraph of [2]).

You see that [2] there? You didn't click it, right? Because you've read the entire document years ago, right? And you don't remember it mentioning document scopes or sources anywhere, right? Therefore, you didn't need to click on that link, right?

>>> (id,scope) is just a complicated way of defining a globally unique
>>> identifier for a bnode.
>> 
>> That is nonsense. It's an explicit way of saying that id is not
>> globally unique.
> 
> By your definition, a bnode is a pair (id,scope).

It is not.
http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes#Specification_Changes

> This pair belongs to the unique (so that it is "shared" by everyone, or "global") set of pairs {(i,s)|i is a UNICODE string, s a scope}.

Well, those pairs may be unique, but that still doesn't make them identifiers, because identifiers are names, and scopes are not names.

I use the word “scope” in its standard computer science usage: The scope of an identifier is the context in which it has its meaning.

[[
Every RDF document forms its own, self-contained scope for blank nodes. The handling of scopes outside of RDF documents (for example, in RDF stores) is implementation-dependent. Other specifications MAY impose additional scoping rules.
]]

Best,
Richard



> 
> 
> AZ
> 
>> 
>> Best, Richard
>> 
>> 
>> [1]
>> http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes#Specification_Changes
>> 
>> 
>> [2] http://www.w3.org/TR/rdf-mt/#unlabel
>> 
>> 
>> 
>> On 17 Nov 2012, at 10:27, Antoine Zimmermann wrote:
>> 
>>> I don't find this really useful, and even confusing. Like Andy, I
>>> see this as an implementation approach.
>>> 
>>> (id,scope) is just a complicated way of defining a globally unique
>>> identifier for a bnode.
>>> 
>>> What I would say instead is the following:
>>> 
>>> """ Bnodes are drawn from an infinite set. Each bnode has a label
>>> being a UNICODE string, different from all other bnode labels. So
>>> when one draws a bnode, one can tell which bnode it is.
>>> Serialisation syntaxes that rely on bnode identifiers are in fact
>>> identifying the exact bnode they use. """
>>> 
>>> And everything stays the same. No other changes are required.
>>> 
>>> It especially clarifies what this sentence means:
>>> 
>>> """ Given two blank nodes, it is possible to determine whether or
>>> not they are the same. """
>>> 
>>> In RDF 2004, this sentence was never really implemented anywhere.
>>> If you got a bunch of triples, then another bunch of triples, you
>>> could not say which bnode of the first bunch were the same or
>>> different as the bnodes of the second bunch.
>>> 
>>> There are cases when you want to split a graph into subgraphs, in
>>> which case you must know what bnodes actually appear in each
>>> subgraph. To get back the full graph from the subgraphs, it is
>>> required that you use set union, not merge. This requires that the
>>> bnodes are all identified in the same way across the subgraphs.
>>> 
>>> Notice that a bnode label does not denote anything in terms of the
>>> formal semantics, so it has nothing to do with an IRI, and nothing
>>> to do with a skolem IRI. The label is only there to tell which
>>> bnode is used. It's an existential variable name and it can be
>>> replaced by any other variable name without changing the meaning of
>>> a graph.
>>> 
>>> Fresh bnode may be defined formally as follows:
>>> 
>>> """ Given a set of RDF graphs Sg, the triples of which containing a
>>> set Sb of bnodes, a fresh bnode with respect to Sg is a bnode b not
>>> in Sb. """
>>> 
>>> Of course, when we say "new", it has to be new wrt something
>>> predefined, thus the notion of "fresh bnode with respect to a set
>>> of RDF graphs".
>>> 
>>> 
>>> --AZ
>>> 
>>> 
>>> 
>>> On 14/11/2012 12:02, Richard Cyganiak wrote:
>>>> Following recent discussions, I've written up a proposal to
>>>> change the design of blank nodes in RDF by explicitly introducing
>>>> scoped blank node identifiers into the abstract syntax.
>>>> 
>>>> http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes
>>>> 
>>>> Requirements:
>>>> 
>>>> • Consistency with all resolutions the WG has made so far
>>>> • No changes to other specs beyond Concepts and Semantics required
>>>> • No changes to conforming implementations required
>>>> 
>>>> All further details are in the wiki.
>>>> 
>>>> Comments welcome.
>>>> 
>>>> Best, Richard
>>>> 
>>> 
>>> 
>>> -- 
>>> Antoine Zimmermann
>>> ISCOD / LSTI - Institut Henri Fayol
>>> École Nationale Supérieure des Mines de Saint-Étienne
>>> 158 cours Fauriel
>>> 42023 Saint-Étienne Cedex 2
>>> France
>>> Tél:+33(0)4 77 42 66 03
>>> Fax:+33(0)4 77 42 66 66
>>> http://zimmer.aprilfoolsreview.com/
>>> 
>> 
>> 
>> 
> 
> 
> -- 
> Antoine Zimmermann
> ISCOD / LSTI - Institut Henri Fayol
> École Nationale Supérieure des Mines de Saint-Étienne
> 158 cours Fauriel
> 42023 Saint-Étienne Cedex 2
> France
> Tél:+33(0)4 77 42 66 03
> Fax:+33(0)4 77 42 66 66
> http://zimmer.aprilfoolsreview.com/
>
Received on Monday, 19 November 2012 12:40:29 UTC