Re: B-scopes from Antoine Zimmermann on 2012-11-18 (public-rdf-wg@w3.org from November 2012)

From: Antoine Zimmermann <antoine.zimmermann@emse.fr>
Date: Sun, 18 Nov 2012 07:20:12 +0100
To: Richard Cyganiak <richard@cyganiak.de>
CC: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <50A87E1C.5050102@emse.fr>
On 17/11/2012 17:01, Richard Cyganiak wrote:
> Hi Antoine,
>
> To be honest, I think your proposal makes the problem worse by
> deepening the disconnect between abstract syntax and semantics. See,
> the problem is this. Let's assume we have two Turtle files:
>
> _:x :name "Alice".
>
> And another one:
>
> _:x :name "Bob".
>
> They use the same token _:x. But we know that according to the
> semantics, they don't necessarily label the same thing; both files
> can be true even if there's nothing in the universe that has both of
> the names "Alice" and "Bob".

The formal semantics never refers to bnode identifiers. What the token 
_:x labels is defined by the Turtle spec.

Let us avoid the concrete RDF syntaxes for now and stick to maths.
Your two files serialise two graphs, but it is not possible to know which 
bnodes are serialised. Let us call b1 and b2 the bnodes serialised by 
the first and second files respectively. Then the two graphs are 
(abbreviating the IRIs):

  G1 = {(b1,:name,"Alice")}
  G2 = {(b2,:name,"Bob")}

It is not known whether b1 = b2 or not. Yet, RDF Concepts says:

"Given two blank nodes, it is possible to determine whether or not they 
are the same."

In the current situation, it is not possible to do that. So currently, 
implementations (or at least, concrete syntaxes) do not conform to this 
part of the spec.

Now, replacing a bnode in a graph by another bnode does not change 
anything about what is asserted, so let us replace b1 and b2 by b. Then 
consider:

  G1' = {(b,:name,"Alice")}
  G2' = {(b,:name,"Bob")}

There you have exactly the same bnode in both graphs. Yet, nothing has 
changed. The problem is, in RDF 2004, it is not possible to convey this 
situation in any syntax.
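
As an illustration only (a sketch, not taken from any of the specs; the 
class name BNode and the encoding of triples as Python tuples are 
assumptions), the two cases can be modelled with opaque bnodes and graphs 
as sets of triples:

```python
class BNode:
    """An opaque blank node: no label, equality is object identity."""
    __slots__ = ()


def bnodes(graph):
    """Return the set of blank nodes occurring in a graph (a set of triples)."""
    return {term for triple in graph for term in triple if isinstance(term, BNode)}


# Case 1: the two documents serialise distinct bnodes b1 and b2.
b1, b2 = BNode(), BNode()
G1 = frozenset({(b1, ":name", "Alice")})
G2 = frozenset({(b2, ":name", "Bob")})

# Case 2: both documents serialise the very same bnode b.
b = BNode()
G1p = frozenset({(b, ":name", "Alice")})
G2p = frozenset({(b, ":name", "Bob")})

# The bnodes shared by the two graphs differ between the two cases.
shared_case_1 = bnodes(G1) & bnodes(G2)    # no bnode in common
shared_case_2 = bnodes(G1p) & bnodes(G2p)  # exactly the bnode b
```

Taken alone, G1 and G1' assert the same thing, and so do G2 and G2'. The 
two cases only come apart when the graphs are considered together, for 
instance under set union.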

Here comes your design, and mine, into the picture. In your design, the 
situation would be:

  G1 = {((x,scope1),:name,"Alice")}
  G2 = {((x,scope2),:name,"Bob")}

in the case the nodes are not the same, and:

  G1' = {((x,scope),:name,"Alice")}
  G2' = {((x,scope),:name,"Bob")}

in the case the nodes are the same. In my design, instead of being 
unable to determine whether the graphs serialised are {G1,G2} or 
{G1',G2'}, or even other pairs, it would be known that the Turtle 
documents serialise the following graphs:

  H1 = {(bx,:name,"Alice")}
  H2 = {(bx,:name,"Bob")}

where bx is *the* bnode with label "x".

I do not see how it can be worse to better know the situation.
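
A rough sketch of the difference, under my reading of the two designs 
(the function names read_scoped and read_labelled, the tuple encodings, 
and the use of strings for scopes are all assumptions made for the 
illustration):

```python
def read_scoped(doc, scope):
    """Scoped design: the bnode for identifier x is the pair (x, scope)."""
    return frozenset(((label, scope), p, o) for (label, p, o) in doc)


def read_labelled(doc):
    """Global-label design: identifier x always yields *the* bnode labelled x."""
    return frozenset((("bnode", label), p, o) for (label, p, o) in doc)


def subjects(graph):
    """Return the set of subject terms of a graph."""
    return {s for (s, _, _) in graph}


# The two Turtle documents, reduced to (bnode identifier, predicate, object).
doc1 = [("x", ":name", "Alice")]
doc2 = [("x", ":name", "Bob")]

# Scoped design: the outcome depends on which scopes are assigned.
different = subjects(read_scoped(doc1, "scope1")) & subjects(read_scoped(doc2, "scope2"))
same = subjects(read_scoped(doc1, "scope")) & subjects(read_scoped(doc2, "scope"))

# Global-label design: the documents alone determine the graphs H1 and H2.
H1, H2 = read_labelled(doc1), read_labelled(doc2)
shared = subjects(H1) & subjects(H2)
```

In the scoped reading an extra input (the scope) decides whether the 
graphs share a node; in the labelled reading the documents themselves 
settle it.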


> So the Turtle syntax uses the same token to indicate two possibly
> different things. How do you explain that?

Indeed, that's bad. The token may or may not indicate different bnodes 
and no one can know (until the next design).


> The 2004 account handwaves around the issue by saying that _:x is
> just a local label, and leaving the question open  whether they label
> the same or different things in the abstract syntax. So these two
> files may or may not serialize the same blank node. Then the
> semantics explains that even if it's the same blank node, if it comes
> from different places then we need to do a merge, and that creates
> different blank nodes.

We do not need to do a merge, any more than I need to add two integers, 
say the populations of two states, just because they come from different 
places. Also, merge does not "create" different bnodes, any more than 
addition creates different integers.
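
To illustrate the analogy with addition, here is a minimal sketch of 
merge as an ordinary function on graphs (the encoding of bnodes as 
("bnode", label) tuples and the fresh labels m0, m1, ... are assumptions; 
the renaming follows the usual "standardise apart, then union" reading of 
merge):

```python
from itertools import count


def is_bnode(term):
    """A bnode is encoded here as a ("bnode", label) tuple."""
    return isinstance(term, tuple) and term[0] == "bnode"


def bnodes(graph):
    """Return the set of bnodes occurring in a graph."""
    return {t for triple in graph for t in triple if is_bnode(t)}


def merge(g1, g2):
    """Union of g1 with a copy of g2 whose shared bnodes are renamed apart."""
    used = bnodes(g1) | bnodes(g2)
    counter = count()
    renaming = {}
    for shared in sorted(bnodes(g1) & bnodes(g2)):
        new = ("bnode", f"m{next(counter)}")
        while new in used:  # skip labels that are already taken
            new = ("bnode", f"m{next(counter)}")
        used.add(new)
        renaming[shared] = new
    g2_copy = {tuple(renaming.get(t, t) for t in triple) for triple in g2}
    return frozenset(g1) | frozenset(g2_copy)


b = ("bnode", "x")
G1 = frozenset({(b, ":name", "Alice")})
G2 = frozenset({(b, ":name", "Bob")})

union = G1 | G2         # one bnode with two names
merged = merge(G1, G2)  # a third graph with two bnodes
```

G1 and G2 are untouched by the call: they still share b. The merge is a 
new graph computed from them, just as 3 + 4 is a new integer and leaves 3 
and 4 as they were.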


> Your account says that the two _:x actually are global labels, and
> therefore *do* label the same blank node in the abstract syntax,
> saying that the two files actually share a blank node. Then you
> handwave around the issue one level further down by saying that the
> graphs came from different sources and therefore a merge is required
> and that would create different blank nodes.

There is no notion of sources in the abstract syntax. Besides, the only 
place where I talk about merge in my email is when I say that "to get 
back the full graph from the subgraphs, it is required that you use set 
union, not merge."
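
A small sketch of that point (the example triples, the ":mbox" property, 
the mailto address and the helper rename_bnodes are hypothetical, chosen 
only for the illustration):

```python
def rename_bnodes(graph, suffix):
    """Copy of graph with every bnode label extended by suffix, as a merge would do."""
    def rename(term):
        if isinstance(term, tuple) and term[0] == "bnode":
            return ("bnode", term[1] + suffix)
        return term
    return frozenset(tuple(rename(t) for t in triple) for triple in graph)


b = ("bnode", "x")
G = frozenset({
    (b, ":name", "Alice"),
    (b, ":mbox", "mailto:alice@example.org"),
})

# Split G into two subgraphs that both mention the bnode b.
S1 = frozenset({(b, ":name", "Alice")})
S2 = G - S1

recovered = S1 | S2                   # set union: the original graph
merged = S1 | rename_bnodes(S2, "'")  # renamed apart: a different graph

subjects_of_merged = {s for (s, _, _) in merged}
```

Union gives back G exactly. The renamed-apart version no longer says that 
the thing named "Alice" is the thing with that mailbox, which is why the 
subgraphs need their bnodes identified in the same way.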


> My proposed account [1] says that the two _:x are local labels, and
> label different blank nodes, because they are in different files, and
> hence different scopes. No handwaving required!

There is no handwaving in my design, and it only refers to the concepts 
and abstract syntax, not to "files" and a notion of "scope" that I still 
don't really see a formalisation of. To get the notion of scope into the 
abstract syntax, you need a strict formalisation of it, IMO.


> You *cannot* produce a coherent account of the simple two-file
> situation shown above without talking about scopes or sources
> *somewhere*.

RDF 2004 does not do that, and it's "working". At the graph level, it is 
not required to introduce bnode identifiers in the abstract syntax (RDF 
2004 had none), but their absence is confusing and even problematic 
when it comes to defining data structures that have multiple graphs.


> Currently, that talk is hidden away in Semantics where
> most people won't see it (last paragraph of [2]). I want us to get
> that talk out of the way right there in Concepts where blank nodes
> are defined.
>
>> (id,scope) is just a complicated way of defining a globally unique
>> identifier for a bnode.
>
> That is nonsense. It's an explicit way of saying that id is not
> globally unique.

By your definition, a bnode is a pair (id,scope). This pair belongs to 
the unique (so that it is "shared" by everyone, or "global") set of 
pairs {(i,s)|i is a Unicode string, s a scope}.
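
To make the point concrete, a sketch assuming scopes can be written down 
as strings (that is an assumption of the illustration, since scopes are 
not formalised; the function names global_label and split_label are 
hypothetical):

```python
import json


def global_label(identifier, scope):
    """Encode the pair (identifier, scope) as a single string, injectively."""
    return json.dumps([identifier, scope])


def split_label(label):
    """Recover the pair (identifier, scope) from its encoded form."""
    identifier, scope = json.loads(label)
    return identifier, scope


# The same pair built in two unrelated places is the same mathematical object.
pair_here = ("x", "scope1")
pair_there = ("x", "scope1")
same_pair = pair_here == pair_there

# Distinct pairs get distinct labels, and the pair can be read back.
label_1 = global_label("x", "scope1")
label_2 = global_label("x", "scope2")
round_trip = split_label(label_1)
```

Under that assumption, a pair carries exactly the information of one 
globally unique label, only split into two components.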


AZ

>
> Best, Richard
>
>
> [1] http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes#Specification_Changes
> [2] http://www.w3.org/TR/rdf-mt/#unlabel
>
>
>
> On 17 Nov 2012, at 10:27, Antoine Zimmermann wrote:
>
>> I don't find this really useful, and even confusing. Like Andy, I
>> see this as an implementation approach.
>>
>> (id,scope) is just a complicated way of defining a globally unique
>> identifier for a bnode.
>>
>> What I would say instead is the following:
>>
>> """ Bnodes are drawn from an infinite set. Each bnode has a label,
>> a Unicode string different from all other bnode labels. So
>> when one draws a bnode, one can tell which bnode it is.
>> Serialisation syntaxes that rely on bnode identifiers are in fact
>> identifying the exact bnode they use. """
>>
>> And everything stays the same. No other changes are required.
>>
>> It especially clarifies what this sentence means:
>>
>> """ Given two blank nodes, it is possible to determine whether or
>> not they are the same. """
>>
>> In RDF 2004, this sentence was never really implemented anywhere.
>> If you got a bunch of triples, then another bunch of triples, you
>> could not say which bnodes of the first bunch were the same as, or
>> different from, the bnodes of the second bunch.
>>
>> There are cases when you want to split a graph into subgraphs, in
>> which case you must know what bnodes actually appear in each
>> subgraph. To get back the full graph from the subgraphs, it is
>> required that you use set union, not merge. This requires that the
>> bnodes are all identified in the same way across the subgraphs.
>>
>> Notice that a bnode label does not denote anything in terms of the
>> formal semantics, so it has nothing to do with an IRI, and nothing
>> to do with a skolem IRI. The label is only there to tell which
>> bnode is used. It's an existential variable name and it can be
>> replaced by any other variable name without changing the meaning of
>> a graph.
>>
>> A fresh bnode may be defined formally as follows:
>>
>> """ Given a set of RDF graphs Sg, the triples of which contain a
>> set Sb of bnodes, a fresh bnode with respect to Sg is a bnode b not
>> in Sb. """
>>
>> Of course, when we say "new", it has to be new wrt something
>> predefined, thus the notion of "fresh bnode with respect to a set
>> of RDF graphs".
>>
>>
>> --AZ
>>
>>
>>
>> On 14/11/2012 12:02, Richard Cyganiak wrote:
>>> Following recent discussions, I've written up a proposal to
>>> change the design of blank nodes in RDF by explicitly introducing
>>> scoped blank node identifiers into the abstract syntax.
>>>
>>> http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes
>>>
>>> Requirements:
>>>
>>> • Consistency with all resolutions the WG has made so far
>>> • No changes to other specs beyond Concepts and Semantics required
>>> • No changes to conforming implementations required
>>>
>>> All further details are in the wiki.
>>>
>>> Comments welcome.
>>>
>>> Best, Richard
>>>
>>
>>
>> --
>> Antoine Zimmermann
>> ISCOD / LSTI - Institut Henri Fayol
>> École Nationale Supérieure des Mines de Saint-Étienne
>> 158 cours Fauriel
>> 42023 Saint-Étienne Cedex 2
>> France
>> Tél:+33(0)4 77 42 66 03
>> Fax:+33(0)4 77 42 66 66
>> http://zimmer.aprilfoolsreview.com/
>>
>
>
>


-- 
Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
France
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66
http://zimmer.aprilfoolsreview.com/
Received on Sunday, 18 November 2012 06:20:48 UTC