Re: B-scopes from Pierre-Antoine Champin on 2012-11-19 (public-rdf-wg@w3.org from November 2012)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Mon, 19 Nov 2012 19:32:04 +0100
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <CA+OuRR_mCTNQySWpSV2OjjdJaJgL7gn8NH=YNN5BjGGoxPS5mA@mail.gmail.com>
Richard, all,

I like the proposal in its current state, with a few minor comments:

* I would put "scope" in bold face when it is first used (in the definition
of b-node) rather than in the following paragraph, especially because the
current placement gives the impression at first sight that scopes are
defined by documents. The following sentence explains that there may be
other kinds of scopes, but still...

* I would rephrase the definition of "fresh" as follows:

[[In a given scope, a **fresh blank node** is a blank node with a blank
node identifier that is new and unique within that scope.]]

* In order to keep a nice "definition" style to the whole, I would

* At the end of the paragraph about "copying into a scope", I would
explicitly state that parsing and serializing are two typical cases of
copying a graph into a new scope.

* I would move the definition of **merge** inside the note, as it is not so
much a definition as a logical consequence of the definition of "copying
into a scope"
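
As a rough illustration of the last three points, here is a minimal Python
sketch. It treats a scope as a mapping from blank node identifiers to opaque
blank nodes, and it assumes that copying a graph into a scope replaces every
blank node with a fresh one of that scope; that assumption is a reading of
the proposal, not a quote from it. The names Scope, BNode, parse, copy_into
and merge are made up for this example and come from neither the proposal
nor any RDF library.

```python
from itertools import count


class BNode:
    """Opaque blank node; two blank nodes are the same iff they are the same object."""


class Scope:
    """Hypothetical scope: the context in which blank node identifiers have meaning."""

    def __init__(self):
        self._nodes = {}          # identifier -> BNode
        self._counter = count()   # source of candidate identifiers for fresh nodes

    def bnode(self, ident):
        # Within one scope, the same identifier always yields the same blank node.
        if ident not in self._nodes:
            self._nodes[ident] = BNode()
        return self._nodes[ident]

    def fresh(self):
        # A fresh blank node has an identifier that is new and unique in this scope.
        while True:
            ident = f"b{next(self._counter)}"
            if ident not in self._nodes:
                return self.bnode(ident)


def copy_into(graph, scope):
    """Copy a graph into a scope, consistently replacing blank nodes with fresh ones."""
    mapping = {}

    def convert(term):
        if isinstance(term, BNode):
            if term not in mapping:
                mapping[term] = scope.fresh()
            return mapping[term]
        return term

    return {tuple(convert(t) for t in triple) for triple in graph}


def parse(document):
    """Toy parser: every document is read into its own, self-contained scope."""
    scope = Scope()

    def convert(term):
        return scope.bnode(term[2:]) if term.startswith("_:") else term

    return {tuple(convert(t) for t in triple) for triple in document}


def merge(g1, g2):
    """Merge as a consequence of copying: copy both graphs into one new scope, then union."""
    target = Scope()
    return copy_into(g1, target) | copy_into(g2, target)


alice = parse([("_:x", ":name", '"Alice"')])
bob = parse([("_:x", ":name", '"Bob"')])

# Two documents, two scopes: the two _:x stay distinct in the merge.
print(len({s for (s, p, o) in merge(alice, bob)}))  # 2

# One document, one scope: both triples share the same blank node.
both = parse([("_:x", ":name", '"Alice"'), ("_:x", ":name", '"Bob"')])
print(len({s for (s, p, o) in both}))  # 1
```

In this sketch, parsing is just one way of placing a graph into a new scope,
and merge needs no definition of its own beyond copy_into plus set union.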

  pa


On Mon, Nov 19, 2012 at 1:39 PM, Richard Cyganiak <richard@cyganiak.de> wrote:

> Hi Antoine,
>
> Summary:
>
> 1. Either you don't understand my proposal, or you're wilfully ignoring
> parts of it.
>
> 2. You need to actually read the parts of the Semantics document that I
> explicitly pointed out to you in order to show that you are wrong.
>
> 3. You need to explain how representing different existentially quantified
> variables by the same abstract syntax construct isn't a horrible kludge.
>
> Details inline.
>
> On 18 Nov 2012, at 06:20, Antoine Zimmermann wrote:
> > On 17/11/2012 17:01, Richard Cyganiak wrote:
> >> Hi Antoine,
> >>
> >> To be honest, I think your proposal makes the problem worse by
> >> deepening the disconnect between abstract syntax and semantics. See,
> >> the problem is this. Let's assume we have two Turtle files:
> >>
> >> _:x :name "Alice".
> >>
> >> And another one:
> >>
> >> _:x :name "Bob".
> >>
> >> They use the same token _:x. But we know that according to the
> >> semantics, they don't necessarily label the same thing; both files
> >> can be true even if there's nothing in the universe that has both of
> >> the names "Alice" and "Bob".
> >
> > The formal semantics never refers to bnode identifiers. What the token
> > _:x labels is defined by the Turtle spec.
> >
> > Let us avoid the concrete RDF syntaxes for now and stick to maths.
> > Your two files serialise two graphs but it's not possible to know what
> > bnodes are serialised.
>
> That is not true in my proposal.
>
> [[
> Every RDF document forms its own, self-contained scope for blank nodes.
> ]]
>
> Two documents -- two scopes -- two different blank nodes.
>
> > It is not known whether b1 = b2 or not. Yet, RDF Concepts says:
> >
> > "Given two blank nodes, it is possible to determine whether or not they
> > are the same."
> >
> > In the current situation, it is not possible to do that.
>
> Yes it *is* possible, even in RDF 2004; the specs just don't specify how
> to do it. The missing bit of specification would have to say: “Within a
> scope (that is, within a file or system), blank nodes are the same if they
> have the same identifier. Between scopes (that is, between files or
> systems), blank nodes are different.”
>
> My proposal adds that missing bit of spec text.
>
> > Now, replacing a bnode in a graph by another bnode does not change
> > anything about what's asserted, so let us replace b1 and b2 by b. Then
> > consider:
> >
> > G1' = {(b,:name,"Alice")}
> > G2' = {(b,:name,"Bob")}
> >
> > There you have exactly the same bnode in both graphs. Yet, nothing has
> > changed.
>
> “Nothing has changed” is not true. The meaning of the individual graphs
> hasn't changed, but the meaning of their union has changed. Before, it
> didn't matter where the quantifiers are located. Now you've moved the
> quantifiers to just outside each graph.
>
> > The problem is, in RDF 2004, it is not possible to convey this situation
> > in any syntax.
>
> How exactly is this a problem?
>
> > Here comes your design, and mine, into the picture. In your design, the
> > situation would be:
> >
> > G1 = {((x,scope1),:name,"Alice")}
> > G2 = {((x,scope2),:name,"Bob")}
> >
> > in the case the nodes are not the same, and:
> >
> > G1' = {((x,scope),:name,"Alice")}
> > G2' = {((x,scope),:name,"Bob")}
> >
> > in the case the nodes are the same. In my design, instead of being
> > unable to determine whether the graphs serialised are {G1,G2} or {G1',G2'},
>
> [[
> Every RDF document forms its own, self-contained scope for blank nodes.
> ]]
>
> Therefore, the serialised graphs in my proposal are G1 and G2. Different
> documents, different scopes. The G1',G2' situation is logically impossible
> in my proposal. If you want global scope for your blank nodes, skolemize
> them.
>
> > or even other pairs, it would be known that the Turtle documents
> > serialise the following graphs:
> >
> > H1 = {(bx,:name,"Alice")}
> > H2 = {(bx,:name,"Bob")}
> >
> > where bx is *the* bnode with label "x".
> >
> > I do not see how it can be worse to better know the situation.
>
> In your proposal, you don't know the situation any better -- you only
> think that because you misunderstand what my proposal says.
>
> Also, this *still* doesn't say explicitly whether the *quantifier* is just
> around each graph or global.
>
> Also, you keep ignoring the big flaw of your proposal -- that things which
> are different in the semantics shouldn't be treated as the same in the
> abstract syntax. The bx in your H1 and H2 are different existential
> variables in the semantics, therefore they should be different blank nodes
> in the abstract syntax.
>
> >> So the Turtle syntax uses the same token to indicate two possibly
> >> different things. How do you explain that?
> >
> > Indeed, that's bad. The token may or may not indicate different bnodes
> > and no one can know (until the next design).
>
> Yeah, and that's what my proposal fixes. Your proposal “fixes” that
> problem by introducing a new one: different variables being represented
> by the same abstract syntax construct.
>
> >> The 2004 account handwaves around the issue by saying that _:x is
> >> just a local label, and leaving the question open whether they label
> >> the same or different things in the abstract syntax. So these two
> >> files may or may not serialize the same blank node. Then the
> >> semantics explains that even if it's the same blank node, if it comes
> >> from different places then we need to do a merge, and that creates
> >> different blank nodes.
> >
> > We actually do not need to do a merge, any more than, if I have two
> > integers from different places (say, the populations of states), I need
> > to add them. Also, merge does not "create" different bnodes, any more
> > than addition creates different integers.
>
> I have no idea what you are trying to say here.
>
> > There is no handwaving in my design, and it only refers to the concepts
> > and abstract syntax, not to "files"
>
> Your design necessarily inherits the handwaving that RDF 2004 Semantics is
> doing on the issue:
>
> [[
> This effectively treats all blank nodes as having the same meaning as
> existentially quantified variables in the RDF graph in which they occur,
> and which have the scope of the entire graph. In terms of the N-Triples
> syntax, this amounts to the convention that would place the quantifiers
> just outside, or at the outer edge of, the N-Triples document corresponding
> to the graph. This in turn means that there is a subtle but important
> distinction in meaning between the operation of forming the union of two
> graphs and that of forming the merge. The simple union of two graphs
> corresponds to the conjunction ( 'and' ) of all the triples in the graphs,
> maintaining the identity of any blank nodes which occur in both graphs.
> This is appropriate when the information in the graphs comes from a single
> source, or where one is derived from the other by means of some valid
> inference process, as for example when applying an inference rule to add a
> triple to a graph. Merging two graphs treats the blank nodes in each graph
> as being existentially quantified in that graph, so that no blank node from
> one graph is allowed to stray into the scope of the other graph's
> surrounding quantifier. This is appropriate when the graphs come from
> different sources and there is no justification for assuming that a blank
> node in one refers to the same entity as any blank node in the other.
> ]]
> http://www.w3.org/TR/rdf-mt/#unlabel
>
> How is that not handwaving? Semantics strikes me as completely the wrong
> place to introduce the notions of “documents” and “sources”. This
> belongs in Concepts, and that's what my proposal achieves.
>
> > and a notion of "scope" that I still don't really see a formalisation
> > of. To get the notion of scope into the abstract syntax, you need a strict
> > formalisation of it, IMO.
>
> Why?
>
> >> You *cannot* produce a coherent account of the simple two-file
> >> situation shown above without talking about scopes or sources
> >> *somewhere*.
> >
> > RDF 2004 does not do that
>
> You've got to be kidding me.
>
> >> Currently, that talk is hidden away in Semantics where
> >> most people won't see it (last paragraph of [2]).
>
> You see that [2] there? You didn't click it, right? Because you've read
> the entire document years ago, right? And you don't remember it mentioning
> document scopes or sources anywhere, right? Therefore, you didn't need to
> click on that link, right?
>
> >>> (id,scope) is just a complicated way of defining a globally unique
> >>> identifier for a bnode.
> >>
> >> That is nonsense. It's an explicit way of saying that id is not
> >> globally unique.
> >
> > By your definition, a bnode is a pair (id,scope).
>
> It is not.
>
> http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes#Specification_Changes
>
> > This pair belongs to the unique (so that it is "shared" by everyone, or
> > "global") set of pairs {(i,s)|i is a UNICODE string, s a scope}.
>
> Well, those pairs may be unique, but that still doesn't make them
> identifiers, because identifiers are names, and scopes are not names.
>
> I use the word “scope” in its standard computer science usage: The scope
> of an identifier is the context in which it has its meaning.
>
> [[
> Every RDF document forms its own, self-contained scope for blank nodes.
> The handling of scopes outside of RDF documents (for example, in RDF
> stores) is implementation-dependent. Other specifications MAY impose
> additional scoping rules.
> ]]
>
> Best,
> Richard
>
>
>
> >
> >
> > AZ
> >
> >>
> >> Best, Richard
> >>
> >>
> >> [1] http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes#Specification_Changes
> >>
> >>
> >> [2] http://www.w3.org/TR/rdf-mt/#unlabel
> >>
> >>
> >>
> >> On 17 Nov 2012, at 10:27, Antoine Zimmermann wrote:
> >>
> >>> I don't find this really useful; I even find it confusing. Like Andy, I
> >>> see this as an implementation approach.
> >>>
> >>> (id,scope) is just a complicated way of defining a globally unique
> >>> identifier for a bnode.
> >>>
> >>> What I would say instead is the following:
> >>>
> >>> """ Bnodes are drawn from an infinite set. Each bnode has a label
> >>> being a UNICODE string, different from all other bnode labels. So
> >>> when one draws a bnode, one can tell which bnode it is.
> >>> Serialisation syntaxes that rely on bnode identifiers are in fact
> >>> identifying the exact bnode they use. """
> >>>
> >>> And everything stays the same. No other changes are required.
> >>>
> >>> It especially clarifies what this sentence means:
> >>>
> >>> """ Given two blank nodes, it is possible to determine whether or
> >>> not they are the same. """
> >>>
> >>> In RDF 2004, this sentence was never really implemented anywhere.
> >>> If you got a bunch of triples, then another bunch of triples, you
> >>> could not say which bnodes of the first bunch were the same as, or
> >>> different from, the bnodes of the second bunch.
> >>>
> >>> There are cases when you want to split a graph into subgraphs, in
> >>> which case you must know what bnodes actually appear in each
> >>> subgraph. To get back the full graph from the subgraphs, it is
> >>> required that you use set union, not merge. This requires that the
> >>> bnodes are all identified in the same way across the subgraphs.
> >>>
> >>> Notice that a bnode label does not denote anything in terms of the
> >>> formal semantics, so it has nothing to do with an IRI, and nothing
> >>> to do with a skolem IRI. The label is only there to tell which
> >>> bnode is used. It's an existential variable name and it can be
> >>> replaced by any other variable name without changing the meaning of
> >>> a graph.
> >>>
> >>> A fresh bnode may be defined formally as follows:
> >>>
> >>> """ Given a set of RDF graphs Sg, the triples of which contain a
> >>> set Sb of bnodes, a fresh bnode with respect to Sg is a bnode b not
> >>> in Sb. """
> >>>
> >>> Of course, when we say "new", it has to be new wrt something
> >>> predefined, thus the notion of "fresh bnode with respect to a set
> >>> of RDF graphs".
> >>>
> >>>
> >>> --AZ
> >>>
> >>>
> >>>
> >>> On 14/11/2012 12:02, Richard Cyganiak wrote:
> >>>> Following recent discussions, I've written up a proposal to
> >>>> change the design of blank nodes in RDF by explicitly introducing
> >>>> scoped blank node identifiers into the abstract syntax.
> >>>>
> >>>> http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes
> >>>>
> >>>> Requirements:
> >>>>
> >>>> • Consistency with all resolutions the WG has made so far
> >>>> • No changes to other specs beyond Concepts and Semantics required
> >>>> • No changes to conforming implementations required
> >>>>
> >>>> All further details are in the wiki.
> >>>>
> >>>> Comments welcome.
> >>>>
> >>>> Best, Richard
> >>>>
> >>>
> >>>
> >>> -- Antoine Zimmermann ISCOD / LSTI - Institut Henri Fayol École
> >>> Nationale Supérieure des Mines de Saint-Étienne 158 cours Fauriel
> >>> 42023 Saint-Étienne Cedex 2 France Tél:+33(0)4 77 42 66 03
> >>> Fax:+33(0)4 77 42 66 66 http://zimmer.aprilfoolsreview.com/
> >>>
> >>
> >>
> >>
> >
> >
> > --
> > Antoine Zimmermann
> > ISCOD / LSTI - Institut Henri Fayol
> > École Nationale Supérieure des Mines de Saint-Étienne
> > 158 cours Fauriel
> > 42023 Saint-Étienne Cedex 2
> > France
> > Tél:+33(0)4 77 42 66 03
> > Fax:+33(0)4 77 42 66 66
> > http://zimmer.aprilfoolsreview.com/
> >
>
>
>
Received on Monday, 19 November 2012 18:32:34 UTC