Re: A different take on b-scopes (ISSUE-107) from Pat Hayes on 2012-11-24 (public-rdf-wg@w3.org from November 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 23 Nov 2012 16:10:08 -0800
To: Richard Cyganiak <richard@cyganiak.de>
Cc: Antoine Zimmermann <antoine.zimmermann@emse.fr>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <42D6CD45-4BBC-4550-9BDC-283384960814@ihmc.us>
Richard, 

I can't help observing that your original proposal was both simpler and managed to completely short-circuit all this interminable (and confusing/ed) debate. So I propose that we go back to that. In sum, we talk about scopes of bnode identifiers (not scopes of bnodes: that idea is a mule), and bnode identifiers, and we say that a bnode is *defined* to be a pair <identifier, scope>. This is perfectly compatible with the 2004 specs, which simply say that bnodes are in a set disjoint from URIs and literals: this set of pairs is indeed so disjoint. These new bnodes are less arbitrary than allowed by the 2004 specs, but that arbitrariness was not useful and was conceptually harmful, or at any rate confusing. These new bnodes are exactly as arbitrary as required in order for their identification by identifiers to be meaningful, no more and no less. 
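A minimal Python sketch of the pair definition, under stated assumptions: the class names `Scope` and `BlankNode` are hypothetical, and a scope is modelled only by an opaque name. Because each blank node carries its scope, no blank node can ever be in two scopes, which is the sense in which the uniqueness constraint follows from the definition itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    # Hypothetical: a scope is represented only by an opaque name.
    name: str


@dataclass(frozen=True)
class BlankNode:
    # A blank node *is* the pair <identifier, scope>; equality compares both parts.
    identifier: str
    scope: Scope


doc1 = Scope("doc1")
doc2 = Scope("doc2")

same_label_doc1 = BlankNode("b0", doc1)
same_label_doc2 = BlankNode("b0", doc2)

print(same_label_doc1 == same_label_doc2)        # False: same identifier, different scopes
print(same_label_doc1 == BlankNode("b0", doc1))  # True: same identifier, same scope
```

Note that nothing here fixes the form of identifiers or scopes; any hashable values would do.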

I claim that the idea of a scope *of an identifier* is so widely understood as to require no explanation, although a sketch can be provided if one wishes to do so. 

I gather that the objection made to this was that it was too implementation-dependent. If so (I missed that telecon) then I reject that claim. The idea refers only to "identifiers", not to any particular form that those identifiers might have, or to how they might be encoded in text. It allows for virtually any implementation strategy, ranging from Unicode strings of a certain form using document boundaries as scope markers to such devices as closed loops in a concept-graph graphic, or association tables linking numerical codes used as identifiers to addresses encoding scopes. It allows for scopes which have gaps in them or which are distributed across pieces of other documents, or just about any form at all. All that matters is that the notions of identifier and scope are provided. The uniqueness constraint (that a bnode occur in at most one scope) then does not need to be stated: it follows from the very definition of bnodes themselves. And there is no need to talk of bijections or 1:1 mappings between identifiers and blank nodes, all of which is puzzling since the set of blank nodes is defined to be arbitrary.

AFAIKS, the only idea that then needs to be defined is that of an identifier *actually occurring* in a scope. Not all identifiers actually occur in a given scope S; say that the set of those that do is the "used" set of S. (This set is usually finite.) This set might be labile (it is the only thing that is), and then "fresh" means not used (yet), and "adding" has to be explained in terms of updating the used set of the scope so as to use more identifiers. Everything else is permanent and mathematical. 
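The used-set idea can be sketched as follows. This is an illustration under assumptions, not proposal text: the class name, the `b0`, `b1`, ... naming scheme and the counter are hypothetical. Only `used` changes over time; freshness is simply non-membership in it.

```python
import itertools


class Scope:
    """Hypothetical scope whose only mutable part is its set of used identifiers."""

    def __init__(self, name):
        self.name = name
        self.used = set()
        self._counter = itertools.count()

    def is_fresh(self, identifier):
        # "Fresh" just means "not (yet) used in this scope".
        return identifier not in self.used

    def use(self, identifier):
        # "Adding" an identifier is nothing more than updating the used set.
        self.used.add(identifier)

    def fresh_identifier(self, prefix="b"):
        # Assumed naming scheme: prefix + counter, skipping anything already used.
        while True:
            candidate = f"{prefix}{next(self._counter)}"
            if self.is_fresh(candidate):
                self.use(candidate)
                return candidate


store = Scope("store")
store.use("b0")                  # "b0" already occurs in the store
print(store.fresh_identifier())  # b1  (b0 is skipped because it is used)
```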

Seems to me that this completely resolves all the issues with virtually no extra work, only one new definition (which is in any case intuitively obvious), and is completely backward compatible with 2004. 

I think I will formally object to any account that is not this account, in fact :-)

Pat


On Nov 23, 2012, at 5:50 AM, Richard Cyganiak wrote:

> On 23 Nov 2012, at 07:19, Antoine Zimmermann wrote:
>> There are flaws in your proposal and you do not want to admit it.
> 
> There are major flaws in the 2004 specs. The question is not whether any given proposal is flawless. The question is whether it sufficiently improves on the status quo, and whether anyone can come up with something better. Most of the WG seems to believe that the answer to these two questions is “yes” and “no”, respectively.
> 
>> Let us go through it carefully and see what are the issues.
> 
> Fine.
> 
>> I put a proposal at the end that tries to keep as much of your text as I could.
>> 
>> """
>> A scope has an associated 1:1 mapping (bijection) between the set of all blank node identifiers and a set of blank nodes.
>> """
>> 
>> So, given a scope S, there is a mapping m(S) from all bnode IDs (UNICODE strings or a subset of it) to a set b(S). Since it is 1 to 1, we can also talk about the inverse mapping M(S) from b(S) to UNICODE.
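Since the domain of m(S) is every possible identifier, an infinite set, m(S) cannot be stored as a finite table. A hedged Python sketch of one way to realise the bijection is by a rule; the representation below is an assumption chosen for illustration (it is essentially an <identifier, scope> pairing), not something the proposal prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlankNode:
    # Hypothetical representation: a node remembers the scope and identifier that produced it.
    scope: str
    identifier: str


def m(scope):
    """m(S): total over *all* identifiers, so it is given by a rule, not a table."""
    return lambda identifier: BlankNode(scope, identifier)


def M(scope):
    """M(S): the inverse of m(S), defined only on b(S)."""
    def inverse(node):
        if node.scope != scope:
            raise ValueError("node is not in b(S) for this scope")
        return node.identifier
    return inverse


n = m("S")("x")
print(M("S")(n))                   # x
print(m("S")("x") == m("T")("x"))  # False: b(S) and b(T) are disjoint
```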
>> 
>> """
>> Scopes are subject to the following rules:
>> - The sets of blank nodes in any two scopes are disjoint.
>> """
>> 
>> Is "a bnode in a scope" an element of b(S)? Let us assume it is, otherwise it is ISSUE-b1.
> 
> Yes it is. So, no ISSUE-b1.
> 
>> """
>> A fresh blank node is any blank node that is not yet used within its scope.
>> """
>> 
>> So, there are bnodes that are used in a scope. Let us call the set of used bnodes in S, u(S). Is u(S) a subset of b(S)? Probably, otherwise it is ISSUE-b2.
> 
> Yes it is. So, no ISSUE-b2.
> 
>> A fresh bnode is not in u(S), ok. Must it be in b(S)? Can it be in, say b(S')? This is ISSUE-b3.
> 
> The question doesn't make sense. If you presume that the scope of the blank node is S, then obviously the blank node is in b(S) and cannot be in b(S') since those sets are disjoint. If the blank node was in b(S') then its freshness would depend on u(S'). This is perfectly clear from the definition. So, no ISSUE-b3.
> 
>> Moreover, "is not yet" suggests that the set is subject to change. So it's mutable, probably.
> 
> Yes.
> 
>> """
>> An RDF graph is copied into a scope by replacing each blank node in the graph with a fresh blank node in the target scope.
>> """
>> 
>> Let us consider graph G and let us say it is copied into S. What does it mean? It seems that you are somehow defining a new graph G', isomorphic to G but with fresh bnodes of S, so bnodes that are not in u(S).
> 
> Yes.
> 
>> It looks like you are talking about "a copy of G to S", which would yield an RDF graph, but since you say "is copied into" it sounds more like a modification of a state. I don't know. This is ISSUE-b4.
> 
> I don't understand what the issue is. The sentence defines a technical term, “copying into a scope”. It says that it is something that is done to an RDF graph. The result is another RDF graph, because that's what you get when you replace blank nodes in an RDF graph with other blank nodes. It also changes state as it uses up some formerly fresh blank nodes. I don't think there is an issue.
> 
>> """
>> Occurrences of one blank node in multiple triples are all replaced with the same fresh blank node.
>> """
>> 
>> Well, I don't know why the notion of occurrence suddenly appears. We were talking about RDF graphs, not concrete representations. This is ISSUE-b5.
> 
> An RDF graph is a set of triples. A blank node can occur in multiple triples. This has nothing to do with concrete representations. This sentence was added in response to a comment from Andy who found the preceding sentence not clear enough. I honestly don't know what your problem is here. Do you not understand what the sentence says? If you understand it, then there's no issue.
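A minimal Python sketch of "copying into a scope" as described in this exchange, assuming triples are tuples, IRIs and literals are plain strings, and blank nodes are a hypothetical `BlankNode` class. The `replacement` dictionary is what guarantees that every occurrence of one blank node, in however many triples, receives the same fresh node; calling `fresh_node` is the state change that uses up formerly fresh blank nodes.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class BlankNode:
    scope: str
    identifier: str


class Scope:
    """Hypothetical scope: tracks which identifiers are already used."""

    def __init__(self, name):
        self.name = name
        self.used = set()
        self._ids = itertools.count()

    def fresh_node(self):
        # Take the next unused identifier and mark it used: this is the state change.
        while True:
            ident = f"b{next(self._ids)}"
            if ident not in self.used:
                self.used.add(ident)
                return BlankNode(self.name, ident)


def copy_into(graph, target):
    """Replace every blank node in `graph` by a fresh blank node of `target`."""
    replacement = {}  # source blank node -> its fresh replacement

    def swap(term):
        if isinstance(term, BlankNode):
            if term not in replacement:
                replacement[term] = target.fresh_node()
            return replacement[term]
        return term

    return {tuple(swap(t) for t in triple) for triple in graph}


x = BlankNode("doc.ttl", "x")
g = {(x, "ex:name", '"Alice"'), (x, "ex:knows", "ex:bob")}
store = Scope("store")
copied = copy_into(g, store)
print(sorted(store.used))  # ['b0']: both occurrences of x got the same fresh node
```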
> 
>> """
>> If none of the source's blank node identifiers are used in the target scope, copying into a scope can be achieved by simply re-using the same blank node identifiers in the new scope.
>> """
>> 
>> So now, there are used bnode IDs as well. Let us call them uid(S). So I guess this means that uid(S) is the set of bnode IDs that are mapped to u(S) via m(S). Which supports the idea that u(S) is a subset of b(S).
> 
> Sure.
> 
>> There is also a notion of source's bnode IDs. But bnode IDs only exist in scopes, not in RDF graphs.
> 
> Every blank node has a unique blank node identifier associated via the mapping. Hence it is possible to talk about “the blank node identifiers associated with the blank nodes in the source graph”. “The source's blank node identifiers” is just a shorthand for that.
> 
> (The assumption that every blank node is indeed in the b(S) of some scope S is not spelled out, admittedly. *That* is ISSUE-b1.)
> 
>> So I guess this means that a graph is "copied" *from* a scope into another scope.
> 
> Not necessarily. The blank nodes may come from different scopes. It doesn't matter to the definition.
> 
>> But what the hell does "re-using a bnode ID" mean in this context?
> 
> Every blank node has a unique blank node identifier. Re-using a blank node identifier means that the fresh blank node in the target scope that replaces a blank node in the source graph is associated with the same identifier via their respective mappings.
> 
>> I'll try to figure it out: let us write b(G) the set of bnodes in an RDF graph G. Let us assume b(G) is in u(S). Then the quoted sentence could be reformulated in (take a breath):
>> 
>> << If uid(S) is disjoint with uid(S'), then the copy of G from S into S' is a graph G' such that the isomorphism from G to G' maps each bnode n of b(G) to a bnode n' of b(G') with M(S)(n) = M(S')(n'). >>
>> 
>> Phew! But this is a wild guess, so I'm not at all sure I got it right. So let us call this ISSUE-b6.
> 
> Not quite.
> 
> [[ If uid(S) is disjoint with uid(S'), then the copy of G into S' is a graph G' such that the isomorphism from G to G' maps each bnode n of b(G) to a bnode n' of b(G') with M(S)(n) = M(S')(n'), where S is the unique scope whose b(S) contains n. ]]
> 
> Furthermore, this is not a part of the definition, but merely a statement of fact. The G' described here meets the requirements of the definition stated earlier. This is clear from the use of the language “can be achieved”.
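The identifier-reusing shortcut can be sketched in Python. Names are hypothetical and scopes are plain strings here. Besides the stated condition (no source identifier already used in the target), the sketch also checks that the source identifiers are pairwise distinct; this holds automatically when all source blank nodes come from a single scope, and without it two source nodes from different scopes could collapse into one target node, which would not be a copy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlankNode:
    scope: str
    identifier: str


def blank_nodes(graph):
    return {t for triple in graph for t in triple if isinstance(t, BlankNode)}


def copy_reusing_ids(graph, target_name, target_used):
    """Copy `graph` into the target scope by re-using identifiers, so that each
    node n maps to n' with M(S)(n) == M(S')(n'). `target_used` is updated in place."""
    nodes = blank_nodes(graph)
    ids = {n.identifier for n in nodes}
    if len(ids) != len(nodes) or ids & target_used:
        raise ValueError("identifier clash: fall back to fresh identifiers")
    target_used |= ids  # the re-used identifiers are now used in the target

    def swap(term):
        if isinstance(term, BlankNode):
            return BlankNode(target_name, term.identifier)
        return term

    return {tuple(swap(t) for t in triple) for triple in graph}


g = {(BlankNode("S", "a"), "ex:p", BlankNode("S", "b"))}
used_in_target = {"c"}
copied = copy_reusing_ids(g, "T", used_in_target)
print(copied)
```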
> 
>> """
>> The merge of two RDF graphs is the result of copying both graphs into a target scope.
>> """
>> 
>> So, this looks like we are kind of generalising the notion of copy to a pair of RDF graphs (and therefrom, to an arbitrary set of graphs).
>> 
>> """
>> The result is a single graph
>> """
>> 
>> This suggests that the notion of copying is in fact defining what *a copy* is.
> 
> Not quite, as “copying” uses up fresh blank nodes.
> 
>> """
>> The result is a single graph where all blank nodes are in the same scope, and where any blank node identifiers that occurred in both input graphs have been replaced in order to avoid clashes.
>> """
>> 
>> I really don't understand what this is saying at all. What is the result exactly? This is ISSUE-b7.
> 
> Again, this is a statement of fact, not a definition. The definition was the sentence before. This now states two properties of the result. In what way 
> 
>> But anyway, "merge" shouldn't belong to this section. It is completely independent of the notion of scope and bnode ID.
> 
> That is not true. The whole *reason* why the notion of the merge exists is to deal with the scope of blank nodes. (It doesn't have to do with blank node *identifiers* on the face of it, that's right, but the proposal creates a notion of blank node scope as a side effect of defining a mapping between blank nodes and identifiers.)
> 
> As I have stated before in the thread, the definition may well end up in a different section, and that's now really getting into editorial micromanagement.
> 
>> Here is a concise definition:
>> 
>> << A merge of two RDF graphs G and G' is the union of two RDF graphs H and H' such that H and H' do not share blank nodes, H is isomorphic to G, and H' is isomorphic to G'. >>
>> 
>> ...and then specify that all merges of G and G' are isomorphic so that we can usually talk about "the" merge.
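The concise definition, and the claim that all merges are isomorphic, can be checked on small examples with a brute-force sketch. Assumptions: blank nodes are written as strings beginning with `_:`, the renaming scheme (`_:m0`, `_:m1`, ...) is hypothetical, and the isomorphism test tries every bijection, so it is only usable for tiny graphs.

```python
from itertools import permutations


def bnodes(graph):
    # Hypothetical convention: blank nodes are strings starting with "_:".
    return sorted({t for tr in graph for t in tr if t.startswith("_:")})


def rename(graph, mapping):
    return {tuple(mapping.get(t, t) for t in tr) for tr in graph}


def is_isomorphic(g, h):
    """Brute force: does some bijection of blank nodes map g exactly onto h?"""
    bg, bh = bnodes(g), bnodes(h)
    if len(bg) != len(bh) or len(g) != len(h):
        return False
    return any(rename(g, dict(zip(bg, perm))) == h for perm in permutations(bh))


def merge(g1, g2, tag="m"):
    """H = g1 unchanged, H' = g2 with its blank nodes renamed apart from g1's; return H ∪ H'."""
    taken = set(bnodes(g1))
    mapping, i = {}, 0
    for b in bnodes(g2):
        while f"_:{tag}{i}" in taken:
            i += 1
        mapping[b] = f"_:{tag}{i}"
        taken.add(mapping[b])
    return g1 | rename(g2, mapping)


g1 = {("_:b0", "ex:p", "ex:o")}
g2 = {("_:b0", "ex:p", "ex:o")}
m1, m2 = merge(g1, g2), merge(g1, g2, tag="z")
print(len(m1), is_isomorphic(m1, m2))  # 2 True
```

The plain union `g1 | g2` here has only one triple, so it is not isomorphic to the merge; that is the difference the merge exists to handle.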
> 
> The *whole point* of the proposal was to be able to *not use* that definition!
> 
> The point is to make it so that a set of two graphs entails their union in the case that they're all in the same scope, so that specifications built on RDF don't need to distinguish merge and union. You copy data into your scope, and then all you need is union.
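A minimal Python sketch of the "copy into your scope, then union" pattern, under the same hypothetical `BlankNode`/`Scope` modelling used for illustration elsewhere in this thread (names and the `b0`, `b1`, ... scheme are assumptions). Two documents that both use the label `b0` end up as two distinct blank nodes, both in the target scope.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class BlankNode:
    scope: str
    identifier: str


class Scope:
    def __init__(self, name):
        self.name = name
        self.used = set()
        self._ids = itertools.count()

    def fresh_node(self):
        # Next unused identifier, marked as used.
        while True:
            ident = f"b{next(self._ids)}"
            if ident not in self.used:
                self.used.add(ident)
                return BlankNode(self.name, ident)


def copy_into(graph, target):
    replacement = {}

    def swap(term):
        if isinstance(term, BlankNode):
            if term not in replacement:
                replacement[term] = target.fresh_node()
            return replacement[term]
        return term

    return {tuple(swap(t) for t in tr) for tr in graph}


def merge_via_copy(g1, g2, target):
    """Copy both graphs into `target`; after that, plain set union suffices."""
    return copy_into(g1, target) | copy_into(g2, target)


g1 = {(BlankNode("a.ttl", "b0"), "ex:name", '"Alice"')}
g2 = {(BlankNode("b.ttl", "b0"), "ex:name", '"Bob"')}
store = Scope("store")
merged = merge_via_copy(g1, g2, store)
print(sorted(store.used))  # ['b0', 'b1']
```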
> 
>> ===========
>> 
>> Here is yet another proposal, with less formalism:
>> 
>> New assumption: I make everything immutable, just talk about sets and mappings.
>> 
>> """
>> A /blank node identifier/ is a Unicode string that identifies a blank node within some local context, called a scope. A scope has:
>> - an associated 1:1 mapping (bijection) between the set of all blank node identifiers and a set of blank nodes;
>> - and a finite set of /used blank nodes/, associated with their used blank node identifiers.
>> 
>> Scopes are subject to the following rules:
>> - the sets of blank nodes in the mappings of any two scopes are disjoint;
>> - every RDF document forms its own scope;
>> - scope boundaries outside of RDF documents (for example, in RDF stores) are implementation-dependent;
>> - other specifications MAY impose additional rules, including constraints on the syntax of a scope's blank node identifiers.
>> 
>> A /fresh blank node/ is any blank node that is not used within its scope.
>> 
>> An RDF graph is said to /belong to a scope/ if its bnodes are in the set that the scope maps to.
> 
> I don't think that's necessary.
> 
>> A /copy/ of an RDF graph into a (target) scope is an RDF graph that can be obtained by replacing the blank nodes of the source graph by fresh blank nodes in the target scope.
> 
> Given an input graph with 10 blank nodes, I can arbitrarily map those 10 to 2 in the new scope, and claim to have a copy.
> 
> The copies of two graphs A and B into scope S may share blank nodes according to your definition, which defeats the purpose. That's because you don't make fresh blank nodes non-fresh once they are used.
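Both objections can be made concrete by validating a proposed copy, given as a map from source blank nodes to target identifiers. This is a hypothetical helper for illustration only: it rejects a non-injective map (ten nodes onto two) and a map onto identifiers that are no longer fresh, and it marks the targets as used so that a second copy into the same scope cannot share them.

```python
def apply_copy(replacement, target_used):
    """Validate a proposed copy {source blank node: target identifier}.

    Rejects (1) non-injective maps and (2) targets that are not fresh;
    on success, marks the targets as used in the target scope.
    """
    targets = list(replacement.values())
    if len(set(targets)) != len(targets):
        raise ValueError("not a copy: distinct source blank nodes collapsed together")
    stale = set(targets) & target_used
    if stale:
        raise ValueError(f"not fresh in target scope: {sorted(stale)}")
    target_used.update(targets)


used = set()
for label, proposal in [
    ("ten onto two", {f"n{i}": f"t{i % 2}" for i in range(10)}),
    ("copy of A", {"a1": "t0", "a2": "t1"}),
    ("copy of B", {"b1": "t0"}),
]:
    try:
        apply_copy(proposal, used)
        print(label, "accepted")
    except ValueError as err:
        print(label, "rejected:", err)
```

Only the copy of A is accepted; the copy of B is rejected because `t0` stopped being fresh once A used it.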
> 
>> """
>> 
>> And we may add:
>> 
>> """
>> A /concrete RDF graph/ is an RDF graph having its blank nodes identified by blank node identifiers in a known scope.
> 
> That may be a useful notion. I'd call it “well-scoped” or something like that, because it means that all its blank nodes are in the same scope.
> 
>> [[Note: copying a concrete RDF graph from its scope to another scope amounts to making a concrete RDF graph which contains unused identifiers.
> 
> What exactly is an unused identifier? Unused where?
> 
>> If the identifiers in the original concrete RDF graph are not used in the target scope, then the same identifiers can be used in the copy.]]
>> 
>> [[Note: <a href="definition-of-merge">Merging</a> can be understood as a copy operation, even though it is abstractly defined independently of scopes and blank node identifiers.]]
>> """
> 
> Again: The whole purpose of this exercise is to do away with the merge/union handwaving in RDF Semantics and to say explicitly how one correctly “combines” different RDF data.
> 
> R
> 
> 
> 
>> 
>> 
>> 
>> Best,
>> AZ.
>> 
>> 
>> Le 22/11/2012 13:12, Richard Cyganiak a écrit :
>>> Hi Antoine,
>>> 
>>> On 22 Nov 2012, at 09:28, Antoine Zimmermann wrote:
>>>> Yes, it's going in the right direction and I like it much better than
>>>> before. But still some issues: the proposal has some unstated assumptions that make it a bit sloppy.
>>>> 
>>>> 1. A scope is mutable. Bnode IDs can be added to it, thus the notion of fresh bnodes;
>>> 
>>> No, a scope is not mutable. It's a bijection between *all* blank node identifiers and some set of blank nodes. “Using” a blank node stops it being fresh, but doesn't modify the scope.
>>> 
>>> Making scopes mutable means that now you will have people who ask how to delete a blank node from a scope, and you need to put constraints to stop people from re-assigning a blank node identifier to a different blank node. Let's *please* not go there.
>>> 
>>> (The reason you want mutability is because you don't like how “freshness” is defined. I know that the definition of “fresh” is mathematically sloppy, but it is perfectly comprehensible, therefore I object to making it more complicated just to please mathematical aesthetics. I like precision, but this is becoming formalism for the sake of formalism. I'm happy to change the definitions of “copy” and “merge” to something that is declarative and doesn't rely on “freshness” if anyone can propose wording that works.)
>>> 
>>>> 2. A scope is associated to an RDF graph, thus the notion of copying a graph into a scope, and merging towards a scope.
>>> 
>>> No, a scope is not associated to an RDF graph. The notions of copying and merging are really operations on sets of blank nodes, and not on graphs. It just so happens that the only sets of blank nodes that are ever interesting are those contained in a particular graph, hence we define the copy and merge of graphs, not the copy and merge of blank node sets. Scopes are associated with blank nodes, not graphs.
>>> 
>>> Most crucially, any number of graphs can be formed from the blank nodes in any given scope. For example, given a graph G whose blank nodes are all in scope S, the blank nodes of any subgraph of G are supposed to be still in S, but they can't in your proposal because it's now a different graph, hence different scope, hence disjoint set of blank nodes. Another example is RDF datasets: A TriG document, being an RDF document, is a scope and may contain many graphs.
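A short Python sketch of the point that scope is a property of blank nodes rather than of graphs, using hypothetical names and string scopes. Taking a subgraph keeps the same nodes, hence the same scope, and one TriG document (one scope) can hold several graphs that share a blank node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlankNode:
    scope: str
    identifier: str


def scopes_of(graph):
    """The scopes that a graph's blank nodes belong to."""
    return {t.scope for triple in graph for t in triple if isinstance(t, BlankNode)}


x = BlankNode("doc.trig", "x")
y = BlankNode("doc.trig", "y")
G = {(x, "ex:knows", y), (y, "ex:name", '"Bob"')}

subgraph = {t for t in G if t[1] == "ex:name"}
print(scopes_of(subgraph))  # {'doc.trig'}

# One TriG document, two named graphs sharing the blank node y.
dataset = {"ex:g1": {(x, "ex:knows", y)}, "ex:g2": {(y, "ex:name", '"Bob"')}}
print({name: scopes_of(g) for name, g in dataset.items()})
```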
>>> 
>>>> I had a hard time making sense of the two paragraphs before the note but here is a proposal. At some places it may be a bit too heavy in trying to be precise, so we can consider removing parts if accepted.
>>> 
>>> Well, all the detail is there because others complained that it wasn't precise enough.
>>> 
>>>> """
>>>> A /blank node identifier/ is a Unicode string that identifies a blank node within some local context, called a /scope/. A /scope/ is a mutable entity that comprises:
>>>> - a finite set of /blank node identifiers/;
>>>> - an RDF graph;
>>>> - a 1 to 1 mapping (bijection) between the set of identifiers and the set of blank nodes in the RDF graph.
>>>> 
>>>> Scopes are subject to the following constraints:
>>>> - in any state of affairs, different scopes map their identifiers to disjoint sets of blank nodes;
>>>> - every RDF document forms its own scope, where the RDF graph of the scope is the one serialised in the document;
>>>> - scope boundaries outside of RDF documents (for example, in RDF stores) are implementation-dependent;
>>>> - other specifications MAY impose additional rules, including constraints on the syntax of a scope's blank node identifiers.
>>>> 
>>>> If a scope maps a blank node identifier to a given blank node, the identifier is said to /identify/ the blank node. A blank node that is identified by a blank node identifier in a scope is said to /belong/ to the scope.
>>>> 
>>>> A /fresh blank node/ is a blank node that does not belong to any scope.
>>>> 
>>>> A /copy/ of a given RDF graph is an isomorphic RDF graph that only contains fresh blank nodes. An RDF graph is /copied into a scope/ by adding all the triples of a copy of the graph to the target scope's graph, and extending the mapping by introducing new identifiers mapped to the fresh nodes. If the given RDF graph belongs to a scope (its source), and none of the source's blank node identifiers are used in the target scope, copying into a scope can be achieved by simply re-using the same blank node identifiers in the new scope.
>>>> 
>>>> The merge of two RDF graphs can be obtained by copying both graphs into an empty target scope. In this case, the merge will be the target scope's RDF graph after the copies.
>>>> """
>>> 
>>> Thanks for taking the time to write this up. But I think it doesn't work, for the two reasons stated above: If you want mutability then you need to place constraints on it (and I doubt that you want mutability); and a blank node must be allowed to occur in any number of graphs (but only in one scope).
>>> 
>>>> Remark: in RDF 2004, merge is a math operation, so it does not involve changes of state, copy, etc. It's also a "semantic" operation, in the sense that the merge of a set of graphs is the only RDF graph (up to isomorphism) that is simple-equivalent to the set of graphs.
>>>> 
>>>> If we keep it this way in RDF 1.1, and I hope we do, then what concepts says about merge should not be presented as a definition but rather a way to *do* a merge. Thus, my words say "the merge can be obtained by etc."
>>> 
>>> RDF 2004 actually *defines* merge by saying “it is obtained by”.
>>> 
>>> The distinction you draw between “semantic” and “non-semantic” operations is spurious. If you want to draw such a distinction, it should be between operations that are defined with respect to an entailment regime, like entailment and equivalence and consistency. There is nothing particularly “semantic” about an operation or relationship that only holds in simple entailment.
>>> 
>>> (If the B-Scopes proposal is adopted, then merge and union, if used appropriately as described in RDF 2004, are equivalent anyway. Per RDF Semantics, the use of the merge is appropriate only when graphs come from different sources, and per the B-Scopes proposal, they have disjoint sets of blank nodes in that case. Hence the merge *is* the union. So we might just as well define the merge as *being* the union of two graphs, with a note saying that if you want a single set of blank node identifiers to uniquely refer to them, which you usually want in practice, then you need to copy that union into some scope; and another note pointing out that this was all a bit more complicated back in 2004.)
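The argument that the merge *is* the union once blank node sets are disjoint can be illustrated with a small Python sketch; the `BlankNode` class and the `_:` label view are assumptions for illustration. With scoped blank nodes, two documents that both say `b0` contribute two distinct nodes to the union; dropping the scope and keeping only labels conflates them, which is the situation the 2004 merge had to guard against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlankNode:
    scope: str
    identifier: str


def blank_nodes(graph):
    return {t for tr in graph for t in tr if isinstance(t, BlankNode)}


def strip_scopes(graph):
    # Label-only view: keep "_:identifier", forget which scope it came from.
    return {tuple(f"_:{t.identifier}" if isinstance(t, BlankNode) else t for t in tr)
            for tr in graph}


g1 = {(BlankNode("a.ttl", "b0"), "ex:name", '"Alice"')}
g2 = {(BlankNode("b.ttl", "b0"), "ex:name", '"Bob"')}

union = g1 | g2
print(len(blank_nodes(union)))  # 2: disjoint blank node sets, so union acts as merge

label_view = strip_scopes(g1) | strip_scopes(g2)
label_bnodes = {t for tr in label_view for t in tr if t.startswith("_:")}
print(len(label_bnodes))  # 1: Alice and Bob now appear to be the same node
```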
>>> 
>>> Best,
>>> Richard
>>> 
>>> 
>>> 
>>>> 
>>>> 
>>>> AZ
>>>> 
>>>> Le 22/11/2012 00:48, Richard Cyganiak a écrit :
>>>>> So here's a modified proposal. (The old one is still further down on
>>>>> the same page.)
>>>>> http://www.w3.org/2011/rdf-wg/wiki/User:Rcygania2/B-Scopes
>>>>> 
>>>>> What this does:
>>>>> 
>>>>> * Takes an old 2004-style definition of blank nodes
>>>>> * Adds a new subsection on “blank node identifiers and scopes”
>>>>> * Defines scopes more formally by saying that they have an associated
>>>>>   “1:1 mapping (bijection) between blank node identifiers and blank nodes”
>>>>> 
>>>>> The goal was to make scopes an add-on to the definition of blank
>>>>> nodes, rather than baking them right into the definition. I may be
>>>>> wrong but that seemed to be at the heart of both Antoine's and Andy's
>>>>> concerns.
>>>>> 
>>>>> If this changes anyone's view of the whole thing (in a good or bad
>>>>> direction), then please comment.
>>>>> 
>>>>> The new proposal keeps the following bit, which Antoine and Andy may
>>>>> also have objected to, but which for me is the key sentence to the
>>>>> whole endeavour:
>>>>> 
>>>>> “The sets of blank nodes in any two scopes are disjoint.”
>>>>> 
>>>>> If you think that this sentence shouldn't be there, then I'd really
>>>>> like to hear the case argued, because I don't understand the reason
>>>>> for this objection.
>>>>> 
>>>>> Best, Richard
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> Antoine Zimmermann
>>>> ISCOD / LSTI - Institut Henri Fayol
>>>> École Nationale Supérieure des Mines de Saint-Étienne
>>>> 158 cours Fauriel
>>>> 42023 Saint-Étienne Cedex 2
>>>> France
>>>> Tél:+33(0)4 77 42 66 03
>>>> Fax:+33(0)4 77 42 66 66
>>>> http://zimmer.aprilfoolsreview.com/
>>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
> 
> 
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Saturday, 24 November 2012 00:10:51 UTC