Re: Graphs and Being and Time from Nathan on 2011-02-24 (public-rdf-wg@w3.org from February 2011)

From: Nathan <nathan@webr3.org>
Date: Thu, 24 Feb 2011 04:07:33 +0000
To: public-rdf-wg@w3.org
Message-ID: <4D65D985.8020703@webr3.org>
Pat Hayes wrote:
> On Feb 23, 2011, at 4:38 PM, Nathan wrote:
>> Hi Pat,
>>
>> All I can say is that I believe you've hit on the distinction we can make between a Named Graph and a Graph Literal.
> 
> Have I? That wasn't what I was aiming to do, but maybe. (If so, I did so by accident, as I have to confess I have never really understood quite what a graph literal is. :-)

In my head at least :) I'll agree with Pierre-Antoine here, we need new 
terms for the two things - ones with no prior meaning attached - if we 
agree there are two things of course.

>> Where a Named Graph correlates to a "Graph Token" as you term it, a container for triples which can be given a name and accessed, the contents of the container may change over time, but the container remains the same, with the same name.
> 
> Well, exactly what is named by the name of a named graph is up for discussion, if we accept the distinction I was trying to promote. I do however think that treating it as naming a graph token/document is the most sensible idea, as you outline here.

I have to confess that "document" conjures up all the wrong things for 
me: it suggests that if the same graph were serialized in two formats and 
offered via conneg, they'd somehow be different graph-tokens. To a 
lesser degree "token" makes me think the same thing, that the lexical 
form of the serialization makes a distinction between the graphs. 
Perhaps that is what you meant though.

>> Where Graph Literal is an abstract graph, a set of triples, that which is given by a named graph when poked.
> 
> That doesn't sound like my (admittedly vague) understanding of what a graph literal is intended to be, but you may be right. 

The way I'm seeing it is as a "box of triples", where the box has a 
name, and the contents of that box at any point in time is a set of 
triples, its value at that time.

graph-token / named graph correlate to the box of triples in my 
understanding of things, and abstract-graph / set of triples / graph 
literal correlate to the contents of the box at a single time.
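
A minimal sketch of that "box of triples" picture, in Python. The class 
name, method names and example IRIs are made up for illustration and do 
not come from any RDF library; plain strings stand in for RDF terms.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple

# A triple is (subject, predicate, object); strings stand in for RDF terms.
Triple = Tuple[str, str, str]


@dataclass
class BoxOfTriples:
    """Hypothetical graph-token / named graph: a fixed name, changing contents."""
    name: str
    _contents: Set[Triple] = field(default_factory=set)

    def add(self, triple: Triple) -> None:
        # The box changes state; its name stays the same.
        self._contents.add(triple)

    def value(self) -> FrozenSet[Triple]:
        # The contents at this moment, as an immutable set of triples
        # (the abstract graph / "graph literal" in the sense used above).
        return frozenset(self._contents)


box = BoxOfTriples("http://example.org/box")
value_t0 = box.value()                      # contents at time t0
box.add(("ex:nathan", "ex:knows", "ex:pat"))
value_t1 = box.value()                      # contents at time t1

# Same box, same name, two different values.
print(box.name, len(value_t0), len(value_t1))
```

The frozenset returned by value() never changes after it is taken, so 
"adding a triple" only ever changes the box, not a value already handed out.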

>> The "value" of a "named graph" at time t is a "graph literal".
>>
>> That "value" can be serialized into a lexical form and passed in a representation (rdfa, rdf/xml, json/rdf etc), held in memory or in a quad store or whatever.
>>
>> Now, assuming I understand/follow you correctly, my observation would be that we need both.
>>
>> As for blank nodes, that does worry me, because it appears that we have (or are acting like we have) three things now in RDF:
>>
>> - a "blank node" (something that merely exists, unnamed)
>> - a "local node" (something that merely exists within a certain scope or plane of existence, a locally scoped and named node)
>> - a "global node" (those things we currently say an IRI identifies!)
> 
> No, I think there is still basically only one notion of blank node, and it's the same notion that RDF has always had. It's just that the current specs get kind of woolly and sketchy when they need to be quite precise about exactly what the 'scope' of a blank node is. (And they need to have this notion because the conceptual model of RDF allows the possibility of two different graphs sharing a blank node, which is kind of silly when you think about it.)

Sounds silly to me too, well worth fixing.

> As they say, you can think of a blank node as like an existentially bound variable, but they don't really specify where the quantifier is. And this has given rise to a whole lot of confusion, both inside and outside WG activities. All I was saying here was that we can use this 'token' idea to state the scoping rule more crisply. 

Yup, I follow, the scoping definitely needs to be set. Would scoping it at 
the graph token level mean that what it refers to would have to stay 
consistent over time?

>> My gut reaction to the above is that (if the above is true) things have become confused over time, primarily due to the existence of blank node identifiers - to me it makes far more sense to say:
> 
>> blank nodes are always unnamed and scoped to being within an abstract graph / graph literal - they /never/ have a blank node identifier.
> 
> We have to allow bnode identifiers in any 'linear' notation in order to keep track of which node is which. (If you can draw an actual 2-dimensional graph diagram, this need goes away, but you can't send a 2-D diagram over a byte stream.)

Agreed, a temporary reference to those nodes is needed, but I can't see 
how it can work at graph-token level. I'm probably misunderstanding 
something here: does graph-token correlate to a box of triples as I've 
described it, something like a REST resource (temporally varying value), 
or to something more like a document, a REST representation (fixed, 
non-temporally varying value)?

> The rest of your message sounds like a proposal to change the way RDF uses IRIs. I don't want to go there, and I think it is beyond our charter. 

indeed it was, worth a quick try!
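
For reference, the two-part naming quoted further down can be sketched in 
a few lines of Python. The function name node_iri is hypothetical; 
urldefrag is the standard-library call for splitting off a fragment, and 
the example pair is the one given in the quoted text.

```python
from urllib.parse import urldefrag


def node_iri(graph_iri: str, local_name: str) -> str:
    """Join a named-graph IRI and a local node name with '#' in the middle."""
    return graph_iri + "#" + local_name


# The pair from the quoted proposal: ('http://webr3.org/nathan', 'me')
iri = node_iri("http://webr3.org/nathan", "me")

# Going the other way recovers the two parts of the name.
graph_part, local_part = urldefrag(iri)

print(iri, graph_part, local_part)
```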

Best,

Nathan

> Pat
> 
> 
>> named nodes, where the name consists of two parts, an absolute-IRI which identifies the named graph, and a local name which identifies the node within that named graph, thus all RDF nodes would be both locally scoped within the graph, and "globally" named within the scope of the web. For example ( 'http://webr3.org/nathan' , 'me' ): the first part identifies the "named graph", the second part is the local name; concatenate the two with a '#' in the middle and you've got a full IRI which would be backwards compatible with RDF URI References.
>>
>> Does that all make sense? (to anybody else..)
>>
>> Best,
>>
>> Nathan
>>
>> Pat Hayes wrote:
>>> I would like the WG to rectify a basic flaw in the RDF conceptual model, or more exactly, how the basic RDF conceptual model is described in the RDF specs. This will bring the specs more in line with the way that RDF is actually used in practice. It will also not change the RDF semantics, though it will simplify their description a little. It is along the lines I suggested in my ISWC invited talk a while ago. As this will have consequences for the way we talk about RDF graphs generally, I would like to raise it now even though it does not fall into any of the assigned task topics.
>>> Here is the problem: the RDF specs define an RDF graph to be a **set** of triples. But a set is a pure-mathematical Platonic abstraction, like a number or an Abelian group. It is not a data structure or a document or a text, and it cannot be transmitted by HTTP or FTP or any other XXTP. And, worst of all, it has no state, so it can't be 'changed'. It simply does not make sense, given the definitions in the current RDF spec, to speak of a 'temporal graph' or of a graph being 'changed'. If I 'add' an item to a set - say, add C to {A, B} to get {A, B, C} - I have not "changed" anything: I simply have a new set, different from the previous one: {A, B} =/= {A, B, C}. Sets belong in the world of mathematics, not the world of computing.
>>> What we need is the notion of a 'graph token' (or some other terminology: see below for more on terminology), meaning an actual representation of an RDF graph. This would be an information resource, a thing with representations that can be copied and sent from place to place using a TP. Put another way, this would have the same kind of relationship to an RDF graph that a particular copy of Moby Dick has to the literary work with the same name, or that a particular token of the letter 'A' has to the first letter of the English alphabet; and just as with these cases, there can be many tokens of the same RDF graph. I might have my copy and you might have yours: same graph, different tokens. And we can make our own rules for token identity, so it can make perfect sense for tokens to have a state, and a single token to be a token of different RDF graphs as changes are made to it, which is what we are actually currently talking about when we use the impossible terminology of "changes to a graph".
>>> To emphasize: we already have these things. Every RDF/XML document on the Web, every piece of RDFa, is actually a graph token rather than an actual RDF graph (where I am using "RDF graph" here strictly according to the RDF specs, of course.) You cannot put an actual RDF graph into a digital memory, any more than you can put the number three into one. You have to use a numeral to represent a number, and you have to use a graph document or a graph data structure or some such token-like thing to represent the RDF graph. The issue is only that the RDF specs currently don't acknowledge this simple fact: they represent a kind of idealized fiction that refuses to acknowledge the distinction between a work and a book, or between a number and a numeral, or between a graph and an encoding of it.
>>> If one reads the various specs which mention RDF, they vary in their use of the term "RDF graph". Some of them use it to mean a graph token, others to mean the Platonic abstraction; and still others seem to be kind of muddled. We saw some of this muddle in the IRC log of today's telecon, in fact.
>>> In the ISWC talk I invented a completely new terminology of a "surface" (think of a piece of paper on which the graph is conceptually 'drawn'), which I rather like, but we don't have to go that far. In fact, I would propose that we keep the terminology "RDF graph" to refer to the tokens (which is already now a common usage) but alter the specs so that the current RDF graph - the set of triples - is called something like an "RDF abstract graph". Then an RDF graph is not a set, but rather something like an RDF "resource" (In the REST sense), ie an entity which emits a representation of an RDF abstract graph when poked. This allows an RDF graph to have a state that can change, and it brings the whole business of naming a graph with a URI into line with all other kinds of Web naming and identification. This is in fact the way the world actually is, of course: the change is simply bringing the terminology into line with actual practice.
>>> The nice thing is, if we do modify the specs (actually, the RDF conceptual model) to be more realistic in this way, by making an explicit distinction between the abstract graph and a particular graph token (think of a document), then several things get simpler and some "issues" go away.
>>> 1. An RDF graph (new sense, ie a graph token) is now something that can have an identity over time (corresponding to continuing to be identified by a cool IRI, in the usual Web-sanctioned way), so this whole way of talking now makes sense.
>>> 2. It is quite sensible to have two RDF graphs (tokens) with different names which are the same RDF (abstract) graph. That is, two graph tokens which look like (i.e., when poked emit representations of) the same RDF abstract graph. This has always been an issue for the idea of 'named graphs': how can a name be attached to a particular RDF **abstract** graph (as opposed to some document or representation of that abstract graph)? And OK, the answer is: it can't, and this does not matter, because all we are ever needing to identify are graph tokens, not abstract graphs. You name a graph by identifying a token of it. But that only gives you power over the token, not over the abstraction itself.
>>> 3. There is now a very nice way to handle blank nodes: we simply stipulate, as a part of the underlying RDF conceptual model, that every blank node can occur in at most one RDF graph token. Blank nodes are unique to tokens. Intuitively, we think of the blank nodes in any graph token as belonging to the token itself rather than to the abstract graph. This at one stroke fixes the 'scope' of blank node identifiers in any RDF surface syntax or notation (whatever is the boundary of the RDF graph token according to the rules of that syntax, that is where the existential quantifiers are that bind the bnode identifiers) and it also eliminates the need to define 'graph merging' as opposed to 'graph unioning' in the specifications. (If you don't follow this, just believe me. It makes the specs a lot easier and quite a bit shorter.)
>>> Anyway, I offer this as an item for the WG to consider. I don't have any particular brief for the choice of terminology, but I do think it is important for us to agree on the basic conceptual distinction (between the 'abstract' idea of an RDF graph as a mathematical set, and some more concrete notion of a graph as a data object with a state that can be identified by a URL) and agree to use it ourselves. If we do, then this will impact at least the language that we use in the Graphs TF, and perhaps the way that we actually think about the issues.
>>> I hereby volunteer to write drafts, as necessary, of the relevant changes to the RDF Concepts and RDF Semantics documents to accommodate the necessary changes, if we decide to make them. I think that, with care, no changes would be needed to the SPARQL draft documents.
>>> Pat
>>> ------------------------------------------------------------
>>> IHMC                                     (850)434 8903 or (650)494 3973
>>> 40 South Alcaniz St.           (850)202 4416   office
>>> Pensacola                            (850)202 4440   fax
>>> FL 32502                              (850)291 0667   mobile
>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>
Received on Thursday, 24 February 2011 04:09:38 UTC