Graphs and Being and Time from Pat Hayes on 2011-02-23 (public-rdf-wg@w3.org from February 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 23 Feb 2011 14:13:03 -0600
To: public-rdf-wg@w3.org
Message-Id: <5AE83D03-F381-446A-BE1B-F0C520B8A887@ihmc.us>

I would like the WG to rectify a basic flaw in the RDF conceptual model, or more exactly, how the basic RDF conceptual model is described in the RDF specs. This will bring the specs more in line with the way that RDF is actually used in practice. It will also not change the RDF semantics, though it will simplify their description a little. It is along the lines I suggested in my ISWC invited talk a while ago. As this will have consequences for the way we talk about RDF graphs generally, I would like to raise it now even though it does not fall into any of the assigned task topics.

Here is the problem: the RDF specs define an RDF graph to be a **set** of triples. But a set is a pure-mathematical Platonic abstraction, like a number or an Abelian group. It is not a data structure or a document or a text, and it cannot be transmitted by HTTP or FTP or any other XXTP. And, worst of all, it has no state, so it can't be 'changed'. It simply does not make sense, given the definitions in the current RDF spec, to speak of a 'temporal graph' or of a graph being 'changed'. If i 'add' an item to a set - say, add C to {A, B} to get {A, B, C} - I have not "changed" anything: I simply have a new set, different from the previous one: {A, B} =/= {A, B, C}. Sets belong in the world of mathematics, not the world of computing.

What we need is the notion of a 'graph token' (or some other terminology: see below for more on terminology), meaning an actual representation of an RDF graph. This would be an information resource, a thing with representations that can be copied and sent from place to place using a TP. Put another way, this would have the same kind of relationship to an RDF graph that a particular copy of Moby Dick has to the literary work with the same name, or that a particular token of the letter 'A' has to the first letter of the English alphabet; and just as with these cases, there can be many tokens of the same RDF graph. I might have my copy and you might have yours: same graph, different tokens. And we can make our own rules for token identity, so it can make perfect sense for tokens to have a state, and a single token to be a token of different RDF graphs as changes are made to it, which is what we are actually currently talking about when we use the impossible terminology of "changes to a graph". To emphasize: we already have these things. Every RDF/XML document on the Web, every piece of RDFa, is actually a graph token rather than an actual RDF graph (where I am using "RDF graph" here strictly according to the RDF specs, of course.) You cannot put an actual RDF graph into a digital memory, any more than you can put the number three into one. You have to use a numeral to represent a number, and you have to use a graph document or a graph data structure or some such token-like thing to represent the RDF graph. The issue is only that the RDF specs currently don't acknowledge this simple fact: they represent a kind of idealized fiction that refuses to acknowledge the distinction between a work and a book, or between a number and a numeral, or between a graph and an encoding of it. If one reads the various specs which mention RDF, they vary in their use of the term "RDF graph". Some of them use it mean a graph token, others to mean the Platonic abstraction; and still others seem to be kind of muddled. We saw some of this muddle in the IRC log of today's telecon, in fact.

In the ISWC talk I invented a completely new terminology of a "surface" (think of a piece of paper on which the graph is conceptually 'drawn'), which I rather like, but we don't have to go that far. In fact, I would propose that we keep the terminology "RDF graph" to refer to the tokens (which is already now a common usage) but alter the specs so that the current RDF graph - the set of triples - is called something like an "RDF abstract graph". Then an RDF graph is not a set, but rather something like an RDF "resource" (In the REST sense), ie an entity which emits a representation of an RDF abstract graph when poked. This allows an RDF graph to have a state that can change, and it brings the whole business of naming a graph with a URI into line with all other kinds of Web naming and identification. This is in fact the way the world actually is, of course: the change is simply bringing the terminology into line with actual practice.

The nice thing is, if we do modify the specs (actually, the RDF conceptual model) to be more realistic in this way, by making an explicit distinction between the abstract graph and a particular graph token (think of a document), then several things get simpler and some "issues" go away.

1. An RDF graph (new sense, ie a graph token) is now something that can have an identity over time (corresponding to continuing to be identified by a cool IRI, in the usual Web-sanctioned way), so this whole way of talking now makes sense.

2. It is quite sensible to have two RDF graphs (tokens) with different names which are the same RDF (abstract) graph. That is, two graph tokens which look like (i.e., when poked emit representations of) the same RDF abstract graph. This has always been an issue for the idea of 'named graphs': how can a name be attached to a particular RDF **abstract** graph (as opposed to some document or representation of that abstract graph)? And OK, the answer is: it can't, and this does not matter, because all we are ever needing to identify are graph tokens, not abstract graphs. You name a graph by identifying a token of it. But that only gives you power over the token, not over the abstraction itself.

3. There is now a very nice way to handle blank nodes: we simply stipulate, as a part of the underlying RDF conceptual model, that every blank node can occur in at most one RDF graph token. Blank nodes are unique to tokens. Intuitively, we think of the blank nodes in any graph token as belonging to the token itself rather than to the abstract graph. This at one stroke fixes the 'scope' of blank node identifiers in any RDF surface syntax or notation (whatever is the boundary of the RDF graph token according to the rules of that syntax, that is where the existential quantifiers are that bind the bnode identifiers) and it also eliminates the need to define 'graph merging' as opposed to 'graph unioning' in the specifications. (If you don't follow this, just believe me. It makes the specs a lot easier and quite a bit shorter.)

Anyway, I offer this as an item for the WG to consider. I don't have any particular brief for the choice of terminology, but I do think it is important for us to agree on the basic conceptual distinction (between the 'abstract' idea of an RDF graph as a mathematical set, and some more concrete notion of a graph as a data object with a state that can be identified by a URL) and agree to use it ourselves. If we do, then this will impact at least the language that we use in the Graphs TF, and perhaps the way that we actually think about the issues.

I hereby volunteer to write drafts, as necessary, of the relevant changes to the RDF Concepts and RDF Semantics documents to accommodate the necessary changes, if we decide to make them. I think that, with care, no changes would be needed to the SPARQL draft documents.

Pat

------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 mobile
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes

Received on Wednesday, 23 February 2011 20:13:38 UTC