graph names as third argument

Dan Brickley posted a great "Dilbert" example involving people and cubicles, which got me thinking (just as I was falling apart with pneumonia/bronchitis). In that example, the neatest and simplest way to record the information in RDF was to have an RDF graph for each time-period and describe the people/cubicle assignments in that graph using a single property to relate them. Then when things changed, one got a new graph. The names of these graphs can then be used to keep track of the times when the arrangements described by the graph actually held true. The whole thing fits very nicely into the description of an RDF store: a bunch of related graphs, each with a name. And it fits even more nicely into a quad store, since the fourth quad field encodes the time information (maybe not directly, but somehow).
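To make the pattern concrete, here is a minimal sketch in plain Python (not any real RDF API). Each quad is a tuple whose fourth field is the graph name, and the graph name is assumed to stand for a time period. The identifiers (ex:dilbert, ex:cube12, ex:2011-Q3 and so on) and the helper located_during are invented for illustration.

```python
# Toy quad store: (subject, property, object, graph_name).
# Every identifier below is made up for illustration.
QUADS = [
    ("ex:dilbert", "ex:located", "ex:cube12", "ex:2011-Q3"),
    ("ex:wally",   "ex:located", "ex:cube13", "ex:2011-Q3"),
    ("ex:dilbert", "ex:located", "ex:cube40", "ex:2011-Q4"),
    ("ex:wally",   "ex:located", "ex:cube13", "ex:2011-Q4"),
]


def located_during(quads, period):
    """Return {person: cubicle} for the graph named by `period`."""
    return {
        s: o
        for (s, p, o, g) in quads
        if p == "ex:located" and g == period
    }


print(located_during(QUADS, "ex:2011-Q3"))
print(located_during(QUADS, "ex:2011-Q4"))
```

When the seating changes, nothing is edited: a new group of quads with a new fourth field is added, and the old graph stays as the record of the earlier period.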

Dan's challenge was to do this in RDF "correctly", and I have to admit, doing it that way is a crock. You have to introduce 'events' (or 'facts' or 'states' or 'cubicle-occupancies') which have three properties: when they hold, who is their person and what is their cubicle. It's as ugly as all hell, and the graph names are just wasted, and provide no useful information. Hmmm. So, why is the 'bad' way so much easier and more elegant than the "right" way? Because we start with a static 2-place relation (person located in cubicle), and then adding time makes it into a THREE place relation, and RDF isn't set up to handle three-place relations naturally. But a quad store is! A quad store is *exactly* what you need to store data using a three-place relation, in the most natural and efficient way anyone could possibly want. So, forget about RDF: if I were faced with this kind of a problem to encode, and you gave me a quad store to do it with, that's how I would do it as well, and if that required me to lie, steal and cheat with the RDF/SPARQL specifications, well then that is the cost of doing business. (I might even be inclined to try to argue things like that graph names in an RDF graph store needn't be names of graphs, or that I can use names in a context in any way I want...)
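The difference between the two encodings can be sketched in a few lines of plain Python. This is only an illustration: the occupancy node ex:occ1, the property names ex:person, ex:cubicle and ex:during, and the helper reify are all invented here, not taken from Dan's example.

```python
# One located-in fact encoded the "correct" way: an occupancy node
# carrying three properties. All names are hypothetical.
REIFIED = [
    ("ex:occ1", "ex:person",  "ex:dilbert"),
    ("ex:occ1", "ex:cubicle", "ex:cube12"),
    ("ex:occ1", "ex:during",  "ex:2011-Q3"),
]

# The same fact as a single quad, with the graph name carrying the time.
QUAD = ("ex:dilbert", "ex:located", "ex:cube12", "ex:2011-Q3")


def reify(quad, node):
    """Expand one quad into the three triples of the occupancy pattern."""
    person, _prop, cubicle, period = quad
    return [
        (node, "ex:person", person),
        (node, "ex:cubicle", cubicle),
        (node, "ex:during", period),
    ]


print(reify(QUAD, "ex:occ1") == REIFIED)
```

Every fact costs an extra node and three triples in the reified form, and the graph name plays no part in it.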

So, OK, this usage is so extremely natural that we ought to find a way to make it correct, rather than keep insisting that honest folk tie themselves in knots. So here are some thoughts along those lines. 


Some properties can be stated to be 'anchored' to a graph identifier. What this means is that when a triple with that property occurs in an identified graph, the graph identifier itself is treated as a fourth argument of the triple. (So in this case, semantically, the triple is a quad and the property is a three-place relation rather than a binary relation.) That identifier need not be said to denote the graph, notice: typically, in fact, it won't, eg in DanB's scenario, it might denote a time-interval. Still, it can be a graph *identifier* in an RDF store. (OK, so we might need to get contexts or punning into the picture to handle this, see previous email. But leave that aside for now.)
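As a rough sketch of that reading, in plain Python and under my own assumptions about how it might be represented: the set ANCHORED stands in for whatever declaration mechanism is eventually chosen, and interpret is a hypothetical helper, not part of any RDF library.

```python
# Properties assumed to have been declared anchored (hypothetical).
ANCHORED = {"ex:located"}


def interpret(subject, prop, obj, graph_id):
    """Read a triple found in the graph identified by `graph_id`.

    An anchored property is read as a three-place relation whose extra
    argument is the graph identifier; any other property stays binary.
    """
    if prop in ANCHORED:
        return (prop, subject, obj, graph_id)
    return (prop, subject, obj)


print(interpret("ex:dilbert", "ex:located", "ex:cube12", "ex:2011-Q3"))
print(interpret("ex:dilbert", "ex:name", "Dilbert", "ex:2011-Q3"))
```

Note that the function never asks what the graph identifier denotes: it simply passes it along as an argument, which is the point of the proposal.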

This immediately makes sense of the Dilbert kind of data pattern, in the most direct way possible. This is a big plus, to me. It has some other consequences, which resonate with things Richard has been saying.

1. Triples which are anchored in a graph can only be merged or imported into another graph which has the same identifier.
2. Two graphs with different identifiers which contain anchored triples cannot be merged.
3. If we (as we must) use RDF syntax to declare the anchoring, eg by a triple like 
:prop rdf:anchored :graphIdentifier .
or maybe 
:prop rdf:type :AnchoredProperty .
then it is invalid to delete such a triple from an identified graph in which it occurs. This is a nonmonotonic inference, but I think it is tolerable as it would be "local" to an identified graph. 
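These three consequences could be enforced by checks along the following lines. This is a sketch in plain Python under stated assumptions: rdf:anchored is the hypothetical term from the example triple above (it is not an existing RDF vocabulary term), only that first declaration form is handled, and rule 2 is read here as the symmetric form of rule 1, which is one possible reading.

```python
ANCHORED = {"ex:located"}      # assumed set of anchored properties
DECLARATION = "rdf:anchored"   # hypothetical term, not in the RDF vocabulary


def has_anchored(triples):
    """True if any triple uses an anchored property."""
    return any(p in ANCHORED for (_s, p, _o) in triples)


def can_import(source_id, source_triples, target_id):
    """Rule 1: anchored triples may only move to a same-identifier graph."""
    return source_id == target_id or not has_anchored(source_triples)


def can_merge(id_a, triples_a, id_b, triples_b):
    """Rule 2, read as: each graph must be importable into the other."""
    return (can_import(id_a, triples_a, id_b)
            and can_import(id_b, triples_b, id_a))


def can_delete(triple):
    """Rule 3: an anchoring declaration may not be deleted."""
    _s, p, _o = triple
    return p != DECLARATION


q3 = [("ex:dilbert", "ex:located", "ex:cube12")]
q4 = [("ex:dilbert", "ex:located", "ex:cube40")]
print(can_merge("ex:2011-Q3", q3, "ex:2011-Q4", q4))
print(can_delete(("ex:located", "rdf:anchored", "ex:2011-Q3")))
```

Graphs that contain no anchored triples are unaffected and merge as they always have.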

In other words, this makes identified graphs into a kind of "local context", but by adding an argument to a property rather than by re-interpreting IRIs.

Note, it would be fine to use an anchored IRI as a normal un-anchored property in an un-named graph: there it would simply be a normal property with two arguments. The meaning would be exactly like having a named graph named with a blank node (I am not suggesting we allow this, but that is what the effect would be.) So for example in the Dilbert case, the use of :located in a plain unidentified graph would mean something like "at some unknown time, this person was in that cubicle".

Anyway, just wanted to get this on the table. This really is quite simple to do in the semantics, by the way: the necessary machinery is already worked out in the ISO Common Logic standard from 2007. We just have to sort out the syntactic details of how to state the anchoring using RDF.

I think this will be hugely popular, if we do it.


PS. Sorry no emails from me until the 19th November. 

IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile

Received on Wednesday, 2 November 2011 17:52:18 UTC