
Another try.

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 21 Feb 2012 01:56:54 -0600
Message-Id: <6E3CFD8D-3E32-4766-8082-70FFD921BFBB@ihmc.us>
To: RDF WG <public-rdf-wg@w3.org>
Since my sketch of the quad-semantics proposal was widely misunderstood, let me try again. I will try to present the idea in a way which emphasises how very small a change it is to the current RDF model.


Currently, the only semantically meaningful RDF construct is the RDF graph, comprised of RDF triples. However, the RDF world now contains other structures, most notably RDF quad stores and SPARQL datasets. The proposal is to extend the RDF semantics in a way which allows us to treat a quad store  (or possibly a suitable abstraction of a quad store) similarly to the way that we now treat an RDF graph. It does not change RDF graphs or their interpretations in any way, and preserves all satisfaction and entailment relationships between RDF graphs exactly as they are at present. 

A quad store is one way to implement a dataset, of course, but I don't want to identify them. A dataset can be viewed as a quad store with a particular interpretation of the fourth quad field, where it is interpreted as a graph label which identifies the RDF graph consisting of the triples formed by the first three components of each quad in the store which have that label in the fourth position. (I am assuming here that the unnamed default graph, if present, has a special 'label', a complication I will now ignore.) 

We have already had a lot of debate and discussion about whether, and how, to extend the RDF semantics to datasets, and have, I think, reached a rather uncomfortable compromise in which the graph labels are what one might call functional names for, but not actual semantic names for - that is, they do not actually denote - the RDF graph they are associated with in the dataset, and that these are indeed RDF graphs, to which the RDF semantics applies. This means in turn that the meaning of any triple in these various graphs is fixed by the global semantics and does not vary with the graph in which it occurs. That is, the current RDF semantics does not permit the graph containing a triple to act as a kind of 'context' which can be used to affect the truth of the triple. Several people in the WG have noted that they, or various RDF users, would greatly prefer a situation in which the meanings of URIs in triples in these graphs could be contextualized by the graph they happen to be in, so that this containing graph could be considered to be another parameter (along with the subject, property and object in the triple) which plays some kind of role in determining the truth of the triple. This need is so strong that it has even been used as an argument against the viability or applicability of the current RDF semantic model. Examples of this kind of use include using the graph label to encode a time interval during which the triples are true, and using it to indicate a source of information from which the triples are derived. I'm sure there are others I've forgotten.

Now, treating a triple in a graph as being in a 'context' determined by the graph itself, and this graph having a label, add up to exactly the same thing, speaking purely semantically, as saying that this triple is actually a quadruple - a relation with three arguments rather than two arguments - whose extra argument is the graph label. Note, I am not saying that the graph is the third argument, only that this IRI which we call a "graph label" when we are thinking of the quad store as implementing a SPARQL dataset, is now being treated as an extra argument of the property in the triple. So that 

G:  { S P O . }

can be thought of as simply being 

P(S, O, G) 

rather than as 

P(S, O) in G. 

This has several immediate consequences, all of them to my mind rather satisfactory. First, we don't have to strain to find a way to say that G is a 'name' of a graph in spite of it perhaps not denoting the graph. It's just an argument to a relation, and it means whatever our ontology for the property P thinks it ought to mean. It can refer to people or anything else, no bother. Second, a quad store (or, a SPARQL dataset re-conceptualized in this way) is much more homogeneous and more analogous to an RDF graph. In fact we can think of it as a natural kind of extension of an RDF graph: call it a quad-graph, or something like that. It bears exactly the same relationship to quads as an RDF graph does to triples, all the graph definitions extend naturally to it (merging, instances, etc.), and in fact we could even mix quads and triples, and the semantics would not mind at all. (In case anyone needs smelling salts at this point, we could also not do this, I have no axe to grind either way, I was just emphasising that there is a 'natural' simple generalization available if we want to use it.) Third, for logical types, this is an *extremely* natural way to extend RDF expressiveness, so natural indeed that logicians have been puzzled and frustrated since 2002 wondering why in God's name RDF was not defined this way in the first place. (See for example papers in the recent issue of J. of Web Semantics, some of which directly propose adding a third 'context' argument to RDF. [1])

For conservatives among us, the opposite re-interpretation is always available. Any quad-graph can be thought of as a SPARQL dataset, by 'slicing' the quads according to their last argument, and re-declaring this parameter to be a graph label. However, to retain the semantic flexibility (i.e. to have the triples in each graph able to be re-interpreted differently in each labeled graph), we would have to modify the RDF semantics to allow for this graph-local context being involved in the truth recursions. And as already noted, it is simpler, and much less of a change to the basic RDF model, to do this by thinking of this construction in the quad-graph way as being a set of property-with-three-argument quads rather than as a collection of labelled sets of two-argument triples. And as so many of the 'natural' uses of datasets seem to want to take advantage of the apparent 'contextual' possibility of the graph label, and this option is only available in a quad-store format in any case, it seems comparatively harmless to attach the needed semantics directly to this quad store format, rather than tinker with the semantics of triples or try to make sense of graph 'names' which do not denote graphs. 
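
As a rough illustration of the 'slicing' re-interpretation (not part of the proposal itself), here is a small Python sketch. It assumes quads are plain 4-tuples of strings; the function names slice_by_parameter and to_quads, and the sample IRIs, are invented for this sketch and do not come from any RDF library.

```python
from collections import defaultdict

def slice_by_parameter(quads):
    """Group quads by their last component, re-declaring it to be a graph label."""
    dataset = defaultdict(set)
    for s, p, o, pa in quads:
        dataset[pa].add((s, p, o))
    return dict(dataset)

def to_quads(dataset):
    """Opposite direction: read each labelled triple as a quad."""
    return {
        (s, p, o, label)
        for label, triples in dataset.items()
        for (s, p, o) in triples
    }

# Hypothetical quad-graph: the same triple appears under two parameters.
quad_graph = {
    (":a", ":p", ":b", ":g1"),
    (":a", ":q", ":c", ":g1"),
    (":a", ":p", ":b", ":g2"),
}

dataset = slice_by_parameter(quad_graph)
```

In this representation the round trip to_quads(slice_by_parameter(q)) gives back q, but a dataset containing an empty labelled graph would lose that graph when converted to quads, since an empty graph contributes no quads.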

So, with that introduction, here is the proposal. We define a quad to be a quadruple <s p o pa> where <s p o> is an RDF triple and pa, called a parameter, is an IRI (or a literal? I have no objection, but I think we should avoid blank nodes just for pragmatic reasons. There is no semantic problem, but I think it would be a tar-pit.) A quad-graph is a set of quads. (All the graph metatheory (instances, merging, leanness, etc. etc. ) applies directly to quad-graphs with the substitution "quad"//"triple". And we will have to extend the whole container/state/snapshot discussion to quad-graphs just as we did to RDF graphs, of course.) In the semantics, we allow the value of the IEXT mapping to be any set of pairs *or triples* of individuals, and we say that I( <s p o pa> ) = true just when IEXT(I(p)) contains <I(s), I(o), I(pa)>, otherwise false. All kind of obvious, just extending the RDF model to allow two *or three* arguments. And that is all. (Well, not quite all, see below.) 
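
The truth condition above can be sketched in Python as follows. This is only a toy model under stated assumptions: an interpretation is a pair of dictionaries, I maps names to individuals and IEXT maps an individual to a set of tuples of length two or three. The names is_true, :worksFor and the sample individuals are hypothetical.

```python
def is_true(statement, I, IEXT):
    """Truth of a triple <s p o> or a quad <s p o pa> under (I, IEXT).

    A triple is true when IEXT(I(p)) contains <I(s), I(o)>;
    a quad is true when IEXT(I(p)) contains <I(s), I(o), I(pa)>.
    """
    s, p, o, *rest = statement          # rest is [] for a triple, [pa] for a quad
    args = (I[s], I[o]) + tuple(I[pa] for pa in rest)
    return args in IEXT.get(I[p], set())

# Hypothetical toy interpretation.
I = {
    ":alice": "Alice",
    ":acme": "Acme",
    ":worksFor": "worksFor",
    ":y2011": "2011",
    ":y2012": "2012",
}
# The extension of one property holds both a pair and a triple.
IEXT = {"worksFor": {("Alice", "Acme"), ("Alice", "Acme", "2011")}}

triple_value = is_true((":alice", ":worksFor", ":acme"), I, IEXT)
quad_2011 = is_true((":alice", ":worksFor", ":acme", ":y2011"), I, IEXT)
quad_2012 = is_true((":alice", ":worksFor", ":acme", ":y2012"), I, IEXT)
```

The triple and the quad are evaluated independently: the truth of the pair in the extension says nothing about the truth of any three-argument entry, and vice versa.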

A few points. First, this makes *no change at all* to current RDF graphs or to any semantic properties of them. Since RDF graphs contain no quads, the new extra truth-condition never applies to them. Second, really the same point, this does not make RDF "contextual" or change its logic in any way. But what it does do is *extend* it so that *the extension* can represent (what one could interpret as) a contextualization of current RDF triples, but they are now called RDF quads. Or at any rate, that is how they are described officially (see preceding paragraph.) And of course they can be implemented using existing quad stores, and in fact many of the ways that quad stores are being used right now fit this model better than the currently 'official' semantic model. 

A few FAQs.  
1. Does extending RDF like this break OWL or RIF or RDFS?  A: No, because the current RDF is not changed at all by this, so all the languages and systems built on top of it are still OK. 
2. But when we change the semantics, isn't this going to require changes to the *semantics* of OWL (etc.) if only to prove that they still work properly with RDF? A: No, because we can provide a simple semantic meta-theorem, which maps the extended (pairs+triples in IEXT) interpretations back to the GOF interpretations (throw away the triples) and says that if your data has no quads in it, then you get the same truthvalues. In fact this is kind of obvious. So even the semantic geeks can ignore this stuff if they want to.
3. But while OWL (etc) are OK now, doesn't this permit new ways to embed OWL (etc) into RDF and so provide opportunities for things to be done differently in some future specification? A: Um, ... yes. Is that a problem? 
4. Does SPARQL still work? A: Oh sure, it still *works*. But we might want to re-think how we think about a SPARQL datastore, differently from how it is currently described, in terms of graphs with labels. (Or not. Read on.)
5. This seems to allow a single property to have both two arguments (in a triple) and three arguments (in a quad). So which is it, a 2-place relation or a 3-place relation? A: It is a variadic relation, in general: it can have two *or* three arguments. The ISO Common Logic semantics works this way. If you prefer, think of it as a kind of 'punning', where the same name can be used for both the binary and the ternary relations. It works out the same. 
6. Will we need to extend RDFS to cover some new things? A: Yes, probably. For example, we might want to have an rdfs:paraRange property which does for the third argument what rdfs:range does for the second. But again, these would all be optional extensions, and not need to be used by existing RDFS/RDF reasoners. 
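
The meta-theorem mentioned in FAQ 2 can be illustrated (not proved) with a small Python check. The sketch assumes the same toy representation as before, with I a dictionary from names to individuals and IEXT a dictionary from individuals to sets of tuples; holds and restrict_to_pairs are hypothetical helper names.

```python
def holds(statement, I, IEXT):
    """Truth of a triple or quad: is the argument tuple in the property's extension?"""
    s, p, o, *rest = statement
    args = (I[s], I[o]) + tuple(I[x] for x in rest)
    return args in IEXT.get(I[p], set())

def restrict_to_pairs(IEXT):
    """Map an extended interpretation back to an old-style one: throw away the triples."""
    return {prop: {t for t in ext if len(t) == 2} for prop, ext in IEXT.items()}

# Hypothetical interpretation in which "P" is variadic (pairs and triples).
I = {":a": 1, ":b": 2, ":c": 3, ":p": "P"}
IEXT = {"P": {(1, 2), (1, 2, 3), (2, 3, 1)}}
old_style = restrict_to_pairs(IEXT)

# A graph with no quads in it gets the same truth values either way.
triples_only_graph = [(":a", ":p", ":b"), (":b", ":p", ":c"), (":c", ":p", ":a")]
same_values = all(
    holds(t, I, IEXT) == holds(t, I, old_style) for t in triples_only_graph
)

# Only quads can tell the two interpretations apart.
quad = (":a", ":p", ":b", ":c")
quad_extended = holds(quad, I, IEXT)
quad_old_style = holds(quad, I, old_style)
```

The check works because evaluating a triple only ever looks up a pair in the extension, so discarding the three-element entries cannot change its truth value.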


OK, there is still an issue to be sorted out. Suppose we are given a quad store. Under this proposal, we can think of it in two ways: as an implementation of a SPARQL datastore, with the SPARQL account of that as a collection of labelled RDF graphs and the current RDF semantics (no contexts), *or* in this new way, as a quad-graph (-container-state, maybe: let me leave that issue aside for the present.) If you like, think of this second as a 'contextualized' datastore where the triples in the graphs can have different truthvalues according to the various graphs they occur in. But either way, the point is that these are *different*. They have different logics: it is valid to merge graphs in the first case but not the second (unless they have the same label, of course.) A graph in the first case can be taken out of the datastore, its label deleted, and it still means what it meant when it was labelled. But not so in the second case, in general: the graph label there has become part of the meaning of the graph, and taking it away is liable to change the meaning just as much as  changing an IRI in an RDF graph might change the meaning of the graph. Both cases have their uses, but we need to be able to distinguish them somehow, and this will need some kind of notational extension. 

I'm sure there are many ways to do this, but the simplest is just to add a little to the TriG syntax. Suppose we allow the terminal dot of a triple to be written as a plus sign, to mean that this triple is being interpreted as depending on its graph context, i.e. it is really a quad with the graph name as its contextual parameter. Call this a contextual triple and say that the graph is then a context. Contexts are involved in the truth of the triples they contain, so they are quad-graphs in disguise. Then two graphs can be merged just when (a) neither is a context (i.e. they are normal RDF graphs) or (b) they are the same context (i.e. have the same graph label.)
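
The merging rule just stated can be written down as a small predicate. This Python sketch makes a simplifying assumption that is not in the text: each labelled graph carries an explicit is_context flag, rather than deriving it from the terminators of its triples. The class LabelledGraph and the function can_merge are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]

@dataclass(frozen=True)
class LabelledGraph:
    label: Optional[str]
    triples: FrozenSet[Triple]
    is_context: bool = False  # assumption: set when the graph's triples are contextual

def can_merge(g1: LabelledGraph, g2: LabelledGraph) -> bool:
    """Rule (a): neither graph is a context; rule (b): both are the same context."""
    if not g1.is_context and not g2.is_context:
        return True
    return g1.is_context and g2.is_context and g1.label == g2.label

t = frozenset({(":this", ":that", ":other")})
plain_1 = LabelledGraph(":g1", t)
plain_2 = LabelledGraph(":g2", t)
ctx_a = LabelledGraph(":day1", t, is_context=True)
ctx_a_more = LabelledGraph(":day1", frozenset({(":this", ":that", ":else")}), is_context=True)
ctx_b = LabelledGraph(":day2", t, is_context=True)
```

Under this reading, a context and a normal RDF graph are never mergeable, since neither clause of the rule covers that combination.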

Here is an example. Suppose we want to archive the states of some graph container called :G which changes daily. Some of the triples in the container are stable, others not: it depends on the property. Then we can have a dataset like this:

:G: { :G01012012 x:ContentsOfOnDay "01012012"^^xsd:date + 
:G01022012 x:ContentsOfOnDay "01022012"^^xsd:date + 
:G01032012 x:ContentsOfOnDay "01032012"^^xsd:date + 
....  }
:G01012012: { :this :that :other +
:this :never :changes . }
:G01022012: { :this :that :else +
:this :never :changes . }

The intended semantics of the first graph are that ContentsOfOnDay x y z means that the graph x is the contents of the container z on the day y.  Notice that the URI for the graph container is here used as the first graph label, but what it *refers to* is the container, not that graph. The first graph here is *about* the container, one might reasonably say. 
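
To make the reading of the example concrete, here is a hedged Python sketch of how the plus-terminated and dot-terminated triples of the two daily graphs might be expanded. It assumes each statement has already been parsed into a tuple ending in its terminator, and that dot-terminated triples keep their ordinary label-independent meaning; the function expand and this tuple layout are inventions of the sketch, not an existing TriG parser API.

```python
def expand(label, statements):
    """Turn '+'-terminated triples into quads; leave '.'-terminated ones as triples."""
    out = set()
    for s, p, o, terminator in statements:
        if terminator == "+":
            out.add((s, p, o, label))   # contextual triple: the label becomes the parameter
        elif terminator == ".":
            out.add((s, p, o))          # ordinary triple: the label plays no semantic role
        else:
            raise ValueError(f"unknown terminator: {terminator!r}")
    return out

# The two daily graphs from the example, as pre-parsed statements.
daily_graphs = {
    ":G01012012": [
        (":this", ":that", ":other", "+"),
        (":this", ":never", ":changes", "."),
    ],
    ":G01022012": [
        (":this", ":that", ":else", "+"),
        (":this", ":never", ":changes", "."),
    ],
}

mixed = set()
for label, statements in daily_graphs.items():
    mixed |= expand(label, statements)
```

The stable triple appears once in the result, while each contextual triple is kept apart by its parameter, which is the mix of quads and triples the proposal says the semantics can accommodate.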


Hope this idea is now reasonably clear.  FWIW, I think that this extension to RDF would be immediately and eagerly seized upon by a large number of users. The need for some kind of 'context' mechanism has been a refrain of RDF critics since 2004. 


[1] http://journalofwebsemantics.blogspot.com/2012/02/jws-special-issue-reasoning-with.html

IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Tuesday, 21 February 2012 07:57:30 GMT
