Re: time and quads and semantics from Ivan Herman on 2012-02-15 (public-rdf-wg@w3.org from February 2012)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 15 Feb 2012 14:08:03 +0100
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-Id: <83F27D61-16E7-4BD8-B816-AB74C8D34849@w3.org>

Hi Pat,

like Antoine, I have clarification questions... In no particular order

- My understanding is that (also attempting to answer some of Antoine's questions) an RDF graph may be the mixture of triples and of quads. In other words, the domain of the IEXT for a property p is the union of binary and ternary vectors. Because IEXT is a function, that also means that having (s,p,o) and (s,p,o,x) in the same graph would not be allowed, right?

- This is a scheme for quads in general, not one that depends on time. So, coming back to the original problem statement, does it mean that you would use the parameter to store time in the form of a literal, and you would trust the 'natural' interpretation of those literals that applications could use? It does sound all right, just checking... But that means we would have quadstores with possibly literals in the 4th position, and I am not sure current quadstores can handle that.

- The approach could work for named graphs, actually. If I use the parameter as a URI then, as you said, we could have this as the 'semantic-free' model, ie, as you call it, a label. But if we want to model the other approach, which says that the name of the graph is a URI with the GET semantics (if you make HTTP GET on that URI then bla bla bla...), this is still 'outside' of the RDF semantics which does not know about HTTP GET. Ie, *if* we want to get that into the formal semantics, that is still to be done. (I say 'If', because the current discussion using some sort of a typing to signal the differences may mean that the HTTP GET semantics is simply described in English terms and is never part of the formal RDF semantics)

- I see how this model could work describing time for specific triples, using the 4th argument for time. But, as far as I could follow, this whole issue came from questions like: if a URI identifies a graph, eg, in the GET semantics, how do I account for the fact that this pairing of URI with a graph is true today, but will not be true tomorrow? After all, I think our primary concern was and is to say something about named graphs. Or does it mean that we have named quadruple graphs, ie, URI-s assigned to a set of quads? This gets a bit hairy for me, and I am a bit concerned about the practical consequences...

- What would that change in the core semantics mean for all the other layers we have on top of RDF? I obviously think of OWL, but also of RIF. We can of course say that it does not affect them, insofar as OWL deals only with triples and it (probably) simply ignores the 4th parameter. However, such a change may open the floodgates for a future revision of OWL that would try to go beyond that. Which is fine if somebody does it, but it may be disruptive for the community. (This remark obviously comes from the W3C Semantic Web Activity lead, and not from a poor engineer who tries to understand that stuff:-)

Hm. Food for thought...

Thanks

Ivan

On Feb 15, 2012, at 07:33 , Pat Hayes wrote:

> OK yall, I promised 2 weeks ago to try to write this down, but it has gotten a little changed since then, so you are getting the newer version. If it seems to be verging off the point, please bear with me for a while. 
> 
> The issue that started it was, how to deal with the fact that RDF is a 'timeless' description logic, but people often (want to, and therefore will) use it to describe facts that are transient, changing, labile, etc., and also that are true *now* but might not have been true a while ago or at some time in the future. There is a basic divergence between descriptions that are seen as timeless or outside of time, like mathematical statements or ontological statements, and statements - usually data - which are understood to be relative to a 'present time' and therefore, like dairy products, always have a use-by date, even if this is implicit. RDF is semantically designed for the former, but gets used for the latter, and what should we do about this? Possible answers range from (do nothing and to hell with the semantics anyway) to (insist that RDF is timeless and say that time-sensitive data MUST be phrased in some rebarbative way involving blank nodes, in order to preserve the semantic purity) but neither extreme is palatable to everyone.
> 
> Now, other formal logical notations have met this issue many times, and there is a kind of rough consensus about how to deal with it. Basically, if you want to be able to store the notation and use it later, then it *can't* be based on a time-sensitive 'present-tense' kind of a semantic model: it has to be timeless, at bottom. And then to encode present-tense information in a time-free notation, you have to include a time parameter somewhere. You date-stamp the data, or you have a temporal field in your data table, or you have an extra 'situation' argument in your relations, or some such device. It almost doesn't matter where you put this extra parameter, as long as there is a recognized convention for finding it; but one very common idea is, you make it an extra argument of all your time-sensitive relations (and if you are in AI, you call them 'fluents'). So what was a simple property becomes a relationship to a time, what was a binary relation between As and Bs becomes a three-place relation between As, Bs and times (or some other parameter related to times in some known way, a complication I will ignore.)
> 
> The snag with doing this in RDF is, of course, that RDF isn't very good at representing three-place relations. In fact, although theoretically simple, it is in practice so awkward that hardly anyone is going to actually do it. You have to introduce things called 'events' or 'holdings' or 'facts' and say that they have a subject and an object and a time, using three triples. It's just like RDF reification, in fact. Blech.
> 
> Now bring quad stores into the picture, and they seem to provide exactly what we need here. A triple :a :R :b . turns into exactly what we need to encode time-sensitive information: a relation with *three* parameters: :a :R :b :t . That "graph label" can be used to separate a triple true at one time from the 'same' triple true at a different time. Perfect! Except that no, it isn't, because this isn't what the RDF semantics says it means. The current semantics does not have the *truth* of a triple varying according to what graph it happens to be in: what the triple says depends only on the interpretation of its components, the subject, predicate and object of the triple. Which is where we are currently stuck.
> 
> So, here is a proposal. We extend RDF to allow property extensions to contain triples as well as pairs. That is, we allow an RDF property to be a trinary as well as a binary relationship. (Strictly, we allow it to be a variadic relation which can be binary or trinary, or both.) Notice that this has the current semantics as a special case, but generalizes it a little. And then we allow, under some circumstances – details later – an RDF property to be interpreted as taking three arguments rather than the usual two. Call the extra argument a 'parameter' for want of a better term. We can then think of a quad :s :P :o :pa as consisting of a subject, property, object and parameter, in that order; and it is true in I just when <I(:s), I(:o), I(:pa)> is in IEXT(I(:P)), which takes advantage of the new RDF semantics. (Note that this makes sense even when I is a current RDF interpretation, it just always comes out false. The 'trinary' extension allows some quads to be true in an interpretation.)
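
The truth condition proposed above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the universe consists of plain strings, `I` and the two `IEXT_*` tables are invented for illustration, and the function names `triple_true` and `quad_true` are hypothetical rather than taken from any RDF library.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy interpretation: I maps names to things in the universe.
I: Dict[str, str] = {":s": "s_thing", ":P": "P_rel", ":o": "o_thing", ":pa": "pa_thing"}

Extension = Dict[str, Set[Tuple[str, ...]]]

# An extended interpretation: the property extension may hold 3-tuples.
IEXT_EXTENDED: Extension = {"P_rel": {("s_thing", "o_thing", "pa_thing")}}

# A current-style interpretation: the property extension holds only pairs.
IEXT_CURRENT: Extension = {"P_rel": {("s_thing", "o_thing")}}


def triple_true(iext: Extension, s: str, p: str, o: str) -> bool:
    """A triple is true when <I(s), I(o)> is in IEXT(I(p))."""
    return (I[s], I[o]) in iext.get(I[p], set())


def quad_true(iext: Extension, s: str, p: str, o: str, pa: str) -> bool:
    """A quad is true when <I(s), I(o), I(pa)> is in IEXT(I(p))."""
    return (I[s], I[o], I[pa]) in iext.get(I[p], set())
```

With these tables, `quad_true(IEXT_EXTENDED, ":s", ":P", ":o", ":pa")` holds, while the same quad evaluated against `IEXT_CURRENT` is false because a pairs-only extension can never contain a 3-tuple, which matches the remark that a quad always comes out false in a current RDF interpretation.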
> 
> Under this semantic regime, then, there are two ways to think about what a quad store is saying. In one of them, it consists of sets of RDF triples with their truth depending upon a *binary* property, and the fourth field is simply a label for all the triples in each graph, AKA a 'graph name'. But the truth of a triple does not depend on the graph it is in: this graph label is just an organizing device with no semantic import. In other words, what we have now. But there is another way to think about a quad store, in which it bears the same relationship to quads as an RDF graph bears to triples: it is simply a conjunction of a lot of atomic facts, but each atom is now a relation applied to three arguments, and each argument has just as much bearing on the truth of the quad as the others do. Seen from this second perspective, the 'triples' view is simply one way to slice the quad store, using the last argument as the organizing parameter. And now it is natural to treat time-stamped data as living in a quad store whose parameter denotes times or time-intervals. Of course, we can also 'see' such a quad store in the first way, treating the time parameter as a graph label, and this might be a natural way to think about it for processing purposes, but the second view incorporates the time-varying nature of data which is indeed **parameterised** by the time (if you like, by the graph it happens to be in, intuitively) rather than simply being labelled by it. The second view allows data to actually depend upon the time parameter, instead of simply being organized by it.
> 
> If one prefers to think of data as consisting of RDF graphs written in a 'present tense' but then recorded and stored with a time-stamped label, this is a perfectly legitimate and appropriate way to view such a quad store, provided that one bears in mind that the bare triples must not be taken out of context. For example, merging graphs with different graph labels is not valid, under this convention. Merging two quad stores (where 'merge' here is defined exactly as for RDF graphs but with 'triple' replaced by 'quad' throughout) *is* semantically correct, however: in fact, quad stores with the second parametric interpretation 'work' exactly like RDF graphs do with the current semantics, and all the standard definitions (merging, instances, being grounded, being lean, etc.) work in exactly the same way.
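
The difference between the two kinds of merge can be illustrated with a minimal Python sketch. It assumes quad stores are plain sets of 4-tuples of names and that no blank nodes occur, so that a merge reduces to set union (a real RDF merge would also have to keep blank nodes apart). The helper names `merge_quads` and `strip_parameter` and the sample data are hypothetical.

```python
from typing import Set, Tuple

Quad = Tuple[str, str, str, str]
Triple = Tuple[str, str, str]

# Two hypothetical stores, each recording a fact under a different parameter.
store_1: Set[Quad] = {(":a", ":R", ":b", ":t1")}
store_2: Set[Quad] = {(":a", ":R", ":c", ":t2")}


def merge_quads(*stores: Set[Quad]) -> Set[Quad]:
    """Merge quad stores as a union of quads (no blank nodes assumed)."""
    merged: Set[Quad] = set()
    for store in stores:
        merged |= store
    return merged


def strip_parameter(store: Set[Quad]) -> Set[Triple]:
    """Drop the fourth field, i.e. take the bare triples out of context."""
    return {(s, p, o) for (s, p, o, _pa) in store}


merged = merge_quads(store_1, store_2)
bare = strip_parameter(merged)
```

Here `merged` still says that `:a :R :b` holds relative to `:t1` and `:a :R :c` relative to `:t2`, whereas `bare` asserts both triples with no parameter at all, which is the out-of-context reading the paragraph above warns against.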
> 
> This all works out quite nicely and naturally, but there is one big issue. If we are given a quad store, how do we know whether to interpret it as consisting of triples with labels, or as consisting of quads with an extra parameter? It is important to be able to make the distinction, since the same quad store could be true in one view but false in the other, in the same interpretation. There are several ways to handle this, and I am working on a couple of ideas right now. Hopefully I will have an example by tomorrow, but any comments so far?
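
To see how one store can be true under one reading and false under the other in the same interpretation, here is a small Python sketch. Everything in it is an assumption made for illustration: the interpretation `I`, the extension table `IEXT`, and the function names `true_as_labelled` and `true_as_parametric` are hypothetical.

```python
from typing import Dict, Set, Tuple

Quad = Tuple[str, str, str, str]

# One hypothetical interpretation, used for both readings.
I: Dict[str, str] = {":a": "a_thing", ":R": "R_rel", ":b": "b_thing", ":t": "t_thing"}
# The extension of R_rel contains the pair but not the corresponding 3-tuple.
IEXT: Dict[str, Set[Tuple[str, ...]]] = {"R_rel": {("a_thing", "b_thing")}}

store: Set[Quad] = {(":a", ":R", ":b", ":t")}


def true_as_labelled(quads: Set[Quad]) -> bool:
    """Label reading: the fourth field is ignored; each triple needs a pair in IEXT."""
    return all((I[s], I[o]) in IEXT.get(I[p], set()) for (s, p, o, _label) in quads)


def true_as_parametric(quads: Set[Quad]) -> bool:
    """Parametric reading: each quad needs a 3-tuple in IEXT."""
    return all((I[s], I[o], I[pa]) in IEXT.get(I[p], set()) for (s, p, o, pa) in quads)
```

In this interpretation `true_as_labelled(store)` holds and `true_as_parametric(store)` does not, so a consumer needs some signal outside the quads themselves to know which reading is intended.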
> 
> Pat
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> 
> 
> 
> 
> 
> 


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf

Received on Wednesday, 15 February 2012 13:05:46 UTC