Re: time and quads and semantics from David Wood on 2012-02-15 (public-rdf-wg@w3.org from February 2012)

From: David Wood <david@3roundstones.com>
Date: Wed, 15 Feb 2012 10:31:18 -0500
To: Pat Hayes <phayes@ihmc.us>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-Id: <E8585E7F-62FD-4755-AEC3-4E51C218517D@3roundstones.com>
Hi Pat,

You made me look up "labile".  Thanks.

Your positioning of both the need for and other solutions to time-sensitive relations is very useful, as was your reminder that RDF usage for time-variant data is fundamentally at odds with the current RDF Semantics.  These are not issues that we can, or should, sweep under the rug if only because that is one of the extreme positions that you identified.

I wonder if we can state at the graph level that a given (named) graph contains data that varies in time.  Other graphs that do not so state would be presumed (as they are now) to be timeless in nature.  Solving that problem only for named graphs might allow us to kill two birds with one stone and leave the rest of RDF alone.  This line of thought is similar in concept, but not necessarily in form, to Antoine's proposal.  In fact, I think it introduces exactly the kind of "holdings" that you referred to, but only for graphs (not for statements).  Is that so?

Regards,
Dave




On Feb 15, 2012, at 01:33, Pat Hayes wrote:

> OK yall, I promised 2 weeks ago to try to write this down, but it has gotten a little changed since then, so you are getting the newer version. If it seems to be verging off the point, please bear with me for a while. 
> 
> The issue that started it was, how to deal with the fact that RDF is a 'timeless' description logic, but people often (want to, and therefore will) use it to describe facts that are transient, changing, labile, etc.., and also that are true *now* but might not have been true a while ago or at some time in the future. There is a basic divergence between descriptions that are seen as timeless or outside of time, like mathematical statements or ontological statements, and statements - usually data - which are understood to be relative to a 'present time' and are therefore, like dairy products, always have a use-by date, even if this is implicit. RDF is semantically designed for the former, but gets used for the latter, and what should we do about this? Possible answers range from (do nothing and to hell with the semantics anyway) to (insist that RDF is timeless and say that time-sensitive data MUST be phrased in some rebarbative way involving blank nodes, in order to preserve the semantic purity) but neither extreme is palatable to everyone. 
> 
> Now, other formal logical notations have met this issue many times, and there is a kind of rough consensus about how to deal with it. Basically, if you want to be able to store the notation and use it later, then it *can't* be based on a time-sensitive 'present-tense' kind of a semantic model: it has to be timeless, at bottom. And then to encode present-tense information in a time-free notation, you have to include a time parameter somewhere. You date-stamp the data, or you have a temporal field in your data table, or you have an extra 'situation' argument in your relations, or some such device. It almost doesnt matter where you put this extra parameter, as long as there is a recognized convention for finding it; but one very common idea is, you make it an extra argument of all your time-sensitive relations (and if you are in AI, you call them 'fluents') So what was a simple property becomes a relationship to a time, what was a binary relation between As and Bs becomes a three-place relation between As, Bs and times (or some other parameter related to times in some known way, a complication I will ignore.) 
> 
> The snag with doing this in RDF is, of course, that RDF isnt very good at representing three-place relations. In fact, although theoretically simple, it is in practice so awkward that hardly anyone is going to actually do it. You have to introduce things called 'events' or 'holdings' or 'facts' and say that they have a subject and an object and a time, using three triples. Its just like RDF reification, in fact. Blech. 
> 
> Now bring quad stores into the picture, and they seem to provide exactly what we need here. A triple :a :R :b . turns into exactly what we need to encode time-sensitive information: a relation with *three* parameters: :a :R :b :t . That "graph label" can be used to separate a triple true at one time from the 'same' triple true at a different time. Perfect!  Except that no, it isn't, because this isn't what the RDF semantics says it means. The current semantics does not have  the *truth* of a triple varying according to what graph it happens to be in: what the triple says depends only on the interpretation of its components, the subject, predicate and object of the triple. Which is where we are currently stuck. 
> 
> So, here is a proposal. We extend RDF to allow property extensions to contain triples as well as pairs. That is, we allow an RDF property to be a trinary as well as a binary relationship. (Strictly, we allow it to be a variadic relation which can be binary or trinary, or both.) Notice that this has the current semantics as a special case, but generalizes it a little. And then we allow, under some circumstances – details later – an RDF property to be interpreted as taking three arguments rather than the usual two. Call the extra argument a 'parameter' for want of a better term. Then we can then think of a quad :s :P :o :pa as consisting of a subject, property, object and parameter, in that order; and it is true in I just when <I(:s), I(:o), I(:pa)> is in IEXT(I(:P)), which takes advantage of the new RDF semantics. (Note that this makes sense even when I is a current RDF interpretation, it just always comes out false. The 'trinary' extension allows some quads to be true in an interpretation.) 
> 
> Under this semantic regime, then, there are two ways to think about what a quad store is saying. In one of them, it consists of sets of RDF triples with their truth depending upon a *binary* property, and the fourth field is simply a label for all the triples in each graph, AKA a 'graph name'. But the truth of a triple does not depend on the graph it is in: this graph label is just an organizing device with no semantic import. In other words, what we have now. But there is another way to think about a quad store, in which it bears the same relationship to quads as an RDF graph bears to triples: it is simply a conjunction of a lot of atomic facts, but each atom is now a relation applied to three arguments, and each argument has just as much bearing on the truth of the quad as the others do. Seen from this second perspective, the 'triples' view is simply one way to slice the quad store, using the last argument as the organizing parameter. And now it is natural to treat time-stamped data as living in a quad store whose parameter denotes times or time-intervals. Of course, we can also 'see' such a quad store in the first way, treating the time parameter as a graph label, and this might be a natural way to think about it for processing purposes,   but the second view incorporates the time-varying nature of data which is indeed **parameterised** by the time (if you like, by the graph it happens to be in, intuitively) rather than simply being labelled by it. The second view allows data to actually depend upon the time parameter, instead of simply being organized by it. 
> 
> If one prefers to think of data as consisting of RDF graphs written in a 'present tense' but then recorded and stored with a time-stamped label, this is a perfectly legitimate and appropriate way to view such a quad store, provided that one bears in mind that the bare triples must not be taken out of context. For example, merging graphs with different graph labels is not valid, under this convention. Merging two quad stores (where 'merge' here is defined exactly as for RDF graphs but with 'triple' replaced by 'quad' throughout) *is* semantically correct, however: in fact, quad stores with the second parametric interpretation 'work' exactly like RDF graphs do with the current semantics, and all the standard definitions (merging, instances, being grounded, being lean, etc..) work in exactly the same way. 
> 
> This all works out quite nicely and naturally, but it there is one big issue. If we are given a quad store, how do we know whether to interpret it as consisting of triples with labels, or as consisting of quads with an extra parameter? It is important to be able to make the distinction, since the same quad store could be true in one view but false in the other, in the same interpretation. There are several ways to handle this, and I am working on a couple of ideas right now. Hopefully I will have an example by tomorrow, but any comments so far? 
> 
> Pat
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
> 
> 
> 
> 
> 
>
Received on Wednesday, 15 February 2012 15:31:51 UTC