Re: time and quads and semantics from Pat Hayes on 2012-02-15 (public-rdf-wg@w3.org from February 2012)

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 15 Feb 2012 14:39:41 -0600
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDF WG <public-rdf-wg@w3.org>
Message-Id: <E5D23F63-9DE0-4BBA-BD4F-18F43D83507E@ihmc.us>
On Feb 15, 2012, at 12:12 PM, Richard Cyganiak wrote:

> Pat,
> 
> This seems like a useful direction. A couple of questions:
> 
> 1. You say it doesn't matter much *where* to stick the extra parameter. The proposal places them into the members of property extensions. What other options would make sense? I note that placing them somewhere close to the “surface” of the semantics, so that few definitions differ between the “classic” and “parametrized” semantics, is likely to raise fewer objections.

Possibly, but other choices have their own difficulties. In a full FOL syntax one has about four options (attach the time parameter to: names / atomic sentences / sentences generally as a modality / top-level sentences as an external contextual marker), but in RDF the only real options are to attach it to the triples (atomic sentences) or to a graph (rather like the last option). The problem with the graph idea is how to specify where the 'edge' of the graph is, i.e. what the scope of the time parameter is supposed to be. We could invent a way (this is what I was setting out to do a couple of weeks ago), but it gets very messy and requires the semantics to appeal to a conceptual graph model that we have never really developed, so the changes to RDF are actually more far-reaching and intrusive. Also, the triple-by-triple approach is more flexible, and is such a natural fit to a quad-store architecture that it seems almost a crime not to take advantage of it.

> 
> 2. Would it be fair to say that this proposal is designed to address the following two problems? i) The semantics assume that the truth value of a triple depends only on the three constituent RDF terms of the triple. In common RDF usage, the truth value often also depends on some sort of “publication context”. ii) As a consequence, the merge of such RDF graphs is not necessarily a valid operation, despite what the semantics say.

Yes, although I think the problems are more pervasive than just the matter of merging. Hopefully, getting this into the semantics will suggest and support RDF-compliant ways to express the various options and distinctions that people want to use. 

But you are absolutely right that this is a general 'context' mechanism and is not restricted to use only with contextual times. Hopefully various distinctions between kinds of context can be handled by RDF vocabularies, provided we have ways to name a context/graph. (BTW, one of the reasons that I dislike the word "context" is because it is used to label such a wide variety of different things, and suggests that they can all be handled in one uniform way. We make more progress when we provide ways to distinguish different kinds of context – times, geolocations, personal opinions, beliefs, etc. – and let users treat each one in ways that are tailored to its idiosyncrasies.)

> 
> 3. Would it be fair to say that this proposal provides the following three benefits? i) a semantics that is “cleaner” in the sense that it better fits deployed practice; ii) a formal account of entailment relationships not just between RDF graphs but also between RDF datasets; iii) an opportunity for others to define semantic extensions that properly and formally account for time, trust and so on.

Yes, exactly. I couldn't have put it better myself :-)

Pat

> 
> Best,
> Richard
> 
> 
> On 15 Feb 2012, at 06:33, Pat Hayes wrote:
> 
>> OK yall, I promised 2 weeks ago to try to write this down, but it has gotten a little changed since then, so you are getting the newer version. If it seems to be verging off the point, please bear with me for a while. 
>> 
>> The issue that started it was, how to deal with the fact that RDF is a 'timeless' description logic, but people often (want to, and therefore will) use it to describe facts that are transient, changing, labile, etc., and also that are true *now* but might not have been true a while ago, or might not be true at some time in the future. There is a basic divergence between descriptions that are seen as timeless or outside of time, like mathematical statements or ontological statements, and statements - usually data - which are understood to be relative to a 'present time' and therefore, like dairy products, always have a use-by date, even if this is implicit. RDF is semantically designed for the former, but gets used for the latter, and what should we do about this? Possible answers range from (do nothing and to hell with the semantics anyway) to (insist that RDF is timeless and say that time-sensitive data MUST be phrased in some rebarbative way involving blank nodes, in order to preserve the semantic purity) but neither extreme is palatable to everyone.
>> 
>> Now, other formal logical notations have met this issue many times, and there is a kind of rough consensus about how to deal with it. Basically, if you want to be able to store the notation and use it later, then it *can't* be based on a time-sensitive 'present-tense' kind of a semantic model: it has to be timeless, at bottom. And then to encode present-tense information in a time-free notation, you have to include a time parameter somewhere. You date-stamp the data, or you have a temporal field in your data table, or you have an extra 'situation' argument in your relations, or some such device. It almost doesn't matter where you put this extra parameter, as long as there is a recognized convention for finding it; but one very common idea is, you make it an extra argument of all your time-sensitive relations (and if you are in AI, you call them 'fluents'). So what was a simple property becomes a relationship to a time, and what was a binary relation between As and Bs becomes a three-place relation between As, Bs and times (or some other parameter related to times in some known way, a complication I will ignore).
>> 
>> The snag with doing this in RDF is, of course, that RDF isn't very good at representing three-place relations. In fact, although theoretically simple, it is in practice so awkward that hardly anyone is going to actually do it. You have to introduce things called 'events' or 'holdings' or 'facts' and say that they have a subject and an object and a time, using three triples. It's just like RDF reification, in fact. Blech.
>> 
>> Now bring quad stores into the picture, and they seem to provide exactly what we need here. A triple :a :R :b . turns into exactly what we need to encode time-sensitive information: a relation with *three* parameters: :a :R :b :t . That "graph label" can be used to separate a triple true at one time from the 'same' triple true at a different time. Perfect! Except that no, it isn't, because this isn't what the RDF semantics says it means. The current semantics does not have the *truth* of a triple varying according to what graph it happens to be in: what the triple says depends only on the interpretation of its components, the subject, predicate and object of the triple. Which is where we are currently stuck.
>> 
>> So, here is a proposal. We extend RDF to allow property extensions to contain triples as well as pairs. That is, we allow an RDF property to be a trinary as well as a binary relationship. (Strictly, we allow it to be a variadic relation which can be binary or trinary, or both.) Notice that this has the current semantics as a special case, but generalizes it a little. And then we allow, under some circumstances – details later – an RDF property to be interpreted as taking three arguments rather than the usual two. Call the extra argument a 'parameter' for want of a better term. We can then think of a quad :s :P :o :pa as consisting of a subject, property, object and parameter, in that order; and it is true in I just when <I(:s), I(:o), I(:pa)> is in IEXT(I(:P)), which takes advantage of the new RDF semantics. (Note that this makes sense even when I is a current RDF interpretation; it just always comes out false. The 'trinary' extension allows some quads to be true in an interpretation.)
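
The truth condition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Interpretation`, its methods, and the use of plain strings as the things denoted are all hypothetical, and blank nodes and literals are ignored.

```python
# Hypothetical sketch: denotation plays the role of I, extension the role of IEXT.
# A property's extension is a set of tuples that may be pairs, triples, or both.

class Interpretation:
    def __init__(self, denotation, extension):
        self.denotation = denotation  # name -> thing denoted
        self.extension = extension    # property (a thing) -> set of tuples

    def triple_true(self, s, p, o):
        """Classic condition: <I(s), I(o)> is in IEXT(I(p))."""
        d = self.denotation
        return (d[s], d[o]) in self.extension.get(d[p], set())

    def quad_true(self, s, p, o, pa):
        """Proposed condition: <I(s), I(o), I(pa)> is in IEXT(I(p))."""
        d = self.denotation
        return (d[s], d[o], d[pa]) in self.extension.get(d[p], set())


names = {":a": "a", ":R": "R", ":b": "b", ":t1": "t1", ":t2": "t2"}

# A current RDF interpretation: the extension holds pairs only.
classic = Interpretation(names, {"R": {("a", "b")}})

# An extended interpretation: the same property also relates a, b and t1.
extended = Interpretation(names, {"R": {("a", "b"), ("a", "b", "t1")}})
```

With these example values, every quad is false in `classic` because its extension holds no three-element tuples, while `extended` makes the quad with parameter :t1 true and the one with :t2 false.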
>> 
>> Under this semantic regime, then, there are two ways to think about what a quad store is saying. In one of them, it consists of sets of RDF triples with their truth depending upon a *binary* property, and the fourth field is simply a label for all the triples in each graph, AKA a 'graph name'. But the truth of a triple does not depend on the graph it is in: this graph label is just an organizing device with no semantic import. In other words, what we have now. But there is another way to think about a quad store, in which it bears the same relationship to quads as an RDF graph bears to triples: it is simply a conjunction of a lot of atomic facts, but each atom is now a relation applied to three arguments, and each argument has just as much bearing on the truth of the quad as the others do. Seen from this second perspective, the 'triples' view is simply one way to slice the quad store, using the last argument as the organizing parameter. And now it is natural to treat time-stamped data as living in a quad store whose parameter denotes times or time-intervals. Of course, we can also 'see' such a quad store in the first way, treating the time parameter as a graph label, and this might be a natural way to think about it for processing purposes, but the second view incorporates the time-varying nature of data which is indeed **parameterised** by the time (if you like, by the graph it happens to be in, intuitively) rather than simply being labelled by it. The second view allows data to actually depend upon the time parameter, instead of simply being organized by it.
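
As a rough illustration of the 'slicing' remark, the following Python sketch converts between a flat set of quads and a set of labelled graphs. The function names and the sample data are assumptions made for the example. The point is that the two views hold the same syntactic content; they differ only in whether the fourth field takes part in the truth conditions.

```python
from collections import defaultdict

# Hypothetical sample data: three quads with two different parameters.
quads = {
    (":a", ":R", ":b", ":t1"),
    (":a", ":R", ":c", ":t2"),
    (":d", ":S", ":e", ":t1"),
}

def as_labelled_graphs(quads):
    """Slice a quad store by its last field: label -> set of triples."""
    graphs = defaultdict(set)
    for s, p, o, label in quads:
        graphs[label].add((s, p, o))
    return dict(graphs)

def as_quads(graphs):
    """Reattach each label to its triples as a fourth field."""
    return {(s, p, o, label)
            for label, triples in graphs.items()
            for (s, p, o) in triples}

graphs = as_labelled_graphs(quads)
```

Slicing and reassembling loses nothing as long as the labels are kept, so which view applies cannot be read off the data alone.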
>> 
>> If one prefers to think of data as consisting of RDF graphs written in a 'present tense' but then recorded and stored with a time-stamped label, this is a perfectly legitimate and appropriate way to view such a quad store, provided that one bears in mind that the bare triples must not be taken out of context. For example, merging graphs with different graph labels is not valid, under this convention. Merging two quad stores (where 'merge' here is defined exactly as for RDF graphs but with 'triple' replaced by 'quad' throughout) *is* semantically correct, however: in fact, quad stores with the second parametric interpretation 'work' exactly like RDF graphs do with the current semantics, and all the standard definitions (merging, instances, being grounded, being lean, etc.) work in exactly the same way.
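
A small Python sketch of the merging point, under simplifying assumptions: the stores are ground (no blank nodes), so a merge reduces to set union, and the names `merge_quads` and `strip_labels` and the door example are invented for illustration.

```python
# Hypothetical data: the same door observed at two different times.
store_1 = {(":door", ":state", ":open", ":t1")}
store_2 = {(":door", ":state", ":closed", ":t2")}

def merge_quads(*stores):
    """Merge ground quad stores: with no blank nodes this is plain set union."""
    return set().union(*stores)

def strip_labels(store):
    """Drop the fourth field, i.e. take the bare triples out of context.

    This is the step that is NOT valid under the parametric reading.
    """
    return {(s, p, o) for s, p, o, _ in store}

merged = merge_quads(store_1, store_2)   # keeps each fact tied to its time
flattened = strip_labels(merged)         # asserts both states with no time at all
```

The merged quad store still says the door was open at :t1 and closed at :t2. The flattened triples say it is open and closed, with nothing left to tell the two apart.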
>> 
>> This all works out quite nicely and naturally, but there is one big issue. If we are given a quad store, how do we know whether to interpret it as consisting of triples with labels, or as consisting of quads with an extra parameter? It is important to be able to make the distinction, since the same quad store could be true in one view but false in the other, in the same interpretation. There are several ways to handle this, and I am working on a couple of ideas right now. Hopefully I will have an example by tomorrow, but any comments so far?
>> 
>> Pat
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

Received on Wednesday, 15 February 2012 20:40:19 UTC