- From: Bob MacGregor <macgregor@ISI.EDU>
- Date: Fri, 10 Jan 2003 11:02:15 -0800
- To: Dave Reynolds <der@hplb.hpl.hp.com>
- Cc: www-rdf-interest@w3.org, jena-dev@yahoogroups.com
Dave, I'm switching to a new thread, because issue of contexts is getting confused with the reification issue, and to some extent they are separable. On Friday, January 10, 2003, at 02:57 AM, Dave Reynolds wrote: Bob, Most of your points are probably best addressed by Pat, or others, but there were two side comments that I wanted to respond to. We would be happy to send you the source code for Loom or PowerLoom if you want to see what an efficient (and very clever) context mechanism looks Interesting. Do you have any references to papers on this (other than the source code itself)? I've read some of the papers on the Loom classifier but not seen any on the context system. I'm sorry, but the answer is no, we don't have any write-ups on the Loom context mechanism(s). However, I will give you some highlights here. The core implementations of contexts in PowerLoom and Loom are virtually identical, so for the sequel I'll just use the term Loom. In Loom, contexts reside in a tangled hierarchy. Statements themselves have no assigned truth values. If memory serves, Loom uses a three-valued logic (T, F, and U), and PowerLoom uses a 5-valued logic (adding default T and default F). PowerLoom can also allow numeric/probabilistic values to be associated with statements. The context mechanism doesn't care how many truth values are being manipulated; the machinery is orthogonal to that issue. I will use the term 'statement' to refer to a logical sentence, and 'assertion' to refer to a statement having an assigned truth value. Assertions residing in a context A are inherited by all contexts that are descendants of A. These assertions can be overridden, so if context B inherits an assertion stating that S1 is true, B may contain an assertion that in it (and its descendants) the truth value of S1 is false, or unknown, or whatever. This context mechanism, then, is about as general as you might imagine along certain dimensions. It was invented, as I mentioned earlier, by Austin Tate's group in Edinburgh. They needed it because their planning system, OPLAN, was designed to explore large numbers of possible worlds/models, and they needed a very efficient context switching mechanism. They were so enamored of their context mechanism that they envisioned re-engineering it in silicon. However, I don't think that ever happened. Unfortunately, Austin's group didn't ever publish their context mechanism. That was OK with us, because they gave us their source code, and it was only about a page of C code, easily decipherable. We generalized their algorithm in two ways -- we generalized it to handle multiple inheritance, and we invented a notion of a 'home context' that significantly reduces the space requirements in the average case (the case where a truth value for a statement is asserted in one and only one context). Unfortunately, we never wrote up our algorithm either. Its written in Common Lisp, and its only a couple of pages, but certain aspects of it are counter intuitive. Furthermore, some of the stuff is implemented as a part of the language itself, rather than as a part of the application (I have a third implementation of contexts where the code is pulled back out of the language; its not that much different). I would guess that Tom Russ, Loom's caretaker, would be more than happy to elaborate on the details if you cared to contact him. Now for the good part. The Loom context mechanism is VERY fast. I believe that the average asymptotic time to lookup the truth value of a statement within a context hierarchy is O(n), where n is the average number of truth values per statement. The average number of truth values per statement in a normal application is probably a number between 1 and 3, so that means that context sensitive lookup is basically a constant-time operation, independent of the number of contexts. The caveat is that the algorithm I've just alluded to is a main-memory based algorithm. I would guess that some variant of it can be used when one moves to a persistent storage model, but none of us have tried to work out the details (yet). Its possible that we would hit some kind of snag that would negatively impact the asymptotic performance. When looking at Jena internals, I've been keeping the context machinery in the back of my mind, which is partly why I'm paying attention to things like the 'reifiedOnly' bit. If you ever decide to get serious about contexts, it would probably be worth your while to assign someone the task of deciphering our context algorithm (or Austin's). It may be overkill for RDF, but some of the lessons learned probably apply to simpler context systems. But the fact is, if the "shortcut" reification that now exists in Jena were modified only slightly, we could get significant performance improvements for reasoning with the kinds of examples listed above. I'd like to second Brian's suggestion that you work with us tool developers on this to find a pragmatic solution (perhaps over on jena-dev or jena-devel rather than rdf-interest). If I understand our earlier discussion then it still seems plausible to find a solution. Setting aside the wider semantic issues, and thinking just in implementation terms, then the implementation costs of switching from a "statement" to a "statings" view of reification might be mitigated by optimizing the case where an application only wants to have one "reification" for each statement. Ah yes, statement v. statings. After getting a better understanding of this issue (I hear what they are saying, but still can't believe what I'm hearing), it has occurred to me that I may be able to finesse the issue by pretending that the terms 'statement' and 'stating' are synonymous within my own applications. I would do just what you appear to be suggesting -- mandate that there be only one stating per statement, and then use the stating wherever I really intended to use a 'reified statement' (here I'm using the traditional sense of this phrase, not RDF's misuse of it) . If this turns out to be a viable option, then my interest in the internals of how Jena manages "shortcut" reification (i.e., nested statements) will continue to be large. Note: with the exception of scrutinizing the database storage layout for Jena RDB models, I've confined myself to looking hard at the Jena API but not looking below it (i.e., I haven't looked at the code that implements it). That could put me somewhat at a disadvantage when discussing implementation issues. And, of course, I haven't seen Jena 2.0. Are their threads other than 'jena-dev' where implementations issues are discussed? Cheers, Bob
Received on Friday, 10 January 2003 14:02:20 UTC