- From: Thomas Russ <tar@ISI.EDU>
- Date: Wed, 15 Jan 2003 10:33:25 -0800
- To: Dave Reynolds <der@hplb.hpl.hp.com>
- Cc: Bob MacGregor <macgregor@ISI.EDU>, www-rdf-interest@w3.org, hans@ISI.EDU
On Wednesday, January 15, 2003, at 09:47 AM, Dave Reynolds wrote: > Bob, > >>> What is the space complexity of the algorithm like? >> >> The contexts are organized into a hierarchy, and numbered so >> that a parent's number is always smaller than the numbers of its >> descendants. Some sort of data structure is needed that can quickly >> determine, given two contexts C1 and C2, whether C1 inherits C2. >> One could imagine that the space requirements for this would be >> superlinear in the worst case, but they should be linear (in the >> number >> of contexts) for the average case. >> >> Each statement/triple points to a list of contexts where it has >> appeared within an assertion. If a triple [S P O] is asserted to be >> true in >> contexts C1, C2, and C3, then there is a triple object that points >> to a list containing C1, C2, and C3 (ordered according to the >> numbers they are assigned, largest number first). > > So you have structure sharing in the sense that there is only one > triple object > which can appear in multiple contexts. In the case where there are > lots of > triples (N) and lots of contexts (M) then the worst case cost is > dominated by > those membership lists which would be worst case O(NxM). However, I > imagine the > lists also structure share, and that the ordering helps with that, so > the > average case will be *much* better than that. List structure sharing: Only the inheritance lists on the context objects can structure share. The lists on the individual assertions need to be separate because they represent places where values were asserted, and need to have the ability to splice in additional values in the middle of the list. The context inheritance list structure sharing breaks down is in the case of multiple inheritance. Each context with multiple parents has to have a new list structure because the parent list has to be sorted, and that can cause a tangle. Worst case: While it is true that you can get NxM worst case performance, that requires effectively NxM assertions, and either means that there is no hierarchical arrangement of contexts, or else that the truth value of the assertion changes at each step of the hierarchy. Both of these situations would seem to be rather perverse and IMHO rather unlikely. The reason for this is that if the truth value of a proposition is the same as in the parent, then there is no need to record it, since the value becomes available by inheritance, thus saving space. The only place in the inheritance list where entries need to be made is where the value differs from the parent. > Presumably there is also an additional index structure of sort to > allow you to > get the constant-time truth value lookup - that also needs to have > good scaling > properties. No. You can't get constant-time truth value lookup. This is a consequence of the space saving design of not recording redundant information. Each truth value lookup is effectively a search, which is bounded by O(M) in the worst case, namely the case where you are looking for the truth value in the top most context and the proposition has a truth value entry for each context in the system. The only way to get constant time truth value lookup would be to have a NxM table. If you assume locality of reference applies, then it is possible to use a caching scheme (remembering the last context looked up and the value) to get a constant time lookup of that particular value, but this would not be a guarantee of performance. Although we had the code in place for this in Loom, we never activated the code. > >> If we were using a quad representation, then memory is linear in >> the number of quads. > > Yes ... but presumably the number of quads is NxM not just N. In your > case it > sounds like M could grow quite large which might be a problem. OK, I'm not quite sure what the quad representation is, but if it is just a triple with the addition of the context list, then the number of quads would still be N, but the last element would be a list, which could be as large as M. But if the latter case holds, then it means that one of the cases of either no context inheritance, or alternating truth values applies. If that is so, then I think one can make an information theoretic argument about how you can't really expect to do any better. > > Dave
Received on Wednesday, 15 January 2003 13:36:05 UTC