Re: Efficient context mechanisms from Thomas Russ on 2003-01-15 (www-rdf-interest@w3.org from January 2003)

From: Thomas Russ <tar@ISI.EDU>
Date: Wed, 15 Jan 2003 10:33:25 -0800
To: Dave Reynolds <der@hplb.hpl.hp.com>
Cc: Bob MacGregor <macgregor@ISI.EDU>, www-rdf-interest@w3.org, hans@ISI.EDU
Message-Id: <D57E8402-28B7-11D7-819E-000393B3C49E@isi.edu>
On Wednesday, January 15, 2003, at 09:47  AM, Dave Reynolds wrote:

> Bob,
>
>>> What is the space complexity of the algorithm like?
>>
>> The contexts are organized into a hierarchy, and numbered so
>> that a parent's number is always smaller than the numbers of its
>> descendants.  Some sort of data structure is needed that can quickly
>> determine, given two contexts C1 and C2, whether C1 inherits C2.
>> One could imagine that the space requirements for this would be
>> superlinear in the worst case, but they should be linear (in the 
>> number
>> of contexts) for the average case.
>>
>> Each statement/triple points to a list of contexts where it has
>> appeared within an assertion. If a triple [S P O] is asserted to be 
>> true in
>> contexts C1, C2, and C3, then there is a triple object that points
>> to a list containing C1, C2, and C3 (ordered according to the
>> numbers they are assigned, largest number first).
>
> So you have structure sharing in the sense that there is only one 
> triple object
> which can appear in multiple contexts. In the case where there are 
> lots of
> triples (N) and lots of contexts (M) then the worst case cost is 
> dominated by
> those membership lists which would be worst case O(NxM). However, I 
> imagine the
> lists also structure share, and that the ordering helps with that, so 
> the
> average case will be *much* better than that.

List structure sharing:  Only the inheritance lists on the context 
objects can
structure share.  The lists on the individual assertions need to be 
separate
because they represent places where values were asserted, and need to 
have
the ability to splice in additional values in the middle of the list.

The context inheritance list structure sharing breaks down is in the 
case
of multiple inheritance.  Each context with multiple parents has to 
have a
new list structure because the parent list has to be sorted, and that 
can
cause a tangle.

Worst case:  While it is true that you can get NxM worst case 
performance,
that requires effectively NxM assertions, and either means that there 
is no
hierarchical arrangement of contexts, or else that the truth value of 
the
assertion changes at each step of the hierarchy.  Both of these 
situations
would seem to be rather perverse and IMHO rather unlikely.

The reason for this is that if the truth value of a proposition is the 
same
as in the parent, then there is no need to record it, since the value 
becomes
available by inheritance, thus saving space.  The only place in the 
inheritance
list where entries need to be made is where the value differs from the 
parent.

> Presumably there is also an additional index structure of sort to 
> allow you to
> get the constant-time truth value lookup - that also needs to have 
> good scaling
> properties.

No.  You can't get constant-time truth value lookup.  This is a 
consequence
of the space saving design of not recording redundant information.  
Each truth
value lookup is effectively a search, which is bounded by O(M) in the 
worst case,
namely the case where you are looking for the truth value in the top 
most context
and the proposition has a truth value entry for each context in the 
system.

The only way to get constant time truth value lookup would be to have a 
NxM table.

If you assume locality of reference applies, then it is possible to use 
a caching
scheme (remembering the last context looked up and the value) to get a 
constant
time lookup of that particular value, but this would not be a guarantee 
of performance.
Although we had the code in place for this in Loom, we never activated 
the code.

>
>> If we were using a quad representation, then memory is linear in
>> the number of quads.
>
> Yes ... but presumably the number of quads is NxM not just N. In your 
> case it
> sounds like M could grow quite large which might be a problem.

OK, I'm not quite sure what the quad representation is, but if it is 
just a triple
with the addition of the context list, then the number of quads would 
still be N, but
the last element would be a list, which could be as large as M.

But if the latter case holds, then it means that one of the cases of 
either no
context inheritance, or alternating truth values applies.  If that is 
so, then I
think one can make an information theoretic argument about how you 
can't really
expect to do any better.

>
> Dave
Received on Wednesday, 15 January 2003 13:36:05 UTC