Re: triple (quad) storage sizing

On Wed, May 11, 2011 at 5:59 AM, Steve Harris <steve.harris@garlik.com> wrote:
> On 2011-05-10, at 14:45, William Waites wrote:
>> Typically there will be three different indexes for useful
>> permutations of (s,p,o,g) -- (g,s,p,o), (p,s,o,g), (o,p,s,g) for
>> example. Assuming three indexes, a safe estimate is 96 bytes (3x 32)
>> per triple.
> ...
>
> This doesn't follow, e.g. bitmap indexes, and the index structure that 5store uses are a lot more compact than that.
>
> Don't discount the indexing of lexical values for nodes though, for some datasets that can be quite expensive, anything up to 3x the size of the quad index.

Quite true. For instance, data sets with lots of strings (particularly
large strings) can get expensive to store. This can affect the total
storage size far more than the number of triples (or quads) does.
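
To put rough numbers on the figures quoted above, here is a
back-of-the-envelope sketch. It assumes William's 32 bytes per index
entry and three indexes, and treats Steve's "up to 3x the size of the
quad index" as an upper bound on the lexical side. The function name
and its parameters are made up for illustration, and real stores
(bitmap indexes, 5store, etc.) may come in well under this.

```python
def estimate_bytes(num_quads, num_indexes=3, bytes_per_entry=32, lexical_factor=3):
    """Return (quad_index_bytes, upper_bound_total_bytes).

    Assumptions taken from the thread, not measured:
      - each index entry costs bytes_per_entry bytes (32 in the quoted estimate)
      - lexical (node value) indexing may cost up to lexical_factor times
        the size of the quad indexes
    """
    quad_index = num_quads * num_indexes * bytes_per_entry
    lexical_upper = quad_index * lexical_factor
    return quad_index, quad_index + lexical_upper


quad_index, upper_total = estimate_bytes(1_000_000)
print(quad_index, upper_total)
```

The point is only that the lexical term can dominate: under these
assumptions the string-heavy worst case is four times the size of the
quad indexes alone.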

In the case of Parliament the lexical indexes form the basis of the
quad indexing, so almost all of the work and space is in those
indexes. The quads themselves are just stored flat, and have the
potential to be packed in more tightly with compression.

Speaking of which, some structures are amenable to compression, which
saves either space or bandwidth. That trade is usually worthwhile,
since CPUs typically operate at speeds that are orders of magnitude
faster than the other bottlenecks in the system, so the cycles spent
decompressing tend to cost less than the I/O they save. This should
be factored into any sizing estimate as well.
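
As a small illustration of how well flat quads can compress, here is a
sketch that packs quads as four 64-bit node ids each and runs them
through zlib. The record layout, the synthetic data and the helper
names are assumptions made up for the example. This is not how
Parliament (or any other store) lays out its files, and real data
will compress differently.

```python
import struct
import zlib

# Assumed flat layout: four little-endian 64-bit node ids (s, p, o, g).
QUAD = struct.Struct("<4Q")


def pack_quads(quads):
    """Serialise an iterable of (s, p, o, g) id tuples into one flat byte string."""
    return b"".join(QUAD.pack(*q) for q in quads)


def unpack_quads(blob):
    """Inverse of pack_quads: return a list of (s, p, o, g) tuples."""
    return list(QUAD.iter_unpack(blob))


# Synthetic, sorted quads: 10,000 subjects, 3 predicates each, one graph.
quads = sorted((s, p, s * 7 % 1000, 1) for s in range(10_000) for p in (1, 2, 3))

raw = pack_quads(quads)
packed = zlib.compress(raw)

# The round trip must be lossless for this to be usable as storage.
assert unpack_quads(zlib.decompress(packed)) == quads
print(len(raw), len(packed))
```

Sorted ids stored at fixed width contain a lot of repeated and zero
bytes, which is why even a general-purpose compressor does well here.
A store-specific encoding would likely do better still.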

Paul Gearon
Revelytix, Inc

Received on Wednesday, 11 May 2011 15:22:02 UTC