Re: triple (quad) storage sizing from William Waites on 2011-05-10 (semantic-web@w3.org from May 2011)

From: William Waites <ww@styx.org>
Date: Tue, 10 May 2011 21:46:56 +0200
To: Gregory Williams <greg@evilfunhouse.com>
Cc: Semantic Web <semantic-web@w3.org>
Message-ID: <20110510194656.GN31006@styx.org>

* [2011-05-10 15:30:31 -0400] Gregory Williams <greg@evilfunhouse.com> wrote:

] This will also apply to other index orderings, not just (g,s,p,o). 
] For example, a (p,o,g,s) index can share common (p,o) pairs and store
] lists of (g,s).

Right, hence the 25%: in the best case this optimisation means each
entry stores three terms rather than four, so quads cost about what
triples would.
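
To make the arithmetic concrete, here is a minimal sketch that counts
term slots for a flat index against one that stores each shared prefix
once. The function name, the data and the flat slot-counting model are
my assumptions for illustration, not how any particular store lays out
its indexes.

```python
def term_slots(quads, shared):
    """Count term slots for a flat index and for one that stores each
    distinct prefix of length `shared` only once (assumed model)."""
    flat = 4 * len(quads)
    prefixes = {q[:shared] for q in quads}
    grouped = shared * len(prefixes) + (4 - shared) * len(quads)
    return flat, grouped

# Hypothetical best case: 1000 quads in a single graph, ordered (g, s, p, o).
quads = [("g0", f"s{i}", "p0", f"o{i}") for i in range(1000)]
flat, grouped = term_slots(quads, shared=1)
saving = 1 - grouped / flat  # approaches 0.25 as the number of quads grows
```

With one shared leading term the saving tends to one slot in four,
which is the 25% above. In this counting model a shared pair such as
(p,o) would be `shared=2`.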

There are also obvious savings because I think you store what is
effectively a trie over the concatenation of (t_1, t_2, t_3) for each
permutation. That only really matters if the number of distinct t_1
values is small, which is the case if t_1 is p, or maybe g in some
datasets. But the question here is: how much is gained from this in
practice?
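
As a rough illustration of why the number of distinct t_1 matters,
the sketch below builds a term-level trie for a given permutation and
counts its nodes. The nested-dict trie, the function name and the toy
data are assumptions for illustration only. A real store would work on
encoded term ids and on-disk pages.

```python
def trie_nodes(triples, order):
    """Count nodes in a term-level trie keyed by the terms taken in `order`."""
    root = {}
    count = 0
    for triple in triples:
        node = root
        for i in order:
            key = triple[i]
            if key not in node:
                node[key] = {}  # a new trie node for an unseen term at this depth
                count += 1
            node = node[key]
    return count

# Hypothetical data: 100 triples (s, p, o) using only 2 distinct predicates.
S, P, O = 0, 1, 2
triples = [(f"s{i}", f"p{i % 2}", f"o{i}") for i in range(100)]
pos_nodes = trie_nodes(triples, (P, O, S))  # few distinct t_1: first level is shared
spo_nodes = trie_nodes(triples, (S, P, O))  # every t_1 is distinct: nothing is shared
```

In this toy case the (p,o,s) trie needs about a third fewer nodes than
the (s,p,o) one, and the whole saving comes from the shared first
level. Real data will behave differently, which is exactly the
question.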

Mind that the question here is actually very practical - if one wants
to publish a big dataset and put a SPARQL endpoint in front of it, how
much RAM does one need to buy?

-w
-- 
William Waites                <mailto:ww@styx.org>
http://river.styx.org/ww/        <sip:ww@styx.org>
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45

Received on Tuesday, 10 May 2011 19:47:20 UTC