- From: Paul Gearon <gearon@ieee.org>
- Date: Wed, 11 May 2011 11:21:35 -0400
- To: Steve Harris <steve.harris@garlik.com>
- Cc: William Waites <ww@styx.org>, Semantic-Web <semantic-web@w3.org>
On Wed, May 11, 2011 at 5:59 AM, Steve Harris <steve.harris@garlik.com> wrote:
> On 2011-05-10, at 14:45, William Waites wrote:
>> Typically there will be three different indexes for useful
>> permutations of (s,p,o,g) -- (g,s,p,o), (p,s,o,g), (o,p,s,g) for
>> example. Assuming three indexes, a safe estimate is 96 bytes (3x 32)
>> per triple.
> ...
>
> This doesn't follow, e.g. bitmap indexes, and the index structure that 5store uses are a lot more compact than that.
>
> Don't discount the indexing of lexical values for nodes though, for some datasets that can be quite expensive, anything up to 3x the size of the quad index.

Quite true. For instance, data sets with lots of strings (particularly large strings) can get expensive to store. This can be a much more important influence than the number of triples (or quads).

In the case of Parliament the lexical indices form the basis of the quad indexing. So almost all of the work and space is in those indexes. The quads themselves are just stored flat, and have the potential to be packed in more tightly with compression.

Speaking of which, some structures are amenable to compression, which is useful in terms of either space or bandwidth, particularly when CPUs typically operate at speeds that are orders of magnitude faster than the other bottlenecks in the system. This should be considered as well.

Paul Gearon
Revelytix, Inc
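
As a rough illustration of the figures discussed in this thread, the sketch below reproduces the 96-bytes-per-quad estimate and shows how a flat, sorted run of quads might shrink under general-purpose compression. It rests on several assumptions not stated in the messages: that the 32 bytes per index entry are four 64-bit node IDs, that the sample quads (which are made up) resemble real data, and that zlib is a reasonable stand-in for whatever compression a store would use. It does not describe the actual layout of Parliament, 5store, or any other system.

```python
import struct
import zlib

ID_BYTES = 8          # assumption: each node ID is a 64-bit integer
QUAD_COMPONENTS = 4   # s, p, o, g


def quad_index_bytes(num_quads, num_indexes=3):
    """Naive estimate: every index stores a full copy of every quad."""
    return num_quads * num_indexes * QUAD_COMPONENTS * ID_BYTES


def pack_quads(quads):
    """Store quads flat: four big-endian 64-bit IDs each, in sorted order."""
    return b"".join(struct.pack(">4Q", *q) for q in sorted(quads))


# Made-up sample data: 1000 quads drawn from a small ID space.
quads = [(s, s % 7, (s * 31) % 500, 1) for s in range(1000)]

raw = pack_quads(quads)       # one flat copy, 32 bytes per quad
packed = zlib.compress(raw)   # generic compression of the flat run

print("naive 3-index estimate:", quad_index_bytes(len(quads)), "bytes")
print("flat copy:", len(raw), "bytes; compressed:", len(packed), "bytes")
```

The naive estimate only covers the quad indexes. The lexical index for node values is separate and, as noted above, can dominate for string-heavy data sets.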
Received on Wednesday, 11 May 2011 15:22:02 UTC