
Re: triple (quad) storage sizing

From: William Waites <ww@styx.org>
Date: Sat, 14 May 2011 12:41:04 +0200
To: Orri Erling <erling@xs4all.nl>
Cc: 'Paul Gearon' <gearon@ieee.org>, 'Steve Harris' <steve.harris@garlik.com>, 'Semantic-Web' <semantic-web@w3.org>
Message-ID: <20110514104104.GB21910@styx.org>
* [2011-05-11 19:31:16 +0200] Orri Erling <erling@xs4all.nl> wrote:

] With Virtuoso, compressing row-wise, we get an average 27 bytes per quad in
] allocated database pages, excluding literals and IRI strings. The index
] scheme is PSOG, POGS, OP, SP, GS. Note that the three last are not
] covering indices but projections of distinct values from the columns
] concerned, hence smaller. POGS is bitmap compressed on S.

Thanks Orri for the specific numbers. In fact (veering off-topic for
this thread) the problems that I've experienced with Virtuoso are
more related to (dead)locking behaviour than raw hardware requirements.

] With column-wise compression, we get between 9.8 bytes per quad (DBpedia)
] and 6.4 bytes (BSBM or RDF-ized TPC-H). The logical index layout is as with
] the row-wise model but the physical layout is column-wise. The column
] store is quite operational but is not generally available as yet.

This is the one done with pieces of MonetDB? That's very
impressive. Will this be free software, perhaps with an arrangement
similar to the current Virtuoso?
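
For anyone who wants to turn the per-quad figures quoted above into a rough
sizing estimate, a minimal sketch follows. It only multiplies a quad count by
the quoted bytes per quad, so like those figures it excludes literals and IRI
strings. The one-billion-quad dataset and the function names are assumptions
made for illustration, not anything from Virtuoso itself.

```python
# Rough index-size estimate from the per-quad figures quoted in this thread.
# These figures exclude literals and IRI strings, so this is a lower bound
# on total storage, not a full sizing.
BYTES_PER_QUAD = {
    "row-wise": 27.0,
    "column-wise (DBpedia)": 9.8,
    "column-wise (BSBM / TPC-H)": 6.4,
}


def estimate_index_bytes(quad_count, bytes_per_quad):
    """Return the estimated size in bytes of the quad indices."""
    return quad_count * bytes_per_quad


def format_gib(n_bytes):
    """Render a byte count in GiB with one decimal place."""
    return f"{n_bytes / 2**30:.1f} GiB"


if __name__ == "__main__":
    quads = 1_000_000_000  # hypothetical dataset size
    for layout, bpq in BYTES_PER_QUAD.items():
        size = estimate_index_bytes(quads, bpq)
        print(f"{layout:28s} {format_gib(size)}")
```

Actual usage will also depend on the data, as the spread between the DBpedia
and BSBM numbers already suggests.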

] These matters are further explained in the paper linked from my blog
] http://virtuoso.openlinksw.com/blog.  The post in question is around Sep
] 2010, about the VLDB Semdata workshop.

Shall read with interest.

Cheers,
-w
-- 
William Waites                <mailto:ww@styx.org>
http://river.styx.org/ww/        <sip:ww@styx.org>
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45
Received on Saturday, 14 May 2011 10:41:29 UTC
