Re: triple (quad) storage sizing from David Booth on 2011-05-10 (semantic-web@w3.org from May 2011)

From: David Booth <david@dbooth.org>
Date: Tue, 10 May 2011 14:15:15 -0400
To: Andy Seaborne <andy.seaborne@epimorphics.com>, William Waites <ww@styx.org>
Cc: semantic-web@w3.org
Message-ID: <1305051315.1950.12370.camel@dbooth-laptop>

On Tue, 2011-05-10 at 17:39 +0100, Andy Seaborne wrote:
> 
> On 10/05/11 14:45, William Waites wrote:
> > I'm looking at requirements for making available some large datasets,
> > and ran the back of the envelope calculation below. . . .
> 
> There are compression techniques, or data structures that don't store 
> the whole of the quad where there is repetition. For example, for 
> (g,s,p,o), some index data structures can store one (g,s) and all the 
> (p,o).  This might well be done by an index data structure that stores 
> common prefixes anyway rather than needing a special data structure for
> RDF quads.
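
To make the (g,s) grouping above concrete, here is a minimal Python sketch.
It is only an illustration under my own assumptions: the function names and
the sample quads are hypothetical, terms are counted rather than bytes, and
it is not how any particular store implements its indexes.

```python
from collections import defaultdict


def build_gs_index(quads):
    """Group (g, s, p, o) quads by their (g, s) prefix.

    Each distinct (g, s) pair is stored once as a key, followed by
    the list of (p, o) pairs that share it.
    """
    index = defaultdict(list)
    for g, s, p, o in quads:
        index[(g, s)].append((p, o))
    return dict(index)


def stored_terms_flat(quads):
    """Terms stored when every quad is written out in full."""
    return 4 * len(quads)


def stored_terms_grouped(index):
    """Terms stored when (g, s) is written once per group."""
    return sum(2 + 2 * len(pairs) for pairs in index.values())


# Hypothetical sample data: three quads share the prefix (g1, s1).
quads = [
    ("g1", "s1", "p1", "o1"),
    ("g1", "s1", "p2", "o2"),
    ("g1", "s1", "p3", "o3"),
    ("g1", "s2", "p1", "o4"),
]

index = build_gs_index(quads)
flat = stored_terms_flat(quads)
grouped = stored_terms_grouped(index)
print(flat, grouped)
```

The saving grows with the number of (p,o) pairs per (g,s); with no shared
prefixes the two counts are the same.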

Medha Atre, Jagannathan Srinivasan, and James A. Hendler have done some
work on more efficient storage for RDF:
http://www.cs.rpi.edu/%7Eatrem/bitmat_techrep.pdf 
I don't know the current status of their work.

-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.

Received on Tuesday, 10 May 2011 19:15:38 UTC