- From: William Waites <ww@styx.org>
- Date: Tue, 10 May 2011 20:26:35 +0200
- To: Andy Seaborne <andy.seaborne@epimorphics.com>
- Cc: semantic-web@w3.org
* [2011-05-10 17:39:30 +0100] Andy Seaborne <andy.seaborne@epimorphics.com> wrote:
]
] There are compression techniques, or data structures that don't store
] the whole of the quad where there is repetition. For example, for
] (g,s,p,o), some index data structures can store one (g,s) and all the
] (p,o). This might well be done by an index data structure that stores
] common prefixes anyway rather than needing a special data structure for
] RDF quads.

So if I understand correctly, this example means that, in the best case
where each subject occurs in exactly one graph, we get basically the same
properties as a triplestore, so a saving of 25%.

There are probably diminishing returns when one tries to do the same with,
e.g., (s,p) and (o), unless many repeating predicates on the same subject
are very common (which is not the case with most real datasets).

So then there are two questions.

Theory: can we do better than that?

Practice: which triple/quad stores do this, and what are the rules of
thumb for bytes/statement to factor in when speccing hardware?

-w
-- 
William Waites                <mailto:ww@styx.org>
http://river.styx.org/ww/        <sip:ww@styx.org>
F4B3 39BF E775 CF42 0BAB  3DF0 BE40 A6DF B06F FD45
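
To make the arithmetic in this exchange concrete, here is a rough sketch that counts stored "term slots" under a deliberately simple model: a flat layout stores four terms per quad, while a prefix-sharing layout stores each distinct prefix once, followed by the remaining terms of every quad that shares it. The function names, the `prefix_len` parameter, and the sample data are all hypothetical. Real stores use dictionary encoding and on-disk index structures, so this says nothing about actual bytes per statement.

```python
from itertools import groupby


def flat_slots(quads):
    """Term slots stored when every (g, s, p, o) quad is written in full."""
    return 4 * len(quads)


def prefix_shared_slots(quads, prefix_len=2):
    """Term slots stored when each distinct prefix is written once.

    prefix_len=1 shares only g; prefix_len=2 shares the (g, s) pair.
    This is an illustrative model, not any particular store's layout.
    """
    total = 0
    # Sorting puts quads with a common prefix next to each other.
    for _, group in groupby(sorted(quads), key=lambda q: q[:prefix_len]):
        n = sum(1 for _ in group)
        total += prefix_len + n * (4 - prefix_len)
    return total


# Hypothetical data: one graph, 10 subjects, 5 (p, o) pairs per subject.
quads = [
    ("g", f"s{s}", f"p{k}", f"o{s}_{k}")
    for s in range(10)
    for k in range(5)
]

flat = flat_slots(quads)
shared_g = prefix_shared_slots(quads, prefix_len=1)
shared_gs = prefix_shared_slots(quads, prefix_len=2)

print(f"flat: {flat} slots")
print(f"share g only: {shared_g} slots, saving {1 - shared_g / flat:.1%}")
print(f"share (g, s): {shared_gs} slots, saving {1 - shared_gs / flat:.1%}")
```

Under this model, sharing only the graph term approaches the 25% figure as the number of statements per graph grows, while sharing the (g,s) pair depends on how many statements each subject carries: with k statements per (g,s) group the saving is 1/2 - 1/(2k).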
Received on Tuesday, 10 May 2011 18:26:59 UTC