- From: Sampo Syreeni <decoy@iki.fi>
- Date: Thu, 27 Aug 2009 01:54:23 +0300 (EEST)
- To: "Seaborne, Andy" <andy.seaborne@hp.com>
- cc: Niklas Lindström <lindstream@gmail.com>, Semantic Web <semantic-web@w3.org>
On 2009-08-26, Seaborne, Andy wrote: > What is happening is that to do the ORDER BY, it has to retrieve all > the 342684 possibilities so the ORDER BY affacts the pattern matching > part and incurs the sort cost. Sort of, but not quite. Instead it could work backwards beginning with the ordered timestamps, and go from there. So, as a relational kinda guy, to me this seems like an optimization problem. One that I'd easily solve under an RDBMS, and one that would necessitate e2e optimization across the RDF layer in the case of triple stores. The way to go with a relational database would be to have a partitioned-by-type relation of instances indexed on time (or a composite index on type major, time minor). The execution plan would translate into a lookup only covering the partitition with this specific type of object, and a linear index scan in temporal order, interrupted after the first 100 tuples. Whatever else you wanted would then probably be translated into a nested-pipelined join towards relation data proper. In theory a columnar organization by attribute, with an index on the table holding the time, should optimize reasonably well across various RDF layers as well. And as per Stonebraker's comments, doubly so if the underlying database was column oriented to start with. -- Sampo Syreeni, aka decoy - decoy@iki.fi, http://decoy.iki.fi/front +358-50-5756111, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
Received on Wednesday, 26 August 2009 23:41:16 UTC