- From: Andy Seaborne <andy.seaborne@epimorphics.com>
- Date: Tue, 22 Jan 2013 16:36:31 +0000
- To: semantic-web@w3.org
On 22/01/13 00:11, Jeremy J Carroll wrote: > > What triple stores offer any optimization for integer or other numeric > queries, e.g. with the following query > > SELECT ?a > WHERE { ?a eg:prop ?i . > FILTER ( 10 < ?i && ?i < 15 ) } > > > a naive approach would be to find all eg:prop triples and compute the > filter for each one (O(N) complexity); if the store only has say > integers as objects of eg:prop then a Predicate/Object index could > achieve the same result in O(log(N)) by a numeric binary chop on the > Object part of the index. > > I looked at the TDB code and got the impression that the indices do in > fact preserve ordering of at least some types, but could not see FILTERs > being treated in the way indicated here … then I thought I would ask here. > > Jeremy The TDB indexes do preserve ordering, sort of. Numbers (integer, decimals), as well as dates and datetimes, are stored in the index itself if possible. An integer value has to fit in 56 bits (signed). Indication of datatype is preserved. Otherwise, oversized literals are stored in a different table. Doubles aren't optimized currently as it would loose precision (if node id size increases or becomes variable (compression) then they will be). In the shipped indexing configuration there is a POS index, as a B+Tree, so it is ordered sufficiently for this query if restricted to integers and decimals. Your example would be to start at (eg:prop, 10^^integer) -> (eg:prop, 15^^integer) and also look at: (eg:prop, 10^^decimal) -> (eg:prop, 15^^decimal). and it's all index -access only if you assume it's within the 56 bit limit. If you have to cope with very large integers, or decimals with large absolute values and a lot of fraction value, this will miss those. Each leaf block has 100-200 triples worth of id so in the integer case likely to be one block access. If the data is loaded with tdbloader2, the leaf disk blocks are very likely adjacent on disk so likely to be fast access even on rotational disk. The encoding inline into the index gives a very significant increase in performance over needing to go to a node table to get the value for the FILTER test. My understanding is that there going to be observable benefits when there are very large numbers of eg:prop with values outside that range. Andy
Received on Tuesday, 22 January 2013 16:37:01 UTC