W3C home > Mailing lists > Public > public-sparql-dev@w3.org > April to June 2010

Re: Performance comparison: direct select or regex filter ?

From: Paul Gearon <gearon@ieee.org>
Date: Thu, 29 Apr 2010 12:47:44 -0400
Message-ID: <x2za25ac1f1004290947taaf68707l35ec6caabb0b1db1@mail.gmail.com>
To: Parsa Ghaffari <parsa.ghaffari@gmail.com>
Cc: public-sparql-dev@w3.org
On Thu, Apr 29, 2010 at 12:22 PM, Parsa Ghaffari
<parsa.ghaffari@gmail.com> wrote:
> Hi there,
> How different are these queries in terms of performance:
> 1) SELECT * WHERE { ?x foaf:name "John Smith"@en }
>
> 2) SELECT * WHERE { ?x foaf:name ?y. FILTER regex(?y,"^John Smith") }

This will depend on the exact implementation, but 1 will be faster.

Most databases will index by object. This means that (1) will usually
be a simple index lookup. For a standard system with tree-based
indexes, this will have logarithmic complexity (O(log(N)). For
in-memory systems with hashed indexes, this will be constant (O(1)).

(2) will require a lookup (as in (1)), followed by a filter operation.
This will require N operations (linear or O(N)) multiplied by the
complexity of the regex (which can be ignored for complexity since it
is simple, but will have some real cost). The overall complexity is
linear: O(N). This is much worse than for (1).

It *is* possible for a clever implementation to see that the filter is
a very simple operation that can be duplicated with a lookup in an
index that sorts lexically. So if your store is the right kind, and
you have a clever optimizer, it can be just as efficient as for (1).
In reality, I don't believe any optimizers are quite this clever
(someone, please tell me if I'm wrong here!) so you won't that level
of performance.

There are databases that offer alternatives to (2), which let you
specify that you're searching lexically through an index, thereby
doing an equivalent operation to (2), while offering the performance
of (1). However, that is highly implementation dependent, and the
mechanism for saying what you want in the query is not standardized.

Regards,
Paul Gearon
Received on Thursday, 29 April 2010 16:48:18 UTC

This archive was generated by hypermail 2.3.1 : Tuesday, 6 January 2015 20:15:50 UTC