Re: Performance comparison: direct select or regex filter ? from Richard Newman on 2010-04-29 (public-sparql-dev@w3.org from April to June 2010)

From: Richard Newman <rnewman@franz.com>
Date: Thu, 29 Apr 2010 11:13:13 -0700
To: Paul Gearon <gearon@ieee.org>
Cc: Parsa Ghaffari <parsa.ghaffari@gmail.com>, public-sparql-dev@w3.org
Message-Id: <1F6B24B8-5DB5-409E-8529-D5E748C819EA@franz.com>

> It *is* possible for a clever implementation to see that the filter is
> a very simple operation that can be duplicated with a lookup in an
> index that sorts lexically. So if your store is the right kind, and
> you have a clever optimizer, it can be just as efficient as for (1).
> In reality, I don't believe any optimizers are quite this clever
> (someone, please tell me if I'm wrong here!) so you won't that level
> of performance.

Those queries aren't actually equivalent:

* The regex doesn't anchor to the end of the string, so "John Smith  
Bar" would match the regex but not the direct match
* The regex (in a strict implementation) will only match a simple  
literal; the direct match will match against a lang-tagged literal

... so it's not possible for an implementation to run the latter  
exactly as quickly as the former, if only because it'll produce  
different results!

To your point, though (the reason I started writing this email):  
there's not too much motivation for implementors to try to detect and  
optimize this kind of edge case, because nobody should be writing a  
query in the second form :)

One thing I've experimented with is detecting some kinds of regexes  
and compiling them into free-text index operations. That's likely a  
middle ground.

Received on Thursday, 29 April 2010 18:13:51 UTC