- From: Ivan Mikhailov <imikhailov@openlinksw.com>
- Date: Mon, 08 Feb 2010 17:00:00 +0600
- To: Andy Seaborne <andy.seaborne@talis.com>
- Cc: SPARQL Working Group <public-rdf-dawg@w3.org>
> IN is a operator with the same precedence as EQ etc.
>
> Syntax:
>   expr IN ( expr1, expr2, ....)
>   expr NOT IN ( expr1, expr2, ....)

We've implemented IN because it's very convenient for keeping SQL happy, and have not implemented the NOT IN syntax simply because I was too lazy. Moreover, our lod.openlinksw.com/sparql regularly gets queries with filters like ((?var=value1) || (?var=value2) || ... || (?var=value300)). The OR operator is boolean in SQL, so the SQL optimizer would die on such an expression; I had to recognize such subexpressions and rewrite them to the IN operator first. So the SPARQL optimizer has code for the IN operator even without a visible IN syntax in queries.

> Semantics:
>
> Evaluation is equivalent to writing out in long form:
>
> IN ==>
>   expr = expr1 || expr = expr2 || ...
>
> NOT IN ==>
>   expr != expr1 && expr != expr2 && ...

+1

> 9 IN (1, 2, 1/0) is error
> 9 NOT IN (1, 2, 1/0) is error

I'd be happy if 9 IN (1, 2, 1/0) is error or false, and 9 NOT IN (1, 2, 1/0) is error or true, depending on the implementation and/or a roll of the dice.

The reason for implementation-specific behavior is that the optimizer may calculate a constant value of the expression at compile time. Consider FILTER (?v1 IN (1, 2, ?v2/?v3)) in a context such that the optimizer has proven that ?v1 is an IRI and ?v2, ?v3 are numbers. It would be nice to replace the whole expression with false and wipe out a whole group pattern. Another case is rewriting IN into an OR of equalities and then rewriting a group pattern with an OR filter into a UNION of patterns. If the query has a LIMIT then the bad branch may stay undetected.

The reason for the roll of the dice is that the compiler may decide to sort the list of variants, either to replace a sequence of comparisons with a binary search (if the result set is filtered with IN) or to get better table lookup locality (say, if ?s ?p ?o . FILTER (?p IN (values)) drives a sequence of PSO index lookups).

IN should be based on equality. For IN based on SAMETERM, a special function might be introduced.
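
To make the long-form expansion quoted above concrete, here is a rough Python sketch of how the strict reading behaves when one list member raises an error. It is only an illustration under stated assumptions: the names `ERROR`, `sparql_or`, `sparql_and`, `sparql_in` and `sparql_not_in` are hypothetical, list members are modelled as zero-argument callables so that 1/0 fails only when evaluated, and none of this is how any particular SPARQL engine is implemented.

```python
# Hypothetical model of SPARQL's three-valued logic: True, False, or ERROR.
ERROR = object()

def sparql_or(a, b):
    # true || error is true; false || error is error.
    if a is True or b is True:
        return True
    if a is ERROR or b is ERROR:
        return ERROR
    return False

def sparql_and(a, b):
    # false && error is false; true && error is error.
    if a is False or b is False:
        return False
    if a is ERROR or b is ERROR:
        return ERROR
    return True

def equals(x, member):
    # 'member' is a zero-argument callable; a failing evaluation yields ERROR.
    try:
        return x == member()
    except ZeroDivisionError:
        return ERROR

def sparql_in(x, members):
    # expr = expr1 || expr = expr2 || ...
    result = False
    for member in members:
        result = sparql_or(result, equals(x, member))
    return result

def sparql_not_in(x, members):
    # expr != expr1 && expr != expr2 && ...
    result = True
    for member in members:
        eq = equals(x, member)
        neq = ERROR if eq is ERROR else (not eq)
        result = sparql_and(result, neq)
    return result

members = [lambda: 1, lambda: 2, lambda: 1 / 0]
nine_in = sparql_in(9, members)          # no match, one error
nine_not_in = sparql_not_in(9, members)  # no match, one error
two_in = sparql_in(2, members)           # a match masks the error
```

Under this strict model both `nine_in` and `nine_not_in` come out as the error value, while `two_in` is true because a matching member masks the error. An implementation that constant-folds, reorders or skips members could instead yield false and true for the first two, which is the latitude asked for above.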
If both scalar subqueries and IN are supported then ?expn IN (SELECT...) should also be supported, of course.

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com
Received on Monday, 8 February 2010 11:00:41 UTC