Re: Feature request: IN operator from Bob MacGregor on 2009-03-04 (public-rdf-dawg-comments@w3.org from March 2009)

From: Bob MacGregor <bob.macgregor@gmail.com>
Date: Wed, 4 Mar 2009 10:13:21 -0800
To: Simon Gibbs <simon.gibbs@cantorva.com>
Cc: Toby Inkster <tai@g5n.co.uk>, public-rdf-dawg-comments@w3.org
Message-ID: <a0d0f8f70903041013g6c165a9fjdaa3ee99e20fc1fd@mail.gmail.com>

I want to add my vote for this feature, but with a caveat.  I use it a lot
already, because I'm using a query language more powerful than SPARQL.  The
power of the construct comes from the ability to GENERATE a set of variable
bindings from a list, not from FILTERING.  SPARQL loses a great deal of
expressive power because FILTER is not supposed to bind variables.  I often
find I am forced to make several queries and then union the results in
program code to overcome that limitation.
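
As a rough illustration of that workaround, here is a minimal Python sketch.
It is not tied to any real SPARQL client: `run_query` is a hypothetical
helper, and the in-memory `TRIPLES` list is an assumed stand-in for an
endpoint.  Each value that an IN list would have supplied costs one query,
and the rows are combined in program code.

```python
# Toy data standing in for a real triple store (an assumption for this sketch).
TRIPLES = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:nick", "Al"),
    ("ex:bob", "rdfs:label", "Bob"),
    ("ex:bob", "ex:age", "42"),
]


def run_query(predicate):
    """Hypothetical helper simulating
    SELECT ?thing ?name WHERE { ?thing <predicate> ?name }."""
    return [(s, o) for (s, p, o) in TRIPLES if p == predicate]


def union_over(predicates):
    """Issue one query per predicate and concatenate the rows client-side."""
    rows = []
    for predicate in predicates:
        rows.extend(run_query(predicate))
    return rows


results = union_over(["foaf:name", "foaf:nick", "rdfs:label"])
```

With a construct that binds from a list, the loop in `union_over` would
collapse into a single query.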

We have a similar situation here.  The optimal query ordering is usually to
first bind from the IN list, and then use that binding when joining to other
clauses.  A counterargument says that FILTER is not really a filter after
all; instead, a query planner should be able to take clauses from within
FILTER and apply them in any order.  If that really were the case (that
FILTER as a "filter" is a fiction), then the following query would work:
    select ?x where { filter (?x = 42) }

But it does not work, and we are all worse off because it does not.  My
concern is that IN will be hamstrung in the same manner as "=".
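
The difference between the two readings can be sketched in a few lines of
Python.  This is a toy model, not any engine's actual algorithm: solutions
are plain dicts, and `filter_eq` and `generate_in` are hypothetical names
introduced only for this illustration.

```python
def filter_eq(solutions, var, value):
    """FILTER-style '=': keep a solution only if var is already bound
    and equal to value; an unbound var eliminates the solution."""
    return [s for s in solutions if var in s and s[var] == value]


def generate_in(solutions, var, values):
    """Generator-style IN: if var is unbound, bind it to each list value;
    if it is already bound, fall back to a membership test."""
    out = []
    for s in solutions:
        if var in s:
            if s[var] in values:
                out.append(s)
        else:
            for v in values:
                out.append({**s, var: v})
    return out


# An empty group pattern yields a single solution with no bindings.
start = [{}]

filtered = filter_eq(start, "x", 42)        # nothing survives the filter
generated = generate_in(start, "x", [42])   # ?x is bound from the list
```

Under the filtering reading the query above returns no rows; under the
generating reading it returns one row with ?x bound to 42.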

Cheers, Bob

On Wed, Mar 4, 2009 at 8:31 AM, Simon Gibbs <simon.gibbs@cantorva.com> wrote:

> I was about to suggest that inference could handle that particular
> example; as far as I can tell, the case is handled by sub-properties
> (happy to be corrected). But with the variable in another position, this
> feature would be desirable for batch operations involving a list of
> subjects or objects of interest.  It is used very frequently in SQL.
>
> Possible use cases would be augmenting, say, a few hundred thousand
> records about musicians with information about the genre of music they
> play (something I am likely to do in the near future).
>
> Another use case might involve retrieving additional columns of data in
> a tabular UI, with batch sizes equal to the number of records in the
> viewport and the content of the IN group taken from a column bound to an
> inverse functional property. This would allow extra columns to be
> requested and populated on the fly without performing a fresh query for
> the whole table.
>
> Having an explicit syntax would allow implementors to infer the
> possibility of a longer list than || might suggest and therefore they
> might be able to trigger appropriate optimizations. This might in turn
> yield a higher maximum batch size and better overall performance.
>
> FWIW I would prioritize aggregation functions (MIN, MAX, COUNT, AVERAGE,
> GROUP BY etc.) above this feature, as the distribution of graphs allows a
> workaround for the missing IN (see e.g. Talis' augmentation
> functionality), as does the || operator.
>
> Toby Inkster wrote:
> > Analogous to the SQL operator of the same name.
> >
> > PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> > SELECT ?thing ?name
> > WHERE {
> >   ?thing ?p ?name .
> >   FILTER (?p IN (foaf:name foaf:nick rdfs:label))
> > }
> >
> > This query can already be written as
> >
> > PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> > SELECT ?thing ?name
> > WHERE {
> >   ?thing ?p ?name .
> >   FILTER (?p = foaf:name || ?p = foaf:nick || ?p = rdfs:label)
> > }
> >
> > But I hope people agree that the former syntax is more legible.
> >
> > Formally, IN would be an infix operator taking a term as its first
> > argument and an rdf:List as its second argument.


-- 
=====================================
Robert MacGregor
bob.macgregor@gmail.com
Mobile: 310-469-2810
=====================================

Received on Wednesday, 4 March 2009 18:13:56 UTC