- From: Bob MacGregor <bob.macgregor@gmail.com>
- Date: Wed, 4 Mar 2009 10:13:21 -0800
- To: Simon Gibbs <simon.gibbs@cantorva.com>
- Cc: Toby Inkster <tai@g5n.co.uk>, public-rdf-dawg-comments@w3.org
- Message-ID: <a0d0f8f70903041013g6c165a9fjdaa3ee99e20fc1fd@mail.gmail.com>
I want to add my vote for this feature, but with a caveat. I use it a lot already, because I'm using a query language more powerful than SPARQL. The power of the construct comes from the ability to GENERATE a set of variable bindings from a list, not from FILTERING.

SPARQL loses a great deal of expressive power because FILTER is not supposed to bind variables. I often find I am forced to make several queries and then union the results in program code to overcome that limitation. We have a similar situation here. The optimal query ordering is usually to first bind from the IN list, and then use that binding when joining to other clauses.

A counterargument says that FILTER is not really a filter after all; instead, a query planner should be able to take clauses from within FILTER and apply them in any order. If that really were the case (that FILTER as a "filter" is a fiction), then the following query would work:

    select ?x where { filter (?x = 42) }

But it does not work, and we are all worse off because it does not. My concern is that IN will be hamstrung in the same manner as "=".

Cheers, Bob

On Wed, Mar 4, 2009 at 8:31 AM, Simon Gibbs <simon.gibbs@cantorva.com> wrote:

> I was about to suggest that inference could handle that particular
> example, as far as I can tell the case is handled by sub properties
> (happy to be corrected), but with the variable in another position this
> feature would be desirable for batch operations involving a list of
> subjects or objects of interest. It is used very frequently in SQL.
>
> Possible use cases would be augmenting, say a few hundred thousand
> records about musicians with information about the genre of music they
> play (something I am likely to do in the near future).
>
> Another use case might involve retrieving additional columns of data in
> a tabular UI, with batch sizes equal to the number of records in the
> viewport and the content of the IN group taken from a column bound to an
> inverse functional property. This would allow extra columns to be
> requested and populated on the fly without performing a fresh query for
> the whole table.
>
> Having an explicit syntax would allow implementors to infer the
> possibility of a longer list than || might suggest and therefore they
> might be able to trigger appropriate optimizations. This might in turn
> yield a higher maximum batch size and better overall performance.
>
> FWIW I would prioritize aggregation functions (MIN, MAX, COUNT, AVERAGE,
> GROUP BY etc) above this feature as the distribution of graphs allows a
> work around for the missing IN (see e.g. Talis' augmentation
> functionality) as does the || operator.
>
> Toby Inkster wrote:
> > Analogous to the SQL operator of the same name.
> >
> > PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> > SELECT ?thing ?name
> > WHERE {
> >   ?thing ?p ?name .
> >   FILTER (?p IN (foaf:name foaf:nick rdfs:label))
> > }
> >
> > This query can already be written as
> >
> > PREFIX foaf: <http://xmlns.com/foaf/0.1/>
> > PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> > SELECT ?thing ?name
> > WHERE {
> >   ?thing ?p ?name .
> >   FILTER (?p = foaf:name || ?p = foaf:nick || ?p = rdfs:label)
> > }
> >
> > But I hope people agree that the former syntax is more legible.
> >
> > Formally, IN would be an infix operator taking a term as its first
> > argument and an rdf:List as its second argument.
> >

--
=====================================
Robert MacGregor
bob.macgregor@gmail.com
Mobile: 310-469-2810
=====================================
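
P.S. To make the bind-first point concrete, here is a rough sketch of how Toby's example can be written in SPARQL as it stands today, so that each member of the list appears as a constant in a triple pattern instead of being tested afterwards in a FILTER. It is only an illustration of the workaround, not a proposal for how IN should be specified; the prefixes and variable names are taken from Toby's query.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# One branch per member of the would-be IN list. The predicate is a
# constant in every branch, so nothing has to be filtered afterwards.
SELECT ?thing ?name
WHERE {
  { ?thing foaf:name ?name }
  UNION
  { ?thing foaf:nick ?name }
  UNION
  { ?thing rdfs:label ?name }
}
```

Each branch hands the engine a constant predicate to look up directly, which is the effect a binding IN would have. The cost is that the pattern must be repeated once per list member, and ?p itself is never bound.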
Received on Wednesday, 4 March 2009 18:13:56 UTC