Re: A reason for dropping separate AND clauses from Seaborne, Andy on 2005-01-25 (public-rdf-dawg-comments@w3.org from January 2005)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Tue, 25 Jan 2005 18:42:30 +0000
To: Phil Dawes <pdawes@users.sourceforge.net>
CC: public-rdf-dawg-comments@w3.org
Message-ID: <41F69316.5020409@hp.com>
Phil Dawes wrote:
> Hi All,
> 
> Apologies if you've already discussed / bottomed-out this issue.

Comments always welcome!

> It occurred to me that one reason why you might not want to have a
> separate 'AND' clause in the sparql query language is that it makes it
> more cumbersome to hint an efficient search order to a query processor.

Two points:

First - a query processor is responsible for finding an efficient order.  There is 
always a tension between letting the application writer say clearly what they 
want and letting them say how to do it.

Second - in SPARQL the AND keyword doesn't introduce a fixed clause (as it does in, 
say, RDQL). This query has constraints inline:

-----------------
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?nameX ?nameY
WHERE
    (?x rdf:type foaf:Person)
    (?x foaf:name ?nameX) AND ?nameX =~ /smith/i
    (?x foaf:knows ?y)
    (?y foaf:name ?nameY) AND ?nameY =~ /smith/i
-----------------

The regular expression /smith/i is like LIKE "smith": a case-insensitive substring match.
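A rough Python analogue of that test (Python rather than SPARQL; the helper names are made up for illustration): a regex search with re.IGNORECASE matches anywhere in the string, which for ASCII text is the same as lower-casing the value and checking for a substring.

```python
import re

# Hypothetical helper: /smith/i as a Python regex. re.search looks for the
# pattern anywhere in the string (a substring match) and re.IGNORECASE
# makes the comparison case-insensitive.
SMITH = re.compile("smith", re.IGNORECASE)


def matches_smith(name: str) -> bool:
    return SMITH.search(name) is not None


def like_smith(name: str) -> bool:
    # The same test without regular expressions (fine for ASCII names).
    return "smith" in name.lower()


for name in ["John Smith", "SMITHERS", "Jones"]:
    print(name, matches_smith(name), like_smith(name))
```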

[[The "AND"s are not strictly necessary.  Parsers can use lookahead, or 
intermediate states and a lookahead of 1, to resolve it.]]

> 
> One of the more complex bits of writing a query processor is deducing
> the most efficient execution order of the query.

Very true!

> Mysql has a feature
> called 'straight_join', which causes it to ignore its own optimisation
> heuristics and to execute the sql joins in the order they appear in
> the query. 
> 
> This is powerful because sometimes the query-writer's implicit
> knowledge of the data enables more accurate optimisation than the
> analysis of the query optimiser engine. This is potentially more
> relevant to rdf than to sql databases since there is often less schema
> information to give hints about the internal structure of the data.
> 
> N.B. I'm not proposing that a straight_join feature be added to the
> sparql language. Just to note that separation of constraints with the
> AND clause makes it more difficult for a query agent to implement a
> straight join feature.

I agree that knowledge of the data can dramatically improve query execution. 
Asking (?x rdf:type rdfs:Resource) isn't the most specific of patterns.

I'd have thought that optimization control (and turning optimization off is a 
control) is a matter for the implementation, not for SPARQL.  Controlling it on a 
part-query basis is hard, and I can't see a small, fixed set of controls that 
could be agreed upon.
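To make the straight_join idea concrete, here is a toy sketch in Python. Everything in it is hypothetical: the data, the ex: names, and the crude "optimiser", which simply puts the pattern matching the fewest triples first. With straight_join=True the triple patterns are joined in the order written; otherwise they are reordered. The work counter only counts triple comparisons in the join itself, not the cost of the estimate.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Terms starting with "?" are variables, as in SPARQL.
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def selectivity(pattern: Triple, data: List[Triple]) -> int:
    # Crude estimate: how many triples match the pattern on its own.
    return sum(1 for t in data if match(pattern, t, {}) is not None)


def run(patterns: List[Triple], data: List[Triple], straight_join: bool):
    """Nested-loop join. Returns (solutions, number of triple comparisons)."""
    if not straight_join:
        # Toy "optimiser": most selective pattern first (stable sort keeps ties in order).
        patterns = sorted(patterns, key=lambda p: selectivity(p, data))
    solutions: List[Binding] = [{}]
    work = 0
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            for triple in data:
                work += 1
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions, work


# Hypothetical data: 50 labelled resources, only one of them "featured".
data: List[Triple] = []
for i in range(50):
    data.append((f"ex:r{i}", "rdf:type", "ex:Thing"))
    data.append((f"ex:r{i}", "rdfs:label", f"label {i}"))
data.append(("ex:r7", "ex:featured", "true"))

query: List[Triple] = [
    ("?res", "rdf:type", "?type"),
    ("?res", "rdfs:label", "?label"),
    ("?res", "ex:featured", "true"),
]

as_written, cost_written = run(query, data, straight_join=True)
reordered, cost_reordered = run(query, data, straight_join=False)
print(cost_written, cost_reordered)  # 10201 303
```

Both orders return the same single solution; only the amount of work differs. A query writer who knows the data could get the cheap plan under straight_join by writing the ex:featured pattern first, which is the kind of hinting Phil describes.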

> An example of when you might want a straight join:
> 
> The query writer knows that a substring search for '*foo*' will
> massively cut the search space to a couple of records, and thus should
> be applied first. Unfortunately the query optimiser isn't sophisticated
> enough to realise this, and attempts to join its (more indexed)
> triples before applying the regex filter.
> 
> select ?res, ?label
> where (?label LIKE "%foo%")
>       (?res, rdfs:label, ?label)
>       (?res, rdf:type, ?type)
> 
> (LIKE is some regex operator - don't know what the appropriate sparql
> is for this and I'm not currently on the internet.)

Assuming that the part (?label LIKE "%foo%") is an expression
   AND ?label LIKE "%foo%"
then this query returns no results if executed in that order: ?label 
is unbound when the constraint is tested, and is only bound later.

Currently, query execution must give the same results as executing with the 
variables bound where possible.  The document does not yet say this - I need to 
write it into the formal version (it affects optionals and constraints).
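One simple way to get that behaviour, sketched in Python: hold each constraint back until every variable it mentions has been bound by some triple pattern, instead of testing it at its written position. This is an illustration under assumptions (the Constraint class, evaluate() and the ex: data are hypothetical), not the formal definition still to be written.

```python
import re
from typing import Callable, Dict, List, Optional, Set, Tuple, Union

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None
            extended[p] = t
        elif p != t:
            return None
    return extended


class Constraint:
    """Hypothetical filter over named variables, e.g. ?label LIKE "%foo%"."""

    def __init__(self, variables: List[str], test: Callable[[Binding], bool]):
        self.variables = variables
        self.test = test

    def holds(self, b: Binding) -> bool:
        # With a variable still unbound there is nothing to test, so it fails.
        return all(v in b for v in self.variables) and self.test(b)


Item = Union[Triple, Constraint]


def evaluate(query: List[Item], data: List[Triple], defer_constraints: bool) -> List[Binding]:
    solutions: List[Binding] = [{}]
    bound: Set[str] = set()
    pending: List[Constraint] = []
    for item in query:
        if isinstance(item, Constraint):
            pending.append(item)
        else:
            bound.update(term for term in item if is_var(term))
            extended: List[Binding] = []
            for b in solutions:
                for t in data:
                    e = match(item, t, b)
                    if e is not None:
                        extended.append(e)
            solutions = extended
        # Written order: test each constraint at once. Deferred: wait until
        # all of its variables have been bound by some triple pattern.
        ready = [c for c in pending if not defer_constraints or set(c.variables) <= bound]
        pending = [c for c in pending if c not in ready]
        solutions = [b for b in solutions if all(c.holds(b) for c in ready)]
    # Constraints whose variables were never bound are applied (and fail) at the end.
    return [b for b in solutions if all(c.holds(b) for c in pending)]


data: List[Triple] = [
    ("ex:a", "rdfs:label", "food"),
    ("ex:a", "rdf:type", "ex:Thing"),
    ("ex:b", "rdfs:label", "bar"),
    ("ex:b", "rdf:type", "ex:Thing"),
]
like_foo = Constraint(["?label"], lambda b: re.search("foo", b["?label"]) is not None)
query: List[Item] = [like_foo, ("?res", "rdfs:label", "?label"), ("?res", "rdf:type", "?type")]

print(evaluate(query, data, defer_constraints=False))  # []
print(evaluate(query, data, defer_constraints=True))   # [{'?res': 'ex:a', '?label': 'food', '?type': 'ex:Thing'}]
```

Tested in written order, the LIKE constraint sees an unbound ?label and rejects the only (empty) starting solution, so nothing comes back; deferred, it filters the labels once the rdfs:label pattern has bound them.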

There isn't a syntax to completely inline constraints into triple patterns, e.g.

     (?res rdfs:label %foo%)

You have to introduce a variable and test it.

> 
> In the above query, the lack of an AND section for the regex constraint
> means that the query writer can hint the order easily in the query,
> whilst telling the query engine to do a straight_join via some
> external non-standard parameter.
> 
> Hope this makes sense!

Yes - makes sense.

> 
> Cheers,
> 
> Phil

	Andy
Received on Tuesday, 25 January 2005 18:49:13 UTC