
A reason for dropping separate AND clauses

From: Phil Dawes <pdawes@users.sourceforge.net>
Date: Fri, 21 Jan 2005 09:02:13 +0000
Message-ID: <16880.50453.986254.380751@gargle.gargle.HOWL>
To: public-rdf-dawg-comments@w3.org

Hi All,

Apologies if you've already discussed / bottomed-out this issue.

It occurred to me that one reason why you might not want to have a
separate 'AND' clause in the SPARQL query language is that it makes it
more cumbersome to hint an efficient search order to a query processor.

One of the more complex bits of writing a query processor is deducing
the most efficient execution order of the query.  MySQL has a feature
called 'straight_join', which causes it to ignore its own optimisation
heuristics and to execute the SQL joins in the order they appear in
the query.

This is powerful because sometimes the query-writer's implicit
knowledge of the data enables more accurate optimisation than the
analysis of the query optimiser engine. This is potentially more
relevant to RDF than to SQL databases, since there is often less schema
information to give hints about the internal structure of the data.

N.B. I'm not proposing that a straight_join feature be added to the
SPARQL language, just noting that separation of constraints into the
AND clause makes it more difficult for a query agent to implement a
straight join feature.


An example of when you might want a straight join:

The query writer knows that a substring search for '*foo*' will
massively cut the search space to a couple of records, and thus should
be applied first. Unfortunately the query optimiser isn't sophisticated
enough to realise this, and attempts to join its (more indexed)
triples before applying the regex filter.

select ?res, ?label
where (?label LIKE "%foo%")
      (?res, rdfs:label, ?label)
      (?res, rdf:type, ?type)

(LIKE is some regex operator - don't know what the appropriate sparql
is for this and I'm not currently on the internet.)

In the above query, the lack of an AND section for the regex constraint
means that the query writer can hint the order easily in the query,
whilst telling the query engine to do a straight_join via some
external non-standard parameter.
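
To make the ordering point concrete, here is a rough Python sketch of
a straight join over a toy in-memory triple list. Everything in it
(the TRIPLES data, the ex: names, and the helpers pattern,
substring_filter and straight_join) is made up for illustration and is
not taken from any real query engine. It just runs the steps in the
order written and records how many intermediate bindings each step
leaves behind.

```python
# Hypothetical toy store of (subject, predicate, object) triples.
TRIPLES = [
    ("ex:a", "rdfs:label", "foo bar"),
    ("ex:a", "rdf:type", "ex:Thing"),
    ("ex:b", "rdfs:label", "baz"),
    ("ex:b", "rdf:type", "ex:Thing"),
    ("ex:c", "rdfs:label", "qux"),
    ("ex:c", "rdf:type", "ex:Thing"),
]


def pattern(s, p, o):
    """Triple-pattern step; terms starting with '?' are variables."""
    def step(binding):
        for triple in TRIPLES:
            new = dict(binding)
            ok = True
            for term, value in zip((s, p, o), triple):
                if term.startswith("?"):
                    # Bind the variable, or check an existing binding.
                    if new.setdefault(term, value) != value:
                        ok = False
                        break
                elif term != value:
                    ok = False
                    break
            if ok:
                yield new
    return step


def substring_filter(var, needle):
    """Constraint step (stands in for the LIKE "%foo%" constraint)."""
    def step(binding):
        if var in binding:
            # Variable already bound: just test it.
            if needle in binding[var]:
                yield binding
        else:
            # Variable unbound: scan the store's object values for matches.
            for value in sorted({o for _, _, o in TRIPLES}):
                if needle in value:
                    yield {**binding, var: value}
    return step


def straight_join(steps):
    """Run steps strictly in written order, with no reordering.

    Returns the solutions and the number of intermediate bindings
    left after each step.
    """
    bindings = [{}]
    sizes = []
    for step in steps:
        bindings = [out for b in bindings for out in step(b)]
        sizes.append(len(bindings))
    return bindings, sizes


# Constraint written first, as in the query above.
filter_first = [
    substring_filter("?label", "foo"),
    pattern("?res", "rdfs:label", "?label"),
    pattern("?res", "rdf:type", "?type"),
]
# Constraint applied last, as it would be if held in a separate section.
filter_last = [
    pattern("?res", "rdfs:label", "?label"),
    pattern("?res", "rdf:type", "?type"),
    substring_filter("?label", "foo"),
]

solutions_first, sizes_first = straight_join(filter_first)
solutions_last, sizes_last = straight_join(filter_last)
print(sizes_first, sizes_last)
```

Both orders give the same answer; only the amount of intermediate work
differs. A real engine would use indexes rather than scanning, so
treat the counts as illustrative only.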

Hope this makes sense!

Cheers,

Phil
Received on Friday, 21 January 2005 11:15:57 GMT
