Re: Another alternative to OR

One of the use cases we have been discussing is the property disjunction
case.  When used multiple times in the query, the rewritten form gets long
and the transformation by a human will be error prone.

I advocate making it clear for the application writer.  I strongly prefer
such burden is not put on application writers unnecesarily. Unusual ways to 
express what the application writer intended increases support costs and can 
make some implementations harder.


Consider:

SELECT *
WHERE  (?x dc10:title ?title) OR (?x dc11:title ?title)
        (?x dc10:creator ?author) OR (?x dc11:creator ?author)

Suppose this is a library with its own accession system and the query is
doing the simple thing of accession number to title and author

SELECT ?title ?author
WHERE   (?x lib:accession ?n)
         (?n lib:number "123.456.789")
         { (?x dc10:title ?title) OR (?x dc11:title ?title) }
         { (?x dc10:creator ?author) OR (?x dc11:creator ?author) }

This is not a long query - the extraction parts of queries can be many
fields and I have seen queries a page long.

This is going to turn into, maybe:

SELECT ?title ?author
WHERE
     (?x lib:accession ?n0)  (?n0 lib:number "123.456.789")
     (?x dc10:title ?title)  (?x dc10:creator ?author)
UNION
     (?x lib:accession ?n1)  (?n1 lib:number "123.456.789")
     (?x dc11:title ?title)  (?x dc10:creator ?author)
UNION
     (?x lib:accession ?n2)  (?n2 lib:number "123.456.789")
     (?x dc11:title ?title)  (?x dc10:creator ?author)
UNION
     (?x lib:accession ?n3)  (?n3 lib:number "123.456.789")
     (?x dc11:title ?title)  (?x dc11:creator ?author)

This rewrite is a simple algorithm for a machine but error prone for
application writers.  There is a repeated part each time.

It is a simple matter for an implementation to the rewrites, but not
for the human application writer.

And there are alternatives: if an implmentation spots that ?x is expected
to be defined once or just a few times (e.g. lib:number is an IFP) an
execution form of

    (?x ?p ?title) AND ?p == dc10:title | ?p == dc11:title

might be preferrable.

Recovering the underlying structure can harder to implement.  Finding a
repeated part is harder - compiler optimization algorithms should find this
but it is less trivial than the initial rewrite.

Early implementations or ones aiming for a small footprint may be quite
naive in executing such a query - later or ones targeted at high
performance may wish to adopt different approaches based on query and
data.

   > The query evaluator can simply execute each match in turn, or it may
   > optimise it if it wishes to get more performance.

It is a balance here - the nested form can be turned into a sequence of
SQL expressions if the implementation so desires.  That is not a hard
implementation.  The converse is not true - turning the exapanded form into 
alternative executiuon forms is not easy.

	Andy

Steve Harris wrote:
> Another systatic alternative to the inline disjunctive exressions I
> dislike so much :) is UNION queries, which have been discussed a little,
> as "multiple queries per request", "multiple WHEREs" or something similar.
> 
> So
> 
> 	SELECT ?name
> 	WHERE { (?x foo:name ?name) }
> 	      OR { (?x bar:hasName ?name) } 
> 
> could be writen as:
> 
> 	SELECT ?name
> 	WHERE (?x foo:name ?name)
> 	UNION
> 	WHERE (?x bar:hasName ?name)
> 
> or some other equivalent (you wany want to repeat the SELECT).
> 
> This has the advantage over OR that it doesnt explode the query
> complexity, and doesnt have such complicated interactions with OPTIONAL,
> as every allowed combination of disjunctive expressions has to be spellt
> out. The query evaluator can simply execute each match in turn, or it may
> optimise it if it wishes to get more performance.
> 
> The downside is that it will make the queries more verbose if you have a
> complicated disjunctive expression.
> 
> - Steve
> 

Received on Monday, 11 October 2004 08:45:24 UTC