W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > October to December 2004

Re: Another alternative to OR

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 11 Oct 2004 09:44:50 +0100
Message-ID: <416A4802.2020804@hp.com>
To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
CC: DAWG public list <public-rdf-dawg@w3.org>

One of the use cases we have been discussing is the property disjunction
case.  When used multiple times in the query, the rewritten form gets long
and the transformation by a human will be error prone.

I advocate making it clear for the application writer.  I strongly prefer
such burden is not put on application writers unnecesarily. Unusual ways to 
express what the application writer intended increases support costs and can 
make some implementations harder.


Consider:

SELECT *
WHERE  (?x dc10:title ?title) OR (?x dc11:title ?title)
        (?x dc10:creator ?author) OR (?x dc11:creator ?author)

Suppose this is a library with its own accession system and the query is
doing the simple thing of accession number to title and author

SELECT ?title ?author
WHERE   (?x lib:accession ?n)
         (?n lib:number "123.456.789")
         { (?x dc10:title ?title) OR (?x dc11:title ?title) }
         { (?x dc10:creator ?author) OR (?x dc11:creator ?author) }

This is not a long query - the extraction parts of queries can be many
fields and I have seen queries a page long.

This is going to turn into, maybe:

SELECT ?title ?author
WHERE
     (?x lib:accession ?n0)  (?n0 lib:number "123.456.789")
     (?x dc10:title ?title)  (?x dc10:creator ?author)
UNION
     (?x lib:accession ?n1)  (?n1 lib:number "123.456.789")
     (?x dc11:title ?title)  (?x dc10:creator ?author)
UNION
     (?x lib:accession ?n2)  (?n2 lib:number "123.456.789")
     (?x dc11:title ?title)  (?x dc10:creator ?author)
UNION
     (?x lib:accession ?n3)  (?n3 lib:number "123.456.789")
     (?x dc11:title ?title)  (?x dc11:creator ?author)

This rewrite is a simple algorithm for a machine but error prone for
application writers.  There is a repeated part each time.

It is a simple matter for an implementation to the rewrites, but not
for the human application writer.

And there are alternatives: if an implmentation spots that ?x is expected
to be defined once or just a few times (e.g. lib:number is an IFP) an
execution form of

    (?x ?p ?title) AND ?p == dc10:title | ?p == dc11:title

might be preferrable.

Recovering the underlying structure can harder to implement.  Finding a
repeated part is harder - compiler optimization algorithms should find this
but it is less trivial than the initial rewrite.

Early implementations or ones aiming for a small footprint may be quite
naive in executing such a query - later or ones targeted at high
performance may wish to adopt different approaches based on query and
data.

   > The query evaluator can simply execute each match in turn, or it may
   > optimise it if it wishes to get more performance.

It is a balance here - the nested form can be turned into a sequence of
SQL expressions if the implementation so desires.  That is not a hard
implementation.  The converse is not true - turning the exapanded form into 
alternative executiuon forms is not easy.

	Andy

Steve Harris wrote:
> Another systatic alternative to the inline disjunctive exressions I
> dislike so much :) is UNION queries, which have been discussed a little,
> as "multiple queries per request", "multiple WHEREs" or something similar.
> 
> So
> 
> 	SELECT ?name
> 	WHERE { (?x foo:name ?name) }
> 	      OR { (?x bar:hasName ?name) } 
> 
> could be writen as:
> 
> 	SELECT ?name
> 	WHERE (?x foo:name ?name)
> 	UNION
> 	WHERE (?x bar:hasName ?name)
> 
> or some other equivalent (you wany want to repeat the SELECT).
> 
> This has the advantage over OR that it doesnt explode the query
> complexity, and doesnt have such complicated interactions with OPTIONAL,
> as every allowed combination of disjunctive expressions has to be spellt
> out. The query evaluator can simply execute each match in turn, or it may
> optimise it if it wishes to get more performance.
> 
> The downside is that it will make the queries more verbose if you have a
> complicated disjunctive expression.
> 
> - Steve
> 
Received on Monday, 11 October 2004 08:45:24 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:21 GMT