- From: Seaborne, Andy <andy.seaborne@hp.com>
- Date: Mon, 11 Oct 2004 09:44:50 +0100
- To: Steve Harris <S.W.Harris@ecs.soton.ac.uk>
- CC: DAWG public list <public-rdf-dawg@w3.org>
One of the use cases we have been discussing is the property disjunction
case. When used multiple times in the query, the rewritten form gets long
and the transformation by a human will be error prone.
I advocate making it clear for the application writer. I strongly prefer
that such a burden is not put on application writers unnecessarily. Unusual
ways to express what the application writer intended increase support costs
and can make some implementations harder.
Consider:
SELECT *
WHERE (?x dc10:title ?title) OR (?x dc11:title ?title)
(?x dc10:creator ?author) OR (?x dc11:creator ?author)
Suppose this is a library with its own accession system and the query is
doing the simple thing of mapping an accession number to title and author:
SELECT ?title ?author
WHERE (?x lib:accession ?n)
(?n lib:number "123.456.789")
{ (?x dc10:title ?title) OR (?x dc11:title ?title) }
{ (?x dc10:creator ?author) OR (?x dc11:creator ?author) }
This is not a long query - the extraction parts of queries can be many
fields and I have seen queries a page long.
This is going to turn into, maybe:
SELECT ?title ?author
WHERE
(?x lib:accession ?n0) (?n0 lib:number "123.456.789")
(?x dc10:title ?title) (?x dc10:creator ?author)
UNION
(?x lib:accession ?n1) (?n1 lib:number "123.456.789")
(?x dc10:title ?title) (?x dc11:creator ?author)
UNION
(?x lib:accession ?n2) (?n2 lib:number "123.456.789")
(?x dc11:title ?title) (?x dc10:creator ?author)
UNION
(?x lib:accession ?n3) (?n3 lib:number "123.456.789")
(?x dc11:title ?title) (?x dc11:creator ?author)
This rewrite is a simple algorithm for a machine but error prone for
application writers. There is a repeated part each time.
It is a simple matter for an implementation to do the rewrites, but not
for the human application writer.
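
As a rough illustration of how mechanical the rewrite is, here is a minimal
Python sketch that expands nested OR groups into purely conjunctive branches
by taking the cross product of the alternatives. The function name
expand_disjunctions and the tuple representation of triple patterns are
assumptions made for this example, not part of any proposal, and the sketch
does not rename ?n per branch as the hand-written expansion does.

```python
from itertools import product

def expand_disjunctions(fixed, groups):
    """Expand OR groups into a list of purely conjunctive branches.

    fixed  -- triple patterns that appear in every branch
    groups -- list of OR groups, each a list of alternative triple patterns
    """
    branches = []
    # One branch per way of picking a single alternative from each group.
    for choice in product(*groups):
        branches.append(list(fixed) + list(choice))
    return branches

# The library query from the message, as (subject, predicate, object) tuples.
fixed = [("?x", "lib:accession", "?n"),
         ("?n", "lib:number", '"123.456.789"')]
groups = [
    [("?x", "dc10:title", "?title"), ("?x", "dc11:title", "?title")],
    [("?x", "dc10:creator", "?author"), ("?x", "dc11:creator", "?author")],
]

branches = expand_disjunctions(fixed, groups)
for i, branch in enumerate(branches):
    if i:
        print("UNION")
    print(" ".join("(%s %s %s)" % t for t in branch))
```

Two groups of two alternatives give four branches; each further two-way
group doubles the count, and the fixed part is repeated in every branch.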
And there are alternatives: if an implementation spots that ?x is expected
to be defined once or just a few times (e.g. lib:number is an IFP) an
execution form of
(?x ?p ?title) AND ?p == dc10:title | ?p == dc11:title
might be preferable.
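
A minimal Python sketch of that execution form over an in-memory list of
triples: ?x is resolved first, then each property is matched once against a
set of acceptable predicates instead of running one branch per combination.
The helper name match_with_property_set and the sample data are invented
for illustration only.

```python
def match_with_property_set(triples, subject, properties):
    """Return every object o with (subject, p, o) in triples and p in properties."""
    return [o for (s, p, o) in triples if s == subject and p in properties]

# Made-up sample data: one book described with a mix of dc10 and dc11 terms.
triples = [
    ("book1", "lib:accession", "acc1"),
    ("acc1", "lib:number", "123.456.789"),
    ("book1", "dc11:title", "An Example Title"),
    ("book1", "dc10:creator", "A. N. Author"),
]

# Resolve ?x once from the accession number.
accessions = {s for (s, p, o) in triples
              if p == "lib:number" and o == "123.456.789"}
books = [s for (s, p, o) in triples
         if p == "lib:accession" and o in accessions]

# (?x ?p ?title) AND ?p == dc10:title | ?p == dc11:title, likewise for creator.
results = [
    (title, author)
    for x in books
    for title in match_with_property_set(triples, x, {"dc10:title", "dc11:title"})
    for author in match_with_property_set(triples, x, {"dc10:creator", "dc11:creator"})
]
print(results)
```

The point of the sketch is only that the fixed part of the query is
evaluated once, however many OR groups follow it.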
Recovering the underlying structure can be harder to implement. Finding a
repeated part is harder - compiler optimization algorithms should find this
but it is less trivial than the initial rewrite.
Early implementations, or ones aiming for a small footprint, may be quite
naive in executing such a query - later ones, or ones targeted at high
performance, may wish to adopt different approaches based on query and
data.
> The query evaluator can simply execute each match in turn, or it may
> optimise it if it wishes to get more performance.
It is a balance here - the nested form can be turned into a sequence of
SQL expressions if the implementation so desires. That is not a hard
implementation. The converse is not true - turning the expanded form into
alternative execution forms is not easy.
Andy
Steve Harris wrote:
> Another syntactic alternative to the inline disjunctive expressions I
> dislike so much :) is UNION queries, which have been discussed a little,
> as "multiple queries per request", "multiple WHEREs" or something similar.
>
> So
>
> SELECT ?name
> WHERE { (?x foo:name ?name) }
> OR { (?x bar:hasName ?name) }
>
> could be written as:
>
> SELECT ?name
> WHERE (?x foo:name ?name)
> UNION
> WHERE (?x bar:hasName ?name)
>
> or some other equivalent (you may want to repeat the SELECT).
>
> This has the advantage over OR that it doesn't explode the query
> complexity, and doesn't have such complicated interactions with OPTIONAL,
> as every allowed combination of disjunctive expressions has to be spelt
> out. The query evaluator can simply execute each match in turn, or it may
> optimise it if it wishes to get more performance.
>
> The downside is that it will make the queries more verbose if you have a
> complicated disjunctive expression.
>
> - Steve
>
Received on Monday, 11 October 2004 08:45:24 UTC