W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > October to December 2004

Re: OUTER JOIN and DISJUNCTION

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 15 Nov 2004 12:11:50 -0500
To: Simon Raboczi <raboczi@tucanatech.com>
Cc: public-rdf-dawg@w3.org
Message-ID: <20041115171149.GC13311@w3.org>
On Tue, Nov 16, 2004 at 12:42:39AM +1000, Simon Raboczi wrote:
> 
> 
> On 15/11/2004, at 20:50, Eric Prud'hommeaux wrote:
> 
> >
> >Related to
> >  ACTION SimonR: explain how much of disjuction can be done with
> >  optionals, nested or otherwise.  Point to references, problems with
> >  LEFT OUTER JOIN in the literature.
> >
> >In my experiments, I haven't found DISJUNCTION that I couldn't
> >implement with OUTER JOINs.
> 
> I haven't quite gotten the trick yet, embarrassingly enough.

It's been my default problem for the last 15 months. I've got a LOT
of time into it (probably could have been shortcutted with some
measure of education, though).

>                                                               Can you 
> demonstrate for the simplest case of
> 
> SELECT ?a ?b WHERE (?a <x:p1> <x:o1>) UNION (?b <x:p2> <x:o2>)
> 
> targeting a graph with the following triples
> 
> <x:s1> <x:p1> <x:o1> .
> <x:s2> <x:p2> <x:o2> .
> 
> I'd expect the result
> 
> +--------+--------+
> |   ?a   |   ?b   |
> +--------+--------+
> | <x:s1> |        |
> |        | <x:s2> |
> +--------+--------+

The OUTER JOIN trick is only required when the paths through the
disjunction involve different table aliases. In this case, we can
express both sides of the disjunction with the same table alias (h1
in the example):

INSERT INTO holds (s, p, o) VALUES ('x:s1', 'x:p1', 'x:o1');
INSERT INTO holds (s, p, o) VALUES ('x:s2', 'x:p2', 'x:o2');

SELECT h1.s AS _L1_a_R1_b,
       (h1.p="x:p1" AND h1.o="x:o1") AS _L1, 
       (h1.p="x:p2" AND h1.o="x:o2") AS _R1
  FROM holds as h1
 WHERE ((h1.p="x:p1" AND h1.o="x:o1")
     OR (h1.p="x:p2" AND h1.o="x:o2"))

+------------+------+------+
| _L1_a_R1_b | _L1  | _R1  |
+------------+------+------+
| x:s1       |    1 |    0 |
| x:s2       |    0 |    1 |
+------------+------+------+

It also depends on how you store your data. If it's in
predicate-specific tables like:

  CREATE TABLE p1s (id INTEGER NOT NULL AUTO_INCREMENT, 
                    s VARCHAR(80), 
                    p1 VARCHAR(80), 
                    PRIMARY KEY(id));
  INSERT INTO p1s (s, p1) VALUES ("x:s1", "x:o1");
  CREATE TABLE p2s (id INTEGER NOT NULL AUTO_INCREMENT, 
                    s VARCHAR(80), 
                    p2 VARCHAR(80), 
                    PRIMARY KEY(id));
  INSERT INTO p2s (s, p2) VALUES ("x:s2", "x:o2");

the alias are distinct so the OUTER JOIN hatchet needs to come out:

  CREATE TABLE true (truth VARCHAR(20));
  INSERT INTO true (truth) VALUES ("sure, why not");

  SELECT p1s1.s AS a, p2s1.s AS b
    FROM true
         LEFT OUTER JOIN p1s AS p1s1 ON 1
         LEFT OUTER JOIN p2s AS p2s1 ON 1
   WHERE (p1s1.p1="x:o1"
       OR p2s1.p2="x:o2")

this is asking more of a simultaneous solutions question and begets:
+------+------+
| a    | b    |
+------+------+
| x:s1 | x:s2 |
+------+------+

Here we get to the cutting edge of my sleep research. I'm going to
sleep now -- perhaps something will percolate by morning.

> What's the equivalent SPARQL using [ ] instead of UNION?

The simple way would be to again reduce the sides of the conjunction
to a single triple as we did in SQL:

SELECT ?a ?b WHERE (?s ?p ?o) 
               AND ((?p=<x:p1> AND ?o=<x:o1>)
                 OR (?p=<x:p2> AND ?o=<x:o2>))

If you want a challenge, and add NULLs to the language, maybe you can
do it was a disjunction. You need a throw-away truth for this.
Something like

  moon madeOf greenCheese .

SELECT ?a ?b WHERE (moon madeOf greenCheese) 
                   [?a <x:p1> <x:o1>] 
                   [?b <x:p2> <x:o2>]
               AND (?a IS NULL OR ?b IS NULL)

but that assumes that you'll get a match on the astronomical trivia.
You can be more assured that (?s ?p ?o) will match, and since the
results are a set, you get no more logical results, but that might
be a bit pricey to actually execute.
-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Monday, 15 November 2004 17:12:14 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:21 GMT