W3C home > Mailing lists > Public > public-rdf-dawg@w3.org > October to December 2004

Re: OUTER JOIN and DISJUNCTION

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 16 Nov 2004 01:42:53 -0500
To: Simon Raboczi <raboczi@tucanatech.com>
Cc: public-rdf-dawg@w3.org
Message-ID: <20041116064252.GA13411@w3.org>
On Mon, Nov 15, 2004 at 12:11:49PM -0500, Eric Prud'hommeaux wrote:
> On Tue, Nov 16, 2004 at 12:42:39AM +1000, Simon Raboczi wrote:
> > 
> > 
> > On 15/11/2004, at 20:50, Eric Prud'hommeaux wrote:
> > 
> > >
> > >Related to
> > >  ACTION SimonR: explain how much of disjuction can be done with
> > >  optionals, nested or otherwise.  Point to references, problems with
> > >  LEFT OUTER JOIN in the literature.
> > >
> > >In my experiments, I haven't found DISJUNCTION that I couldn't
> > >implement with OUTER JOINs.
> > 
> > I haven't quite gotten the trick yet, embarrassingly enough.
> 
> It's been my default problem for the last 15 months. I've got a LOT
> of time into it (probably could have been shortcutted with some
> measure of education, though).
> 
> >                                                               Can you 
> > demonstrate for the simplest case of
> > 
> > SELECT ?a ?b WHERE (?a <x:p1> <x:o1>) UNION (?b <x:p2> <x:o2>)
> > 
> > targeting a graph with the following triples
> > 
> > <x:s1> <x:p1> <x:o1> .
> > <x:s2> <x:p2> <x:o2> .

On IRC, Andy proposed:

  2004-11-15T15:08:28Z <AndyS> SELECT a, NULL as b WHERE stmt.p = x:p1 stmt.o = o1  UNION
  2004-11-15T15:08:28Z <AndyS>  SELECT a as NULL, b WHERE stmt.p = x:p2 stmt.o = o2
  2004-11-15T15:08:47Z <AndyS> (with apologies to SQL!)

I tried it out in MySQL 4.0.22 and found a mysql bug:

  SELECT h1.s AS a, NULL as b FROM holds AS h1 WHERE h1.p = "x:p1" AND h1.o = "x:o1" UNION
  SELECT NULL as a, h1.s AS b FROM holds AS h1 WHERE h1.p = "x:p2" AND h1.o = "x:o2";

+------+------+
| a    | b    |
+------+------+
| x:s1 | NULL |
| NULL |      |
+------+------+

But it *does* work in last night's nightly build of mysql 4.1 *.

+------+------+
| a    | b    |
+------+------+
| x:s1 | NULL |
| NULL | x:s2 |
+------+------+

Gosh, that's much simpler than the hoops I was jumping through
re-using table aliases and creating complex scalar constraints.


* Install instructions ('cause they are rather opaque):
cd /usr/local/src/
wget http://downloads.mysql.com/snapshots/mysql-4.1/mysql-4.1.8-nightly-20041115.tar.gz
tar xzf mysql-4.1.8-nightly-20041115.tar.gz
cd mysql-4.1.8-nightly-20041115
./configure
make
make install
LD_LIBRARY_PATH=/usr/local/lib/mysql /usr/local/bin/mysql_install_db
LD_LIBRARY_PATH=/usr/local/lib/mysql /usr/local/bin/mysqld_safe &
LD_LIBRARY_PATH=/usr/local/lib/mysql /usr/local/bin/mysql

I needed write privs to /usr/local/ -R and create for /tmp/mysql.sock
to do this


> > I'd expect the result
> > 
> > +--------+--------+
> > |   ?a   |   ?b   |
> > +--------+--------+
> > | <x:s1> |        |
> > |        | <x:s2> |
> > +--------+--------+
> 
> The OUTER JOIN trick is only required when the paths through the
> disjunction involve different table aliases. In this case, we can
> express both sides of the disjunction with the same table alias (h1
> in the example):
> 
> INSERT INTO holds (s, p, o) VALUES ('x:s1', 'x:p1', 'x:o1');
> INSERT INTO holds (s, p, o) VALUES ('x:s2', 'x:p2', 'x:o2');
> 
> SELECT h1.s AS _L1_a_R1_b,
>        (h1.p="x:p1" AND h1.o="x:o1") AS _L1, 
>        (h1.p="x:p2" AND h1.o="x:o2") AS _R1
>   FROM holds as h1
>  WHERE ((h1.p="x:p1" AND h1.o="x:o1")
>      OR (h1.p="x:p2" AND h1.o="x:o2"))
> 
> +------------+------+------+
> | _L1_a_R1_b | _L1  | _R1  |
> +------------+------+------+
> | x:s1       |    1 |    0 |
> | x:s2       |    0 |    1 |
> +------------+------+------+
> 
> It also depends on how you store your data. If it's in
> predicate-specific tables like:
> 
>   CREATE TABLE p1s (id INTEGER NOT NULL AUTO_INCREMENT, 
>                     s VARCHAR(80), 
>                     p1 VARCHAR(80), 
>                     PRIMARY KEY(id));
>   INSERT INTO p1s (s, p1) VALUES ("x:s1", "x:o1");
>   CREATE TABLE p2s (id INTEGER NOT NULL AUTO_INCREMENT, 
>                     s VARCHAR(80), 
>                     p2 VARCHAR(80), 
>                     PRIMARY KEY(id));
>   INSERT INTO p2s (s, p2) VALUES ("x:s2", "x:o2");
> 
> the alias are distinct so the OUTER JOIN hatchet needs to come out:
> 
>   CREATE TABLE true (truth VARCHAR(20));
>   INSERT INTO true (truth) VALUES ("sure, why not");
> 
>   SELECT p1s1.s AS a, p2s1.s AS b
>     FROM true
>          LEFT OUTER JOIN p1s AS p1s1 ON 1
>          LEFT OUTER JOIN p2s AS p2s1 ON 1
>    WHERE (p1s1.p1="x:o1"
>        OR p2s1.p2="x:o2")
> 
> this is asking more of a simultaneous solutions question and begets:
> +------+------+
> | a    | b    |
> +------+------+
> | x:s1 | x:s2 |
> +------+------+
> 
> Here we get to the cutting edge of my sleep research. I'm going to
> sleep now -- perhaps something will percolate by morning.
> 
> > What's the equivalent SPARQL using [ ] instead of UNION?
> 
> The simple way would be to again reduce the sides of the conjunction
> to a single triple as we did in SQL:
> 
> SELECT ?a ?b WHERE (?s ?p ?o) 
>                AND ((?p=<x:p1> AND ?o=<x:o1>)
>                  OR (?p=<x:p2> AND ?o=<x:o2>))
> 
> If you want a challenge, and add NULLs to the language, maybe you can
> do it was a disjunction. You need a throw-away truth for this.
> Something like
> 
>   moon madeOf greenCheese .
> 
> SELECT ?a ?b WHERE (moon madeOf greenCheese) 
>                    [?a <x:p1> <x:o1>] 
>                    [?b <x:p2> <x:o2>]
>                AND (?a IS NULL OR ?b IS NULL)
> 
> but that assumes that you'll get a match on the astronomical trivia.
> You can be more assured that (?s ?p ?o) will match, and since the
> results are a set, you get no more logical results, but that might
> be a bit pricey to actually execute.
> -- 
> -eric
> 
> office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
>                         Shonan Fujisawa Campus, Keio University,
>                         5322 Endo, Fujisawa, Kanagawa 252-8520
>                         JAPAN
>         +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
> cell:   +1.857.222.5741 (does not work in Asia)
> 
> (eric@w3.org)
> Feel free to forward this message to any list for any purpose other than
> email address distribution.



-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +1.857.222.5741 (does not work in Asia)

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Tuesday, 16 November 2004 06:42:53 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 16:15:21 GMT