W3C home > Mailing lists > Public > public-rdf-dawg-comments@w3.org > November 2011

Re: SPARQL 1.1 - LeftJoin definition

From: Eric Prud'hommeaux <eric@w3.org>
Date: Mon, 28 Nov 2011 15:35:40 -0500
To: David McNeil <dmcneil@revelytix.com>
Cc: public-rdf-dawg-comments@w3.org
Message-ID: <20111128203540.GH29120@w3.org>
* David McNeil <dmcneil@revelytix.com> [2011-11-28 11:20-0600]
> Eric - Thanks for the responses, that is helpful. I have another comment
> below.
> 
> On Mon, Nov 28, 2011 at 11:10 AM, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
> > > 4) Given that definition, can't the Filter be pushed down below the
> > > LeftJoin? In this way LeftJoin could be simplified because it would no
> > > longer need to take an expression.
> >
> > If I understand, that would remove compatible rows with failed
> > filters, as opposed to retaining the row on the left side of the
> > optional.
> >
> 
> The way I am thinking of  it since it is a LeftJoin, the compatible rows
> with failed filters would be removed from the right side, but since it is a
> LeftJoin, the rows on the left would be retained (since that is what a
> LeftJoin does).

yep, but the wierd bit is that the filter in the OPTIONAL is scoped to
both the left and right sides of the OPTIONAL:
[[
  SELECT ?label
  {
      {
          {SELECT*{BIND(1 AS?match) BIND(10 AS?x)}}
      } OPTIONAL {
          {SELECT*{BIND("COMPAT PASS" AS?label) BIND(1 AS?match)BIND(10 AS?y)}} UNION
          {SELECT*{BIND("COMPAT FAIL" AS?label) BIND(1 AS?match)BIND(11 AS?y)}} UNION
          {SELECT*{BIND("INCOMPAT"    AS?label) BIND(2 AS?match)BIND(10 AS?y)}}
          FILTER (?x = ?y)
      }
  }
]]


--stem http://ucsc.example/uniProt/ -S mysql://genome@genome-mysql.cse.ucsc.edu/uniProt -e 'PREFIX Ugene: <http://ucsc.example/uniProt/gene#> SELECT ?gene_symbol WHERE { _:gene  <http://ucsc.example/uniProt/gene#acc> "P04637" . _:gene <http://ucsc.example/uniProt/gene#val> ?gene_symbol . }'

In SQL-land, this would require duplicating the left-hand side of the query. In triples:
[[
CREATE TABLE __HOLDS__ (s VARCHAR(40), p VARCHAR(40), o VARCHAR(40));
INSERT INTO  __HOLDS__ (s, p, o) VALUES ("<s1>", "<match>", "1"), ("<s1>", "<x>", "10");
INSERT INTO  __HOLDS__ (s, p, o) VALUES ("<s2a>", "<label>", "\"COMPAT PASS\""), ("<s2a>", "<match>", "1"), ("<s2a>", "<y>", "10");
INSERT INTO  __HOLDS__ (s, p, o) VALUES ("<s2b>", "<label>", "\"COMPAT FAIL\""), ("<s2b>", "<match>", "1"), ("<s2b>", "<y>", "11");
INSERT INTO  __HOLDS__ (s, p, o) VALUES ("<s2c>", "<label>", "\"INCOMPAT\""   ), ("<s2c>", "<match>", "2"), ("<s2c>", "<y>", "10");

CREATE VIEW l AS SELECT a.o AS "?match", b.o AS "?x" FROM __HOLDS__ AS a INNER JOIN __HOLDS__ AS b ON b.s=a.s WHERE a.s="<s1>" AND a.p="<match>" AND b.p="<x>";
+--------+------+
| ?match | ?x   |
+--------+------+
| 1      | 10   |
+--------+------+
]]

I can't create a view which carries that x=y constraint because there is no x in that scope. I can keep it, however, if I promise to restrict it later:
[[
CREATE VIEW r AS SELECT a.o AS "?label", b.o AS "?match", c.o AS "?y" FROM __HOLDS__ AS a INNER JOIN __HOLDS__ AS b ON b.s=a.s INNER JOIN __HOLDS__ AS c ON c.s=a.s WHERE a.p="<label>" AND b.p="<match>" AND c.p="<y>";
+---------------+--------+------+
| ?label        | ?match | ?y   |
+---------------+--------+------+
| "COMPAT PASS" | 1      | 10   |
| "COMPAT FAIL" | 1      | 11   |
| "INCOMPAT"    | 2      | 10   |
+---------------+--------+------+
]]

We can outer join them now, but we've forgotten about our constraint:
[[
SELECT * FROM l LEFT OUTER JOIN r ON l.`?match` = r.`?match`;
+--------+------+---------------+--------+------+
| ?match | ?x   | ?label        | ?match | ?y   |
+--------+------+---------------+--------+------+
| 1      | 10   | "COMPAT PASS" | 1      | 10   |
| 1      | 10   | "COMPAT FAIL" | 1      | 11   |
+--------+------+---------------+--------+------+
]]

[[
CREATE VIEW t AS SELECT l.`?match`, l.`?x`, r.`?label`, r.`?y` FROM l INNER JOIN r ON l.`?match` = r.`?match` WHERE l.`?x`=r.`?y`;
+--------+------+---------------+------+
| ?match | ?x   | ?label        | ?y   |
+--------+------+---------------+------+
| 1      | 10   | "COMPAT PASS" | 10   |
+--------+------+---------------+------+
]]

We can then outer join l against t (which already has l in it):
[[
SELECT * FROM l LEFT OUTER JOIN t ON l.`?match` = t.`?match`;
+--------+------+--------+------+---------------+------+
| ?match | ?x   | ?match | ?x   | ?label        | ?y   |
+--------+------+--------+------+---------------+------+
| 1      | 10   | 1      | 10   | "COMPAT PASS" | 10   |
+--------+------+--------+------+---------------+------+
]]

This looks identical to if we'd
[[
SELECT * FROM l LEFT OUTER JOIN r ON l.`?match` = r.`?match` WHERE l.`?x` = r.`?y`;
+--------+------+---------------+--------+------+
| ?match | ?x   | ?label        | ?match | ?y   |
+--------+------+---------------+--------+------+
| 1      | 10   | "COMPAT PASS" | 1      | 10   |
+--------+------+---------------+--------+------+
]]
, but recall that if l had an incompatible row, say either of the last two of
+--------+------+
| ?match | ?x   |
+--------+------+
| 1      | 10   |
| 1      | 12   |
| 3      | 10   |
+--------+------+
, we'd expect to see them in the left left join.


btw, will you be coming to http://www.w3.org/2011/09/LinkedData/ ?
(want to register?)


> -David

-- 
-ericP
Received on Monday, 28 November 2011 20:36:19 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Monday, 28 November 2011 20:36:20 GMT