Re: Brain teaser for non-PK tables from Eric Prud'hommeaux on 2012-05-03 (public-rdb2rdf-wg@w3.org from May 2012)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 3 May 2012 11:25:32 -0400
To: Richard Cyganiak <richard@cyganiak.de>
Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-ID: <20120503152530.GA24144@w3.org>
* Richard Cyganiak <richard@cyganiak.de> [2012-05-03 14:41+0100]
> Hi Eric,
> 
> Ok, this works, although the thought of generalizing the mix of in-DB and out-of-DB processing to more complex cases is rather scary.
> 
> Once you have this:
> 
> > (select BORROWER, AMOUNT, count(*) AS ccc from IOU group by BORROWER, AMOUNT) IOU
> 
> 
> why not go one step further and do something like
> 
>   SELECT BORROWER, AMOUNT, NUMBER
>   FROM (select ...) IOU
>     JOIN NUMBERS
>        ON IOU.ccc >= NUMBERS.NUMBER
> 
> where NUMBERS is a one-column table containing the integers 1, 2, 3, …
> 
> Your query compressed the duplicate rows and added a count. This query expands them again and adds an index that numbers the identical rows.
> 
> If we use this instead of IOU in the union query, we get:
> 
> BORROWE     AMOUNT     NUMBER PROPERTY      V1       V2
> ------- ---------- ---------- -------------- ------- ----------
> Charlie  10     1 <IOU#BORROWER> Charlie
> Charlie  10     2 <IOU#BORROWER> Charlie
> Charlie  10     1 <IOU#AMOUNT>       10
> Charlie  10     2 <IOU#AMOUNT>       10
> Alice  10     1
> Charlie  10     1
> Charlie  10     2

presumes you can create tables, but yeah, conceptually easier query. you can even push the symbol generation down:

  SELECT concat('_:', concat(BORROWER, concat(AMOUNT, ords.i))) x, '<IOU#BORROWER>' property, BORROWER v1, NULL v2
         FROM (select BORROWER, AMOUNT, count(*) AS ccc from IOU group by BORROWER, AMOUNT) IOU inner join ords on ords.i <= ccc
         WHERE BORROWER='Charlie'
  UNION ALL
  SELECT concat('_:', concat(BORROWER, concat(AMOUNT, ords.i))) x, '<IOU#AMOUNT>' property, NULL v1, AMOUNT v2
         FROM (select BORROWER, AMOUNT, count(*) AS ccc from IOU group by BORROWER, AMOUNT) IOU inner join ords on ords.i <= ccc
         WHERE BORROWER='Charlie'
  UNION ALL
  SELECT concat('_:', concat(BORROWER, concat(AMOUNT, ords.i))) x, NULL property, NULL v1, NULL v2
         FROM (select BORROWER, AMOUNT, count(*) AS ccc from IOU group by BORROWER, AMOUNT) IOU inner join ords on ords.i <= ccc
         WHERE AMOUNT=10;

X            PROPERTY       V1        V2
------------ -------------- ------- ----------
_:Charlie101 <IOU#BORROWER> Charlie
_:Charlie102 <IOU#BORROWER> Charlie
_:Charlie101 <IOU#AMOUNT>  10
_:Charlie102 <IOU#AMOUNT>  10
_:Charlie102
_:Charlie101
_:Alice101


> Now the blank node identity ?x is a function of the original column values (BORROWER and AMOUNT) and of the index (NUMBER). This way, *all* the processing can be done in SQL, including things like LIMIT and so on.
> 
> So yeah, this is very promising, just needs more work to figure out whether it works as a general solution, whether the NUMBERS table (or some equivalent mechanism) can be done in a reasonable portable way (ideally in standard SQL), and whether it's fast enough to be viable in practice.
> 
> I've asked a question about this on StackOverflow. The answers are fascinating. It made me learn about www.sqlfiddle.com. How could I live without it?
> http://stackoverflow.com/questions/10423767/sql-repeat-a-result-row-multiple-times-and-number-the-rows
> 
> I guess this is good enough for me. I'm willing to believe that a non-materialized implementation of the current DM design is practicable (although requiring quite a bit of extra work just to support a corner case).
> 
> The big remaining question is: How to handle this in R2RML?

Looking for an analog to:
  rr:subjectMap [
        rr:column "ROWID";
        rr:termType rr:BlankNode
     ];
I'd propose:
  rr:subjectMap [
        rr:termType rr:RowBlankNode
     ];

Does that complicate things beyond how much a cardinality requirement necessarily complicates things?


> Best,
> Richard
> 
> 
> On 2 May 2012, at 22:40, Eric Prud'hommeaux wrote:
> 
> > * Richard Cyganiak <richard@cyganiak.de> [2012-04-24 21:15+0100]
> >> So, Eric challenged me to present an example of a query over a direct-mapped PK-less table that I believe cannot be evaluated in standard SQL without materializing the entire table outside of the DB.
> >> 
> >> First let me say that I've puzzled over this non-PK issue for more than a day, trying to come up with some scheme based on cursors or ROWNUM or local variables to make it work, and failed. Now, making a leap from “I couldn't do it in a day” to “It's impossible” is certainly not quite appropriate, but after that experience I felt justified to send an implementation experience report to the WG, stating my belief that the cost of implementing this scheme are not worth the benefits. Hence my proposal to let implementers choose whether they want to implement the lean or non-lean direct mapping.
> >> 
> >> So here we go.
> >> 
> >>          IOU
> >>   BORROWER | AMOUNT
> >>   ---------+-------
> >>   Alice    |     10
> >>   Bob      |      5
> >>   Charlie  |     10
> >>   Charlie  |     10
> >> 
> >> The equivalent non-lean direct mapping graph (minus rdf:type triples):
> >> 
> >>   _:1 <IOU#BORROWER> "Alice".
> >>   _:1 <IOU#AMOUNT> 10.
> >>   _:2 <IOU#BORROWER> "Bob".
> >>   _:2 <IOU#AMOUNT> 5.
> >>   _:3 <IOU#BORROWER> "Charlie".
> >>   _:3 <IOU#AMOUNT> 10.
> >>   _:4 <IOU#BORROWER> "Charlie".
> >>   _:4 <IOU#AMOUNT> 10.
> >> 
> >> Now here's a simple SPARQL query:
> >> 
> >>   SELECT * {
> >>      {
> >>         ?x <IOU#BORROWER> "Charlie".
> >>         ?x ?property ?value.
> >>      } UNION {
> >>         ?x <IOU#AMOUNT> 10.
> >>      }
> >>   }
> >> 
> >> The solution should be:
> >> 
> >>   ?x  | ?property      | ?value
> >>   ----+----------------+----------
> >>   _:3 | <IOU#BORROWER> | "Charlie"
> >>   _:4 | <IOU#BORROWER> | "Charlie"
> >>   _:3 | <IOU#AMOUNT>   | 10
> >>   _:4 | <IOU#AMOUNT>   | 10
> >>   _:1 |                |
> >>   _:3 |                |
> >>   _:4 |                |
> >> 
> >> Can you outline an algorithm that produces this result without materializing the table? (Ordering, the difference between literals/IRIs/bNodes, and the specific labels for the bNodes don't matter.)
> >> 
> >> Bonus points if the algorithm is expressed as an R2RML mapping. We can assume that we already have an algorithm for evaluating any SPARQL query over an R2RML mapping.
> >> 
> >> Here's my non-standard solution using ROWID, which only works on Oracle:
> >> 
> >>  SELECT ROWID x, '<IOU#BORROWER>' property, BORROWER value
> >>         FROM IOU
> >>         WHERE BORROWER='Charlie'
> >>  UNION
> >>  SELECT ROWID x, '<IOU#AMOUNT>' property, AMOUNT value
> >>         FROM IOU
> >>         WHERE BORROWER='Charlie'
> >>  UNION
> >>  SELECT ROWID x, NULL, NULL
> >>         FROM IOU
> >>         WHERE AMOUNT=10
> > 
> > Oracle insisted that I tweak this to work preserve datatype consistency:
> > 
> >  SELECT ROWID x, '<IOU#BORROWER>' property, BORROWER v1, NULL v2
> >         FROM IOU
> >         WHERE BORROWER='Charlie'
> >  UNION
> >  SELECT ROWID x, '<IOU#AMOUNT>' property, NULL v1, AMOUNT v2
> >         FROM IOU
> >         WHERE BORROWER='Charlie'
> >  UNION
> >  SELECT ROWID x, NULL property, NULL v1, NULL v2
> >         FROM IOU
> >         WHERE AMOUNT=10;
> > 
> > X     PROPERTY   V1    V2
> > ------------------ -------------- ------- ----------
> > AAAdX+AAEAAAF0tAAA
> > AAAdX+AAEAAAF0tAAC <IOU#AMOUNT>     10
> > AAAdX+AAEAAAF0tAAC <IOU#BORROWER> Charlie
> > AAAdX+AAEAAAF0tAAC
> > AAAdX+AAEAAAF0tAAD <IOU#AMOUNT>     10
> > AAAdX+AAEAAAF0tAAD <IOU#BORROWER> Charlie
> > AAAdX+AAEAAAF0tAAD
> > 
> > To get rid of ROWID, I selected a counted IOU table with
> > s/IOU
> > /(select BORROWER, AMOUNT, count(*) AS ccc from IOU group by BORROWER, AMOUNT) IOU
> > /g
> > 
> >  SELECT BORROWER, AMOUNT, ccc, '<IOU#BORROWER>' property, BORROWER v1, NULL v2
> >         FROM (select BORROWER, AMOUNT, count(*) AS ccc from IOU group by BORROWER, AMOUNT) IOU
> >         WHERE BORROWER='Charlie'
> >  UNION
> >  SELECT BORROWER, AMOUNT, ccc, '<IOU#AMOUNT>' property, NULL v1, AMOUNT v2
> >         FROM (select BORROWER, AMOUNT, count(*) AS ccc from IOU group by BORROWER, AMOUNT) IOU
> >         WHERE BORROWER='Charlie'
> >  UNION
> >  SELECT BORROWER, AMOUNT, ccc, NULL property, NULL v1, NULL v2
> >         FROM (select BORROWER, AMOUNT, count(*) AS ccc from IOU group by BORROWER, AMOUNT) IOU
> >         WHERE AMOUNT=10;
> > 
> > BORROWE     AMOUNT   CCC PROPERTY      V1       V2
> > ------- ---------- ---------- -------------- ------- ----------
> > Charlie  10     2 <IOU#BORROWER> Charlie
> > Charlie  10     2 <IOU#AMOUNT>       10
> > Alice  10     1
> > Charlie  10     2
> > 
> > Each row emits CCC solutions where x is bound to a function of the
> > first three columns iterating i from 1 to CCC, allocating or re-using
> > a bnode in a hash of lists of bnodes:
> > 
> > Charlie  10     2 <IOU#BORROWER> Charlie
> >  bnodes{x}{Charlie-10}[1] = _:b
> >  results += (?x -> _:b, ?property -> <IOU#BORROWER>, ?value -> "Charlie")
> >  bnodes{x}{Charlie-10}[2] = _:c
> >  results += (?x -> _:c, ?property -> <IOU#BORROWER>, ?value -> "Charlie")
> > Charlie  10     2 <IOU#AMOUNT>       10
> >  results += (?x -> _:b, ?property -> <IOU#AMOUNT>, ?value -> 10)
> >  results += (?x -> _:c, ?property -> <IOU#AMOUNT>, ?value -> 10)
> > Alice  10     1
> >  bnodes{x}{Alice-10}[1] = _:a
> >  results += (x -> _:a)
> > Charlie  10     2
> >  results += (?x -> _:b)
> >  results += (?x -> _:c)
> > 
> > 
> >> Earning the R2RML bonus points:
> >> 
> >>   <#map> a rr:TriplesMap;
> >>      rr:logicalTable [
> >>         rr:sqlQuery "SELECT ROWID, BORROWER, AMOUNT FROM IOU";
> >>      ];
> >>      rr:subjectMap [
> >>         rr:column "ROWID";
> >>         rr:termType rr:BlankNode
> >>      ];
> >>      rr:predicateObjectMap [
> >>         rr:predicate <IOU#BORROWER>;
> >>         rr:objectMap [ rr:column "BORROWER" ];
> >>      ];
> >>      rr:predicateObjectMap [
> >>         rr:predicate <IOU#AMOUNT>;
> >>         rr:objectMap [ rr:column "AMOUNT" ];
> >>      ].
> >> 
> >> Now, how to do this without the ROWID vendor extension???
> >> 
> >> 
> >> ----
> >> 
> >> For the record. With a lean direct mapping, the desired output graph would be:
> >> 
> >>   _:1 <IOU#BORROWER> "Alice".
> >>   _:1 <IOU#AMOUNT> 10.
> >>   _:2 <IOU#BORROWER> "Bob".
> >>   _:2 <IOU#AMOUNT> 5.
> >>   _:3 <IOU#BORROWER> "Charlie".
> >>   _:3 <IOU#AMOUNT> 10.
> >> 
> >> The query result would be:
> >> 
> >>   ?x  | ?property      | ?value
> >>   ----+----------------+----------
> >>   _:3 | <IOU#BORROWER> | "Charlie"
> >>   _:3 | <IOU#AMOUNT>   | 10
> >>   _:1 |                |
> >>   _:3 |                |
> >> 
> >> The standard-compliant SQL query would be as above, but replace ROWID with something like (BORROWER || '@@@separator@@@' || AMOUNT), and add DISTINCT to each SELECT.
> >> 
> >> The R2RML query would be the same as above with the following changes:
> >> 
> >>      rr:logicalTable [
> >>         rr:tableName "IOU";
> >>      ];
> >>      rr:subjectMap [
> >>         rr:template "{BORROWER}@@@separator@@@{AMOUNT}";
> >>         rr:termType rr:BlankNode;
> >>      ];
> >> 
> >> So, implementing the lean direct mapping is not hard using just standard SQL.
> >> 
> >> Best,
> >> Richard
> > 
> > -- 
> > -ericP
> > 
> 

-- 
-ericP
Received on Thursday, 3 May 2012 15:26:10 UTC