Re: Brain teaser for non-PK tables from Juan Sequeda on 2012-04-25 (public-rdb2rdf-wg@w3.org from April 2012)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Wed, 25 Apr 2012 10:25:13 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-ID: <CAMVTWDwqG8=dceOyh8C6gpg7BKjzab4LDDqmv0W1Bf=N5-xLRg@mail.gmail.com>
Richard,

After a quick look, I agree with you, IF you want to translate a SPARQL
query into one single SQL query. A possible solution would include
materializing the table that does not have a primary key at runtime, stick
the triples in memory and have a SPARQL to SQL rewriter to actually execute
part of the query on the RDF triples in memory, send the other part to SQL
and then do the join outside of the database.

What caught my attention was: "let implementers choose whether they want to
implement the lean or non-lean direct mapping." I like how you phrased
that. This would imply that there could be two DM: a lean and non-lean.

I would propose to change

"If the table has no primary key, the row node is a fresh blank node that
is unique to this row"

to

"If the table has no primary key, the row node is a blank node. "

And then have a note/warning.

[[

If you generate a fresh blank node that is unique to this row, then the
result is a non-lean RDF graph.

If you generate the same blank node for repeated tuples, then the result is
a lean RDF graph.

The non-lean DM preserves the cardinality of the tuples, but it
hard/inefficient to implement in a SPARQL to SQL translator.

The lean DM does not preserve the cardinality of the tuples, but the
implementation is easier/efficient in a SPARQL to SQL translator.

If you are implementing a dumping tool, the recommendation is to create a
non-lean DM in order to maintain the cardinality.

]]


Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Tue, Apr 24, 2012 at 10:15 PM, Richard Cyganiak <richard@cyganiak.de>wrote:

> So, Eric challenged me to present an example of a query over a
> direct-mapped PK-less table that I believe cannot be evaluated in standard
> SQL without materializing the entire table outside of the DB.
>
> First let me say that I've puzzled over this non-PK issue for more than a
> day, trying to come up with some scheme based on cursors or ROWNUM or local
> variables to make it work, and failed. Now, making a leap from “I couldn't
> do it in a day” to “It's impossible” is certainly not quite appropriate,
> but after that experience I felt justified to send an implementation
> experience report to the WG, stating my belief that the cost of
> implementing this scheme are not worth the benefits. Hence my proposal to
> let implementers choose whether they want to implement the lean or non-lean
> direct mapping.
>
> So here we go.
>
>          IOU
>   BORROWER | AMOUNT
>   ---------+-------
>   Alice    |     10
>   Bob      |      5
>   Charlie  |     10
>   Charlie  |     10
>
> The equivalent non-lean direct mapping graph (minus rdf:type triples):
>
>   _:1 <IOU#BORROWER> "Alice".
>   _:1 <IOU#AMOUNT> 10.
>   _:2 <IOU#BORROWER> "Bob".
>   _:2 <IOU#AMOUNT> 5.
>   _:3 <IOU#BORROWER> "Charlie".
>   _:3 <IOU#AMOUNT> 10.
>   _:4 <IOU#BORROWER> "Charlie".
>   _:4 <IOU#AMOUNT> 10.
>
> Now here's a simple SPARQL query:
>
>   SELECT * {
>      {
>         ?x <IOU#BORROWER> "Charlie".
>         ?x ?property ?value.
>      } UNION {
>         ?x <IOU#AMOUNT> 10.
>      }
>   }
>
> The solution should be:
>
>   ?x  | ?property      | ?value
>   ----+----------------+----------
>   _:3 | <IOU#BORROWER> | "Charlie"
>   _:4 | <IOU#BORROWER> | "Charlie"
>   _:3 | <IOU#AMOUNT>   | 10
>   _:4 | <IOU#AMOUNT>   | 10
>   _:1 |                |
>   _:3 |                |
>   _:4 |                |
>
> Can you outline an algorithm that produces this result without
> materializing the table? (Ordering, the difference between
> literals/IRIs/bNodes, and the specific labels for the bNodes don't matter.)
>
> Bonus points if the algorithm is expressed as an R2RML mapping. We can
> assume that we already have an algorithm for evaluating any SPARQL query
> over an R2RML mapping.
>
> Here's my non-standard solution using ROWID, which only works on Oracle:
>
>  SELECT ROWID x, '<IOU#BORROWER>' property, BORROWER value
>         FROM IOU
>         WHERE BORROWER='Charlie'
>  UNION
>  SELECT ROWID x, '<IOU#AMOUNT>' property, AMOUNT value
>         FROM IOU
>         WHERE BORROWER='Charlie'
>  UNION
>  SELECT ROWID x, NULL, NULL
>         FROM IOU
>         WHERE AMOUNT=10
>
> Earning the R2RML bonus points:
>
>   <#map> a rr:TriplesMap;
>      rr:logicalTable [
>         rr:sqlQuery "SELECT ROWID, BORROWER, AMOUNT FROM IOU";
>      ];
>      rr:subjectMap [
>         rr:column "ROWID";
>         rr:termType rr:BlankNode
>      ];
>      rr:predicateObjectMap [
>         rr:predicate <IOU#BORROWER>;
>         rr:objectMap [ rr:column "BORROWER" ];
>      ];
>      rr:predicateObjectMap [
>         rr:predicate <IOU#AMOUNT>;
>         rr:objectMap [ rr:column "AMOUNT" ];
>      ].
>
> Now, how to do this without the ROWID vendor extension???
>
>
> ----
>
> For the record. With a lean direct mapping, the desired output graph would
> be:
>
>   _:1 <IOU#BORROWER> "Alice".
>   _:1 <IOU#AMOUNT> 10.
>   _:2 <IOU#BORROWER> "Bob".
>   _:2 <IOU#AMOUNT> 5.
>   _:3 <IOU#BORROWER> "Charlie".
>   _:3 <IOU#AMOUNT> 10.
>
> The query result would be:
>
>   ?x  | ?property      | ?value
>   ----+----------------+----------
>   _:3 | <IOU#BORROWER> | "Charlie"
>   _:3 | <IOU#AMOUNT>   | 10
>   _:1 |                |
>   _:3 |                |
>
> The standard-compliant SQL query would be as above, but replace ROWID with
> something like (BORROWER || '@@@separator@@@' || AMOUNT), and add
> DISTINCT to each SELECT.
>
> The R2RML query would be the same as above with the following changes:
>
>      rr:logicalTable [
>         rr:tableName "IOU";
>      ];
>      rr:subjectMap [
>         rr:template "{BORROWER}@@@separator@@@{AMOUNT}";
>         rr:termType rr:BlankNode;
>      ];
>
> So, implementing the lean direct mapping is not hard using just standard
> SQL.
>
> Best,
> Richard
>
Received on Wednesday, 25 April 2012 08:26:08 UTC