Re: Brain teaser for non-PK tables

On 24 Apr 2012, at 21:47, Juan Sequeda wrote:
> Before going into this, how about creating a skolem constant URI. 
> 
> This is better than creating a blank node.

This would resolve the leanness issue (the graph with two skolem IRIs generated for the same record is lean).

But you'd still have to work out how to generate the same skolem IRI for the same record each time in the presence of unions and joins. So my puzzle below doesn't become any simpler.

(Possibly it becomes harder, because the use of an IRI might raise stronger user expectations that the returned IRI should be consistent across SPARQL queries and when the DB changes. This is the problem I mentioned with the Jena API, where this expectation also exists for blank node identifiers, unlike in SPARQL where blank node identifiers don't have to be consistent across multiple queries but just within one query.)

Best,
Richard




> 
> Juan Sequeda
> +1-575-SEQ-UEDA
> www.juansequeda.com
> 
> 
> On Tue, Apr 24, 2012 at 10:15 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> So, Eric challenged me to present an example of a query over a direct-mapped PK-less table that I believe cannot be evaluated in standard SQL without materializing the entire table outside of the DB.
> 
> First let me say that I've puzzled over this non-PK issue for more than a day, trying to come up with some scheme based on cursors or ROWNUM or local variables to make it work, and failed. Now, making a leap from “I couldn't do it in a day” to “It's impossible” is certainly not quite appropriate, but after that experience I felt justified to send an implementation experience report to the WG, stating my belief that the cost of implementing this scheme are not worth the benefits. Hence my proposal to let implementers choose whether they want to implement the lean or non-lean direct mapping.
> 
> So here we go.
> 
>          IOU
>   BORROWER | AMOUNT
>   ---------+-------
>   Alice    |     10
>   Bob      |      5
>   Charlie  |     10
>   Charlie  |     10
> 
> The equivalent non-lean direct mapping graph (minus rdf:type triples):
> 
>   _:1 <IOU#BORROWER> "Alice".
>   _:1 <IOU#AMOUNT> 10.
>   _:2 <IOU#BORROWER> "Bob".
>   _:2 <IOU#AMOUNT> 5.
>   _:3 <IOU#BORROWER> "Charlie".
>   _:3 <IOU#AMOUNT> 10.
>   _:4 <IOU#BORROWER> "Charlie".
>   _:4 <IOU#AMOUNT> 10.
> 
> Now here's a simple SPARQL query:
> 
>   SELECT * {
>      {
>         ?x <IOU#BORROWER> "Charlie".
>         ?x ?property ?value.
>      } UNION {
>         ?x <IOU#AMOUNT> 10.
>      }
>   }
> 
> The solution should be:
> 
>   ?x  | ?property      | ?value
>   ----+----------------+----------
>   _:3 | <IOU#BORROWER> | "Charlie"
>   _:4 | <IOU#BORROWER> | "Charlie"
>   _:3 | <IOU#AMOUNT>   | 10
>   _:4 | <IOU#AMOUNT>   | 10
>   _:1 |                |
>   _:3 |                |
>   _:4 |                |
> 
> Can you outline an algorithm that produces this result without materializing the table? (Ordering, the difference between literals/IRIs/bNodes, and the specific labels for the bNodes don't matter.)
> 
> Bonus points if the algorithm is expressed as an R2RML mapping. We can assume that we already have an algorithm for evaluating any SPARQL query over an R2RML mapping.
> 
> Here's my non-standard solution using ROWID, which only works on Oracle:
> 
>  SELECT ROWID x, '<IOU#BORROWER>' property, BORROWER value
>         FROM IOU
>         WHERE BORROWER='Charlie'
>  UNION
>  SELECT ROWID x, '<IOU#AMOUNT>' property, AMOUNT value
>         FROM IOU
>         WHERE BORROWER='Charlie'
>  UNION
>  SELECT ROWID x, NULL, NULL
>         FROM IOU
>         WHERE AMOUNT=10
> 
> Earning the R2RML bonus points:
> 
>   <#map> a rr:TriplesMap;
>      rr:logicalTable [
>         rr:sqlQuery "SELECT ROWID, BORROWER, AMOUNT FROM IOU";
>      ];
>      rr:subjectMap [
>         rr:column "ROWID";
>         rr:termType rr:BlankNode
>      ];
>      rr:predicateObjectMap [
>         rr:predicate <IOU#BORROWER>;
>         rr:objectMap [ rr:column "BORROWER" ];
>      ];
>      rr:predicateObjectMap [
>         rr:predicate <IOU#AMOUNT>;
>         rr:objectMap [ rr:column "AMOUNT" ];
>      ].
> 
> Now, how to do this without the ROWID vendor extension???
> 
> 
> ----
> 
> For the record. With a lean direct mapping, the desired output graph would be:
> 
>   _:1 <IOU#BORROWER> "Alice".
>   _:1 <IOU#AMOUNT> 10.
>   _:2 <IOU#BORROWER> "Bob".
>   _:2 <IOU#AMOUNT> 5.
>   _:3 <IOU#BORROWER> "Charlie".
>   _:3 <IOU#AMOUNT> 10.
> 
> The query result would be:
> 
>   ?x  | ?property      | ?value
>   ----+----------------+----------
>   _:3 | <IOU#BORROWER> | "Charlie"
>   _:3 | <IOU#AMOUNT>   | 10
>   _:1 |                |
>   _:3 |                |
> 
> The standard-compliant SQL query would be as above, but replace ROWID with something like (BORROWER || '@@@separator@@@' || AMOUNT), and add DISTINCT to each SELECT.
> 
> The R2RML query would be the same as above with the following changes:
> 
>      rr:logicalTable [
>         rr:tableName "IOU";
>      ];
>      rr:subjectMap [
>         rr:template "{BORROWER}@@@separator@@@{AMOUNT}";
>         rr:termType rr:BlankNode;
>      ];
> 
> So, implementing the lean direct mapping is not hard using just standard SQL.
> 
> Best,
> Richard
> 

Received on Tuesday, 24 April 2012 21:06:14 UTC