Re: Brain teaser for non-PK tables from Juan Sequeda on 2012-04-25 (public-rdb2rdf-wg@w3.org from April 2012)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Wed, 25 Apr 2012 15:05:42 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-ID: <CAMVTWDxSgjvWf+2HGwcu0a5g2UmjN7atfysRtdwH3_JgpEw2dg@mail.gmail.com>
You got my vote and Marcelo's. So

+2

My question now is... do we have to go back to last call?

In addition to adding this, we would need to do a minor change in the
appendix to reflect this change. For the Direct Mapping as Rules section,
we would just need to change a bit the definition of generateRowBlankNode
predicate.

For the Denotational semantics, in line 37

[[
else
  a BlankNode unique to r
]]

would need to be changed to reflect the change. Not sure exactly how it
would be done. Eric?




Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Wed, Apr 25, 2012 at 2:52 PM, Richard Cyganiak <richard@cyganiak.de>wrote:

> Hi Juan,
>
> This direction works for me. I would reword it slightly. How about
> replacing the current spec text:
>
> [[
> If the table has no primary key, the row node is a fresh blank node that
> is unique to this row
> ]]
>
> with this wording:
>
> [[
> If the table has no primary key, the row node is a blank node. Distinct
> blank nodes must be generated for rows with distinct column values. For
> duplicate rows with identical values, it is left to the implementation
> whether to generate distinct blank nodes for each duplicate row.
> ]]
>
> and adding an informative NOTE:
>
> [[
> NOTE: In the case of duplicate rows in tables without primary key, if one
> blank node is generated for each row, then the result is a *non-lean* RDF
> graph [RDF Semantic]. If one blank node is generated for each distinct set
> of column values, then the result is a *lean* RDF graph. The lean version
> is equivalent to the non-lean version under RDF Semantics, but does not
> maintain the relational table's cardinalities, and hence gives different
> answers under certain SPARQL queries. The lean version is easily
> expressible in R2RML [R2RML].
> ]]
>
> I think this is the same in spirit as your version, but says less about
> implementation concerns, and motivates the two versions more in terms of
> compatibility with other specs (SPARQL and R2RML).
>
> Best,
> Richard
>
>
> On 25 Apr 2012, at 09:25, Juan Sequeda wrote:
> > What caught my attention was: "let implementers choose whether they want
> to implement the lean or non-lean direct mapping." I like how you phrased
> that. This would imply that there could be two DM: a lean and non-lean.
> >
> > I would propose to change
> >
> > "If the table has no primary key, the row node is a fresh blank node
> that is unique to this row"
> >
> > to
> >
> > "If the table has no primary key, the row node is a blank node. "
> >
>
> > And then have a note/warning.
> >
>
> > [[
> > If you generate a fresh blank node that is unique to this row, then the
> result is a non-lean RDF graph.
> >
> > If you generate the same blank node for repeated tuples, then the result
> is a lean RDF graph.
> >
> > The non-lean DM preserves the cardinality of the tuples, but it
> hard/inefficient to implement in a SPARQL to SQL translator.
> >
> > The lean DM does not preserve the cardinality of the tuples, but the
> implementation is easier/efficient in a SPARQL to SQL translator.
> >
> > If you are implementing a dumping tool, the recommendation is to create
> a non-lean DM in order to maintain the cardinality.
> > ]]
>
> >
> >
> > Juan Sequeda
> > +1-575-SEQ-UEDA
> > www.juansequeda.com
> >
> >
> > On Tue, Apr 24, 2012 at 10:15 PM, Richard Cyganiak <richard@cyganiak.de>
> wrote:
> > So, Eric challenged me to present an example of a query over a
> direct-mapped PK-less table that I believe cannot be evaluated in standard
> SQL without materializing the entire table outside of the DB.
> >
> > First let me say that I've puzzled over this non-PK issue for more than
> a day, trying to come up with some scheme based on cursors or ROWNUM or
> local variables to make it work, and failed. Now, making a leap from “I
> couldn't do it in a day” to “It's impossible” is certainly not quite
> appropriate, but after that experience I felt justified to send an
> implementation experience report to the WG, stating my belief that the cost
> of implementing this scheme are not worth the benefits. Hence my proposal
> to let implementers choose whether they want to implement the lean or
> non-lean direct mapping.
> >
> > So here we go.
> >
> >          IOU
> >   BORROWER | AMOUNT
> >   ---------+-------
> >   Alice    |     10
> >   Bob      |      5
> >   Charlie  |     10
> >   Charlie  |     10
> >
> > The equivalent non-lean direct mapping graph (minus rdf:type triples):
> >
> >   _:1 <IOU#BORROWER> "Alice".
> >   _:1 <IOU#AMOUNT> 10.
> >   _:2 <IOU#BORROWER> "Bob".
> >   _:2 <IOU#AMOUNT> 5.
> >   _:3 <IOU#BORROWER> "Charlie".
> >   _:3 <IOU#AMOUNT> 10.
> >   _:4 <IOU#BORROWER> "Charlie".
> >   _:4 <IOU#AMOUNT> 10.
> >
> > Now here's a simple SPARQL query:
> >
> >   SELECT * {
> >      {
> >         ?x <IOU#BORROWER> "Charlie".
> >         ?x ?property ?value.
> >      } UNION {
> >         ?x <IOU#AMOUNT> 10.
> >      }
> >   }
> >
> > The solution should be:
> >
> >   ?x  | ?property      | ?value
> >   ----+----------------+----------
> >   _:3 | <IOU#BORROWER> | "Charlie"
> >   _:4 | <IOU#BORROWER> | "Charlie"
> >   _:3 | <IOU#AMOUNT>   | 10
> >   _:4 | <IOU#AMOUNT>   | 10
> >   _:1 |                |
> >   _:3 |                |
> >   _:4 |                |
> >
> > Can you outline an algorithm that produces this result without
> materializing the table? (Ordering, the difference between
> literals/IRIs/bNodes, and the specific labels for the bNodes don't matter.)
> >
> > Bonus points if the algorithm is expressed as an R2RML mapping. We can
> assume that we already have an algorithm for evaluating any SPARQL query
> over an R2RML mapping.
> >
> > Here's my non-standard solution using ROWID, which only works on Oracle:
> >
> >  SELECT ROWID x, '<IOU#BORROWER>' property, BORROWER value
> >         FROM IOU
> >         WHERE BORROWER='Charlie'
> >  UNION
> >  SELECT ROWID x, '<IOU#AMOUNT>' property, AMOUNT value
> >         FROM IOU
> >         WHERE BORROWER='Charlie'
> >  UNION
> >  SELECT ROWID x, NULL, NULL
> >         FROM IOU
> >         WHERE AMOUNT=10
> >
> > Earning the R2RML bonus points:
> >
> >   <#map> a rr:TriplesMap;
> >      rr:logicalTable [
> >         rr:sqlQuery "SELECT ROWID, BORROWER, AMOUNT FROM IOU";
> >      ];
> >      rr:subjectMap [
> >         rr:column "ROWID";
> >         rr:termType rr:BlankNode
> >      ];
> >      rr:predicateObjectMap [
> >         rr:predicate <IOU#BORROWER>;
> >         rr:objectMap [ rr:column "BORROWER" ];
> >      ];
> >      rr:predicateObjectMap [
> >         rr:predicate <IOU#AMOUNT>;
> >         rr:objectMap [ rr:column "AMOUNT" ];
> >      ].
> >
> > Now, how to do this without the ROWID vendor extension???
> >
> >
> > ----
> >
> > For the record. With a lean direct mapping, the desired output graph
> would be:
> >
> >   _:1 <IOU#BORROWER> "Alice".
> >   _:1 <IOU#AMOUNT> 10.
> >   _:2 <IOU#BORROWER> "Bob".
> >   _:2 <IOU#AMOUNT> 5.
> >   _:3 <IOU#BORROWER> "Charlie".
> >   _:3 <IOU#AMOUNT> 10.
> >
> > The query result would be:
> >
> >   ?x  | ?property      | ?value
> >   ----+----------------+----------
> >   _:3 | <IOU#BORROWER> | "Charlie"
> >   _:3 | <IOU#AMOUNT>   | 10
> >   _:1 |                |
> >   _:3 |                |
> >
> > The standard-compliant SQL query would be as above, but replace ROWID
> with something like (BORROWER || '@@@separator@@@' || AMOUNT), and add
> DISTINCT to each SELECT.
> >
> > The R2RML query would be the same as above with the following changes:
> >
> >      rr:logicalTable [
> >         rr:tableName "IOU";
> >      ];
> >      rr:subjectMap [
> >         rr:template "{BORROWER}@@@separator@@@{AMOUNT}";
> >         rr:termType rr:BlankNode;
> >      ];
> >
> > So, implementing the lean direct mapping is not hard using just standard
> SQL.
> >
> > Best,
> > Richard
> >
>
>
Received on Wednesday, 25 April 2012 13:06:34 UTC