- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Wed, 25 Apr 2012 15:05:42 +0200
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
- Message-ID: <CAMVTWDxSgjvWf+2HGwcu0a5g2UmjN7atfysRtdwH3_JgpEw2dg@mail.gmail.com>
You got my vote and Marcelo's. So +2

My question now is... do we have to go back to last call?

In addition to adding this, we would need to make a minor change in the
appendix to reflect this change.

For the Direct Mapping as Rules section, we would just need to slightly
change the definition of the generateRowBlankNode predicate.

For the Denotational semantics, in line 37
[[ else a BlankNode unique to r ]]
would need to be changed to reflect the change. Not sure exactly how it
would be done. Eric?

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Wed, Apr 25, 2012 at 2:52 PM, Richard Cyganiak <richard@cyganiak.de> wrote:

> Hi Juan,
>
> This direction works for me. I would reword it slightly. How about
> replacing the current spec text:
>
> [[
> If the table has no primary key, the row node is a fresh blank node that
> is unique to this row
> ]]
>
> with this wording:
>
> [[
> If the table has no primary key, the row node is a blank node. Distinct
> blank nodes must be generated for rows with distinct column values. For
> duplicate rows with identical values, it is left to the implementation
> whether to generate distinct blank nodes for each duplicate row.
> ]]
>
> and adding an informative NOTE:
>
> [[
> NOTE: In the case of duplicate rows in tables without primary key, if one
> blank node is generated for each row, then the result is a *non-lean* RDF
> graph [RDF Semantic]. If one blank node is generated for each distinct set
> of column values, then the result is a *lean* RDF graph. The lean version
> is equivalent to the non-lean version under RDF Semantics, but does not
> maintain the relational table's cardinalities, and hence gives different
> answers under certain SPARQL queries. The lean version is easily
> expressible in R2RML [R2RML].
> ]]
>
> I think this is the same in spirit as your version, but says less about
> implementation concerns, and motivates the two versions more in terms of
> compatibility with other specs (SPARQL and R2RML).
>
> Best,
> Richard
>
>
> On 25 Apr 2012, at 09:25, Juan Sequeda wrote:
> > What caught my attention was: "let implementers choose whether they want
> > to implement the lean or non-lean direct mapping." I like how you phrased
> > that. This would imply that there could be two DM: a lean and non-lean.
> >
> > I would propose to change
> >
> > "If the table has no primary key, the row node is a fresh blank node
> > that is unique to this row"
> >
> > to
> >
> > "If the table has no primary key, the row node is a blank node. "
> >
> > And then have a note/warning.
> >
> > [[
> > If you generate a fresh blank node that is unique to this row, then the
> > result is a non-lean RDF graph.
> >
> > If you generate the same blank node for repeated tuples, then the result
> > is a lean RDF graph.
> >
> > The non-lean DM preserves the cardinality of the tuples, but it is
> > hard/inefficient to implement in a SPARQL to SQL translator.
> >
> > The lean DM does not preserve the cardinality of the tuples, but the
> > implementation is easier/efficient in a SPARQL to SQL translator.
> >
> > If you are implementing a dumping tool, the recommendation is to create
> > a non-lean DM in order to maintain the cardinality.
> > ]]
> >
> >
> > Juan Sequeda
> > +1-575-SEQ-UEDA
> > www.juansequeda.com
> >
> >
> > On Tue, Apr 24, 2012 at 10:15 PM, Richard Cyganiak <richard@cyganiak.de>
> > wrote:
> > So, Eric challenged me to present an example of a query over a
> > direct-mapped PK-less table that I believe cannot be evaluated in standard
> > SQL without materializing the entire table outside of the DB.
> >
> > First let me say that I've puzzled over this non-PK issue for more than
> > a day, trying to come up with some scheme based on cursors or ROWNUM or
> > local variables to make it work, and failed.
> > Now, making a leap from “I couldn't do it in a day” to “It's impossible”
> > is certainly not quite appropriate, but after that experience I felt
> > justified to send an implementation experience report to the WG, stating
> > my belief that the costs of implementing this scheme are not worth the
> > benefits. Hence my proposal to let implementers choose whether they want
> > to implement the lean or non-lean direct mapping.
> >
> > So here we go.
> >
> > IOU
> > BORROWER | AMOUNT
> > ---------+-------
> > Alice    | 10
> > Bob      | 5
> > Charlie  | 10
> > Charlie  | 10
> >
> > The equivalent non-lean direct mapping graph (minus rdf:type triples):
> >
> > _:1 <IOU#BORROWER> "Alice".
> > _:1 <IOU#AMOUNT> 10.
> > _:2 <IOU#BORROWER> "Bob".
> > _:2 <IOU#AMOUNT> 5.
> > _:3 <IOU#BORROWER> "Charlie".
> > _:3 <IOU#AMOUNT> 10.
> > _:4 <IOU#BORROWER> "Charlie".
> > _:4 <IOU#AMOUNT> 10.
> >
> > Now here's a simple SPARQL query:
> >
> > SELECT * {
> >   {
> >     ?x <IOU#BORROWER> "Charlie".
> >     ?x ?property ?value.
> >   } UNION {
> >     ?x <IOU#AMOUNT> 10.
> >   }
> > }
> >
> > The solution should be:
> >
> > ?x  | ?property      | ?value
> > ----+----------------+----------
> > _:3 | <IOU#BORROWER> | "Charlie"
> > _:4 | <IOU#BORROWER> | "Charlie"
> > _:3 | <IOU#AMOUNT>   | 10
> > _:4 | <IOU#AMOUNT>   | 10
> > _:1 |                |
> > _:3 |                |
> > _:4 |                |
> >
> > Can you outline an algorithm that produces this result without
> > materializing the table? (Ordering, the difference between
> > literals/IRIs/bNodes, and the specific labels for the bNodes don't matter.)
> >
> > Bonus points if the algorithm is expressed as an R2RML mapping. We can
> > assume that we already have an algorithm for evaluating any SPARQL query
> > over an R2RML mapping.
> >
> > Here's my non-standard solution using ROWID, which only works on Oracle:
> >
> > SELECT ROWID x, '<IOU#BORROWER>' property, BORROWER value
> > FROM IOU
> > WHERE BORROWER='Charlie'
> > UNION
> > SELECT ROWID x, '<IOU#AMOUNT>' property, AMOUNT value
> > FROM IOU
> > WHERE BORROWER='Charlie'
> > UNION
> > SELECT ROWID x, NULL, NULL
> > FROM IOU
> > WHERE AMOUNT=10
> >
> > Earning the R2RML bonus points:
> >
> > <#map> a rr:TriplesMap;
> >     rr:logicalTable [
> >         rr:sqlQuery "SELECT ROWID, BORROWER, AMOUNT FROM IOU";
> >     ];
> >     rr:subjectMap [
> >         rr:column "ROWID";
> >         rr:termType rr:BlankNode
> >     ];
> >     rr:predicateObjectMap [
> >         rr:predicate <IOU#BORROWER>;
> >         rr:objectMap [ rr:column "BORROWER" ];
> >     ];
> >     rr:predicateObjectMap [
> >         rr:predicate <IOU#AMOUNT>;
> >         rr:objectMap [ rr:column "AMOUNT" ];
> >     ].
> >
> > Now, how to do this without the ROWID vendor extension???
> >
> >
> > ----
> >
> > For the record. With a lean direct mapping, the desired output graph
> > would be:
> >
> > _:1 <IOU#BORROWER> "Alice".
> > _:1 <IOU#AMOUNT> 10.
> > _:2 <IOU#BORROWER> "Bob".
> > _:2 <IOU#AMOUNT> 5.
> > _:3 <IOU#BORROWER> "Charlie".
> > _:3 <IOU#AMOUNT> 10.
> >
> > The query result would be:
> >
> > ?x  | ?property      | ?value
> > ----+----------------+----------
> > _:3 | <IOU#BORROWER> | "Charlie"
> > _:3 | <IOU#AMOUNT>   | 10
> > _:1 |                |
> > _:3 |                |
> >
> > The standard-compliant SQL query would be as above, but replace ROWID
> > with something like (BORROWER || '@@@separator@@@' || AMOUNT), and add
> > DISTINCT to each SELECT.
> >
> > The R2RML query would be the same as above with the following changes:
> >
> >     rr:logicalTable [
> >         rr:tableName "IOU";
> >     ];
> >     rr:subjectMap [
> >         rr:template "{BORROWER}@@@separator@@@{AMOUNT}";
> >         rr:termType rr:BlankNode;
> >     ];
> >
> > So, implementing the lean direct mapping is not hard using just standard
> > SQL.
> >
> > Best,
> > Richard
> >
> >
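
The two SQL translations quoted above can be compared side by side with a small sketch. It is illustrative only and not part of the thread: it assumes Python's built-in sqlite3 module, uses SQLite's implicit rowid as a stand-in for Oracle's ROWID (so it is just as vendor-specific), and the helper names build_iou, run_query, non_lean_rows and lean_rows are hypothetical. Because UNION already removes duplicate rows, the sketch does not add DISTINCT to each SELECT.

```python
import sqlite3

# Separator string taken from the quoted message.
SEP = "@@@separator@@@"


def build_iou():
    """Create the IOU example table from the thread in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE IOU (BORROWER TEXT, AMOUNT INTEGER)")
    conn.executemany(
        "INSERT INTO IOU VALUES (?, ?)",
        [("Alice", 10), ("Bob", 5), ("Charlie", 10), ("Charlie", 10)],
    )
    return conn


def run_query(conn, row_id_expr):
    """Run the three-branch UNION query, using row_id_expr as the row node."""
    sql = f"""
        SELECT {row_id_expr} AS x, '<IOU#BORROWER>' AS property, BORROWER AS value
        FROM IOU WHERE BORROWER = 'Charlie'
        UNION
        SELECT {row_id_expr}, '<IOU#AMOUNT>', AMOUNT
        FROM IOU WHERE BORROWER = 'Charlie'
        UNION
        SELECT {row_id_expr}, NULL, NULL
        FROM IOU WHERE AMOUNT = 10
    """
    return conn.execute(sql).fetchall()


def non_lean_rows(conn):
    # Assumption: SQLite's rowid plays the role of Oracle's ROWID here.
    return run_query(conn, "rowid")


def lean_rows(conn):
    # One row node per distinct combination of column values.
    return run_query(conn, f"(BORROWER || '{SEP}' || AMOUNT)")


if __name__ == "__main__":
    conn = build_iou()
    print(len(non_lean_rows(conn)), "solutions with per-row identifiers")
    print(len(lean_rows(conn)), "solutions with value-based identifiers")
```

Under these assumptions the per-row variant returns seven solutions and the value-based variant returns four, matching the two result tables in the quoted message.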
Received on Wednesday, 25 April 2012 13:06:34 UTC