- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Thu, 26 Apr 2012 14:05:40 +0200
- To: Richard Cyganiak <richard@cyganiak.de>
- Cc: Ivan Herman <ivan@w3.org>, "ashok.malhotra@oracle.com" <ashok.malhotra@oracle.com>, "public-rdb2rdf-wg@w3.org" <public-rdb2rdf-wg@w3.org>
- Message-ID: <CAMVTWDwJvgaqR7jAf5EzZMuTLLY05yeAy8PyUT6Vpuj1XQdpHA@mail.gmail.com>
exactly!

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com

On Thu, Apr 26, 2012 at 1:10 PM, Richard Cyganiak <richard@cyganiak.de> wrote:

> Juan,
>
> On 26 Apr 2012, at 11:14, Juan Sequeda wrote:
> > no need to state the effect: lean vs non lean rdf graph
>
> Sorry, but this is very terse and I don't understand what you are saying.
>
> Are you agreeing to the PROPOSAL? Or are you saying something like this should be added:
>
> “… implementations SHOULD generate a fresh blank node for each duplicate row (resulting in a non-lean RDF graph [RDF Semantics]).”
> “… implementations MAY re-use the same blank node for multiple duplicate rows (resulting in a lean RDF graph).”
>
> Best,
> Richard
>
>
> > On Thu, Apr 26, 2012 at 12:02 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> > Ivan,
> >
> > See inline for responses and a PROPOSAL.
> >
> > On 26 Apr 2012, at 08:43, Ivan Herman wrote:
> > >>> [[[
> > >>> In general, for duplicate rows with identical values, implementations should use fresh blank nodes for each duplicate row. However, if the underlying database system does not provide any means to reliably differentiate among the rows via, eg, row ids, it is acceptable for implementations to reuse blank nodes.
> > >>> ]]]
> > >>
> > >> I'm ok with that. I would rather remove the mention of ROWIDs, to make the hidden translation a bit less obvious (“Oracle should implement it with fresh blank nodes; for everyone else, it is acceptable to re-use the same blank node for duplicate rows.”)
> > >
> > > I am fine if you find a suitable technical term there; or simply drop the "eg, row ids,"
> >
> > Let's drop it then.
> >
> > >>> I wonder whether we should not add that in such a case a warning should also be issued.
> > >>
> > >> An implementation would either have to always show the warning, or never. That's not helpful to anyone. It's also unclear how warnings would be delivered and to whom.
> > >
> > > I am not sure whether a warning system is referred to anywhere else in the doc. But something with MAY is neutral enough. That being said, this is a side issue.
> >
> > Or we could just say that systems SHOULD document/advertise their choice of implementation strategy. Sending warnings at runtime would be one way of doing that I suppose ;-)
> >
> > >> We could specify two different conformance levels or conformance modes (lean/non-lean), and make conforming implementations declare explicitly which one they support.
> > >
> > > The original question was whether this would lead to a new LC or not. I think that if we use the formulation above, it is fine to go ahead to PR. Introducing new conformance modes definitely sends the document back to LC. I am not sure it is worth it, to be honest.
> >
> > I agree, not worth it. To put it all together (with minor rewording):
> >
> > PROPOSAL: In the DM spec, replace the following text:
> >
> > [[
> > If the table has no primary key, the row node is a fresh blank node that is unique to this row.
> > ]]
> >
> > with this:
> >
> > [[
> > If the table has no primary key, the row node is a blank node. Distinct blank nodes MUST be generated for rows with distinct column values. For duplicate rows with identical values, implementations SHOULD generate a fresh blank node for each duplicate row. However, if the underlying database system does not provide any means to reliably differentiate among the rows, then implementations MAY re-use the same blank node for multiple duplicate rows. Implementations SHOULD document and advertise their chosen behavior.
> > ]]
> >
> > Best,
> > Richard
> >
> > >
> > > Ivan
> > >
> > >> Best,
> > >> Richard
> > >>
> > >>>
> > >>> The wording on how to describe the corner case probably needs refining, but you get what I mean, I guess.
> > >>>
> > >>> If that is the only change, I guess it could be argued that such a change is reflecting implementation experience, and would not constitute a change warranting a second LC.
> > >>>
> > >>> Ivan
> > >>>
> > >>> ---
> > >>> Ivan Herman
> > >>> Tel:+31 641044153
> > >>> http://www.ivan-herman.net
> > >>>
> > >>> (Written on mobile, sorry for brevity and misspellings...)
> > >>>
> > >>> On 25 Apr 2012, at 17:08, Ivan Herman <ivan@w3.org> wrote:
> > >>>
> > >>>> The way I read this, and if my understanding is correct, it clarifies a potential ambiguity in the spec. As Michael put it, this is what CR is for, and I would not go to another LC for this.
> > >>>>
> > >>>> Ivan
> > >>>>
> > >>>> On Apr 25, 2012, at 15:48 , ashok malhotra wrote:
> > >>>>
> > >>>>> Ivan:
> > >>>>> We need your guidance on this
> > >>>>>
> > >>>>> Re. Whether this needs another Last Call, the proposal is to replace
> > >>>>> [[
> > >>>>> If the table has no primary key, the row node is a fresh blank node that is unique to this row
> > >>>>> ]]
> > >>>>> with this wording:
> > >>>>> [[
> > >>>>> If the table has no primary key, the row node is a blank node. Distinct blank nodes must be generated for rows with distinct column values. For duplicate rows with identical values, it is left to the implementation whether to generate distinct blank nodes for each duplicate row.
> > >>>>> ]]
> > >>>>>
> > >>>>> As I see it, this offers the implementation additional freedom in a corner case.
> > >>>>> Not sure if that constitutes a material change in the semantics.
> > >>>>> All the best, Ashok
> > >>>>>
> > >>>>> On 4/25/2012 6:05 AM, Juan Sequeda wrote:
> > >>>>>> You got my vote and Marcelo's. So
> > >>>>>>
> > >>>>>> +2
> > >>>>>>
> > >>>>>> My question now is... do we have to go back to last call?
> > >>>>>>
> > >>>>>> In addition to adding this, we would need to do a minor change in the appendix to reflect this change.
> > >>>>>> For the Direct Mapping as Rules section, we would just need to slightly change the definition of the generateRowBlankNode predicate.
> > >>>>>>
> > >>>>>> For the Denotational semantics, in line 37
> > >>>>>>
> > >>>>>> [[
> > >>>>>> else
> > >>>>>> a BlankNode unique to r
> > >>>>>> ]]
> > >>>>>>
> > >>>>>> would need to be changed to reflect the change. Not sure exactly how it would be done. Eric?
> > >>>>>>
> > >>>>>> Juan Sequeda
> > >>>>>> +1-575-SEQ-UEDA
> > >>>>>> www.juansequeda.com
> > >>>>>>
> > >>>>>> On Wed, Apr 25, 2012 at 2:52 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> > >>>>>> Hi Juan,
> > >>>>>>
> > >>>>>> This direction works for me. I would reword it slightly. How about replacing the current spec text:
> > >>>>>>
> > >>>>>> [[
> > >>>>>> If the table has no primary key, the row node is a fresh blank node that is unique to this row
> > >>>>>> ]]
> > >>>>>>
> > >>>>>> with this wording:
> > >>>>>>
> > >>>>>> [[
> > >>>>>> If the table has no primary key, the row node is a blank node. Distinct blank nodes must be generated for rows with distinct column values. For duplicate rows with identical values, it is left to the implementation whether to generate distinct blank nodes for each duplicate row.
> > >>>>>> ]]
> > >>>>>>
> > >>>>>> and adding an informative NOTE:
> > >>>>>>
> > >>>>>> [[
> > >>>>>> NOTE: In the case of duplicate rows in tables without primary key, if one blank node is generated for each row, then the result is a *non-lean* RDF graph [RDF Semantics]. If one blank node is generated for each distinct set of column values, then the result is a *lean* RDF graph. The lean version is equivalent to the non-lean version under RDF Semantics, but does not maintain the relational table's cardinalities, and hence gives different answers under certain SPARQL queries. The lean version is easily expressible in R2RML [R2RML].
> > >>>>>> ]]
> > >>>>>>
> > >>>>>> I think this is the same in spirit as your version, but says less about implementation concerns, and motivates the two versions more in terms of compatibility with other specs (SPARQL and R2RML).
> > >>>>>>
> > >>>>>> Best,
> > >>>>>> Richard
> > >>>>>>
> > >>>>>> On 25 Apr 2012, at 09:25, Juan Sequeda wrote:
> > >>>>>>> What caught my attention was: "let implementers choose whether they want to implement the lean or non-lean direct mapping." I like how you phrased that. This would imply that there could be two DM: a lean and non-lean.
> > >>>>>>>
> > >>>>>>> I would propose to change
> > >>>>>>>
> > >>>>>>> "If the table has no primary key, the row node is a fresh blank node that is unique to this row"
> > >>>>>>>
> > >>>>>>> to
> > >>>>>>>
> > >>>>>>> "If the table has no primary key, the row node is a blank node. "
> > >>>>>>>
> > >>>>>>> And then have a note/warning.
> > >>>>>>>
> > >>>>>>> [[
> > >>>>>>> If you generate a fresh blank node that is unique to this row, then the result is a non-lean RDF graph.
> > >>>>>>>
> > >>>>>>> If you generate the same blank node for repeated tuples, then the result is a lean RDF graph.
> > >>>>>>>
> > >>>>>>> The non-lean DM preserves the cardinality of the tuples, but it is hard/inefficient to implement in a SPARQL to SQL translator.
> > >>>>>>>
> > >>>>>>> The lean DM does not preserve the cardinality of the tuples, but the implementation is easier/more efficient in a SPARQL to SQL translator.
> > >>>>>>>
> > >>>>>>> If you are implementing a dumping tool, the recommendation is to create a non-lean DM in order to maintain the cardinality.
> > >>>>>>> ]]
> > >>>>>>>
> > >>>>>>> Juan Sequeda
> > >>>>>>> +1-575-SEQ-UEDA
> > >>>>>>> www.juansequeda.com
> > >>>>>>>
> > >>>>>>> On Tue, Apr 24, 2012 at 10:15 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> > >>>>>>> So, Eric challenged me to present an example of a query over a direct-mapped PK-less table that I believe cannot be evaluated in standard SQL without materializing the entire table outside of the DB.
> > >>>>>>>
> > >>>>>>> First let me say that I've puzzled over this non-PK issue for more than a day, trying to come up with some scheme based on cursors or ROWNUM or local variables to make it work, and failed. Now, making a leap from “I couldn't do it in a day” to “It's impossible” is certainly not quite appropriate, but after that experience I felt justified to send an implementation experience report to the WG, stating my belief that the costs of implementing this scheme are not worth the benefits. Hence my proposal to let implementers choose whether they want to implement the lean or non-lean direct mapping.
> > >>>>>>>
> > >>>>>>> So here we go.
> > >>>>>>>
> > >>>>>>> IOU
> > >>>>>>> BORROWER | AMOUNT
> > >>>>>>> ---------+-------
> > >>>>>>> Alice    | 10
> > >>>>>>> Bob      | 5
> > >>>>>>> Charlie  | 10
> > >>>>>>> Charlie  | 10
> > >>>>>>>
> > >>>>>>> The equivalent non-lean direct mapping graph (minus rdf:type triples):
> > >>>>>>>
> > >>>>>>> _:1 <IOU#BORROWER> "Alice".
> > >>>>>>> _:1 <IOU#AMOUNT> 10.
> > >>>>>>> _:2 <IOU#BORROWER> "Bob".
> > >>>>>>> _:2 <IOU#AMOUNT> 5.
> > >>>>>>> _:3 <IOU#BORROWER> "Charlie".
> > >>>>>>> _:3 <IOU#AMOUNT> 10.
> > >>>>>>> _:4 <IOU#BORROWER> "Charlie".
> > >>>>>>> _:4 <IOU#AMOUNT> 10.
> > >>>>>>>
> > >>>>>>> Now here's a simple SPARQL query:
> > >>>>>>>
> > >>>>>>> SELECT * {
> > >>>>>>>   {
> > >>>>>>>     ?x <IOU#BORROWER> "Charlie".
> > >>>>>>>     ?x ?property ?value.
> > >>>>>>>   } UNION {
> > >>>>>>>     ?x <IOU#AMOUNT> 10.
> > >>>>>>>   }
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> The solution should be:
> > >>>>>>>
> > >>>>>>> ?x  | ?property      | ?value
> > >>>>>>> ----+----------------+----------
> > >>>>>>> _:3 | <IOU#BORROWER> | "Charlie"
> > >>>>>>> _:4 | <IOU#BORROWER> | "Charlie"
> > >>>>>>> _:3 | <IOU#AMOUNT>   | 10
> > >>>>>>> _:4 | <IOU#AMOUNT>   | 10
> > >>>>>>> _:1 |                |
> > >>>>>>> _:3 |                |
> > >>>>>>> _:4 |                |
> > >>>>>>>
> > >>>>>>> Can you outline an algorithm that produces this result without materializing the table? (Ordering, the difference between literals/IRIs/bNodes, and the specific labels for the bNodes don't matter.)
> > >>>>>>>
> > >>>>>>> Bonus points if the algorithm is expressed as an R2RML mapping. We can assume that we already have an algorithm for evaluating any SPARQL query over an R2RML mapping.
> > >>>>>>>
> > >>>>>>> Here's my non-standard solution using ROWID, which only works on Oracle:
> > >>>>>>>
> > >>>>>>> SELECT ROWID x, '<IOU#BORROWER>' property, BORROWER value
> > >>>>>>>   FROM IOU
> > >>>>>>>   WHERE BORROWER='Charlie'
> > >>>>>>> UNION
> > >>>>>>> SELECT ROWID x, '<IOU#AMOUNT>' property, AMOUNT value
> > >>>>>>>   FROM IOU
> > >>>>>>>   WHERE BORROWER='Charlie'
> > >>>>>>> UNION
> > >>>>>>> SELECT ROWID x, NULL, NULL
> > >>>>>>>   FROM IOU
> > >>>>>>>   WHERE AMOUNT=10
> > >>>>>>>
> > >>>>>>> Earning the R2RML bonus points:
> > >>>>>>>
> > >>>>>>> <#map> a rr:TriplesMap;
> > >>>>>>>     rr:logicalTable [
> > >>>>>>>         rr:sqlQuery "SELECT ROWID, BORROWER, AMOUNT FROM IOU";
> > >>>>>>>     ];
> > >>>>>>>     rr:subjectMap [
> > >>>>>>>         rr:column "ROWID";
> > >>>>>>>         rr:termType rr:BlankNode
> > >>>>>>>     ];
> > >>>>>>>     rr:predicateObjectMap [
> > >>>>>>>         rr:predicate <IOU#BORROWER>;
> > >>>>>>>         rr:objectMap [ rr:column "BORROWER" ];
> > >>>>>>>     ];
> > >>>>>>>     rr:predicateObjectMap [
> > >>>>>>>         rr:predicate <IOU#AMOUNT>;
> > >>>>>>>         rr:objectMap [ rr:column "AMOUNT" ];
> > >>>>>>>     ].
> > >>>>>>>
> > >>>>>>> Now, how to do this without the ROWID vendor extension???
> > >>>>>>>
> > >>>>>>> ----
> > >>>>>>>
> > >>>>>>> For the record: with a lean direct mapping, the desired output graph would be:
> > >>>>>>>
> > >>>>>>> _:1 <IOU#BORROWER> "Alice".
> > >>>>>>> _:1 <IOU#AMOUNT> 10.
> > >>>>>>> _:2 <IOU#BORROWER> "Bob".
> > >>>>>>> _:2 <IOU#AMOUNT> 5.
> > >>>>>>> _:3 <IOU#BORROWER> "Charlie".
> > >>>>>>> _:3 <IOU#AMOUNT> 10.
> > >>>>>>>
> > >>>>>>> The query result would be:
> > >>>>>>>
> > >>>>>>> ?x  | ?property      | ?value
> > >>>>>>> ----+----------------+----------
> > >>>>>>> _:3 | <IOU#BORROWER> | "Charlie"
> > >>>>>>> _:3 | <IOU#AMOUNT>   | 10
> > >>>>>>> _:1 |                |
> > >>>>>>> _:3 |                |
> > >>>>>>>
> > >>>>>>> The standard-compliant SQL query would be as above, but replace ROWID with something like (BORROWER || '@@@separator@@@' || AMOUNT), and add DISTINCT to each SELECT.
> > >>>>>>>
> > >>>>>>> The R2RML mapping would be the same as above with the following changes:
> > >>>>>>>
> > >>>>>>>     rr:logicalTable [
> > >>>>>>>         rr:tableName "IOU";
> > >>>>>>>     ];
> > >>>>>>>     rr:subjectMap [
> > >>>>>>>         rr:template "{BORROWER}@@@separator@@@{AMOUNT}";
> > >>>>>>>         rr:termType rr:BlankNode;
> > >>>>>>>     ];
> > >>>>>>>
> > >>>>>>> So, implementing the lean direct mapping is not hard using just standard SQL.
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Richard
> > >>>>>>>
> > >>>>
> > >>>> ----
> > >>>> Ivan Herman, W3C Semantic Web Activity Lead
> > >>>> Home: http://www.w3.org/People/Ivan/
> > >>>> mobile: +31-641044153
> > >>>> FOAF: http://www.ivan-herman.net/foaf.rdf
> > >>>>
> > >
> > > ----
> > > Ivan Herman, W3C Semantic Web Activity Lead
> > > Home: http://www.w3.org/People/Ivan/
> > > mobile: +31-641044153
> > > FOAF: http://www.ivan-herman.net/foaf.rdf
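
The lean versus non-lean behaviour discussed in this thread can be illustrated with a small sketch over the IOU table. This is only an illustration under stated assumptions, not the Direct Mapping algorithm from the spec: the function names `direct_map` and `subjects_with` are hypothetical, blank node labels are simply numbered in row order, and rdf:type triples are left out as in the example graphs.

```python
from itertools import count

# Rows of the primary-key-less IOU table from the thread.
IOU_ROWS = [("Alice", 10), ("Bob", 5), ("Charlie", 10), ("Charlie", 10)]
COLUMNS = ("BORROWER", "AMOUNT")


def direct_map(rows, columns, table, lean):
    """Return a set of (subject, predicate, object) triples.

    lean=False: a fresh blank node per row, so duplicate rows stay distinct.
    lean=True:  one blank node per distinct tuple of column values.
    (Hypothetical helper; labels are assigned in row order.)
    """
    labels = count(1)
    node_for_values = {}  # only consulted in lean mode
    triples = set()
    for row in rows:
        if lean:
            if row not in node_for_values:
                node_for_values[row] = f"_:{next(labels)}"
            node = node_for_values[row]
        else:
            node = f"_:{next(labels)}"
        for column, value in zip(columns, row):
            triples.add((node, f"<{table}#{column}>", value))
    return triples


def subjects_with(triples, predicate, value):
    """Sorted blank nodes that have the given predicate/value pair."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})


non_lean = direct_map(IOU_ROWS, COLUMNS, "IOU", lean=False)
lean = direct_map(IOU_ROWS, COLUMNS, "IOU", lean=True)

# The duplicate "Charlie" rows yield two subjects in the non-lean graph
# and a single subject in the lean graph.
print(subjects_with(non_lean, "<IOU#BORROWER>", "Charlie"))
print(subjects_with(lean, "<IOU#BORROWER>", "Charlie"))
```

The lean variant keys blank nodes on the full tuple of column values, which corresponds to the `{BORROWER}@@@separator@@@{AMOUNT}` template idea above; the non-lean variant needs some per-row identity, which is what ROWID supplies on Oracle.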
Received on Thursday, 26 April 2012 12:06:34 UTC