Re: Re 2: Brain teaser for non-PK tables from Ivan Herman on 2012-04-26 (public-rdb2rdf-wg@w3.org from April 2012)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 26 Apr 2012 09:43:39 +0200
To: Richard Cyganiak <richard@cyganiak.de>
Cc: "ashok.malhotra@oracle.com" <ashok.malhotra@oracle.com>, "public-rdb2rdf-wg@w3.org" <public-rdb2rdf-wg@w3.org>
Message-Id: <3F0911CD-F2CA-4B79-9444-96D0F7FAFE55@w3.org>
On Apr 25, 2012, at 23:18 , Richard Cyganiak wrote:

> On 25 Apr 2012, at 19:07, Ivan Herman wrote:
>> [[[
>> In general, for duplicate rows with identical values, implementations should use fresh blank nodes for each duplicate row. However, if the underlying database system does not provide any means to reliably differentiate among the rows via, eg, row ids, it is acceptable for implementations to reuse blank nodes.
>> ]]]
> 
> I'm ok with that. I would rather remove the mention of ROWIDs, to make the hidden translation a bit less obvious (“Oracle should implement it with fresh blank nodes; for everyone else, it is acceptable to re-use the same blank node for duplicate rows.”)

I am fine if you find a suitable technical term there; or simply drop the "eg, row ids,"

> 
>> I wonder whether we should not add that in such a case a warning should also be issued.
> 
> An implementation would either have to always show the warning, or never. That's not helpful to anyone. It's also unclear how warnings would be delivered and to whom.
> 

I am not sure whether a warning system is referred to anywhere else in the doc. But something with MAY is neutral enough. That being said, this is a side issue.

> We could specify two different conformance levels or conformance modes (lean/non-lean), and make conforming implementations declare explicitly which one they support.
> 

The original question was whether this would lead to a new LC or not. I think that if we use the formulation above, it is fine to go ahead to PR. Introducing new conformance modes definitely sends the document back to LC. I am not sure it is worth it, to be honest.

Ivan


> Best,
> Richard
> 
> 
> 
>> 
>> The wording on how to describe the corner case probably needs refining, but you get what I mean, I guess.
>> 
>> If that is the only change, I guess it could be argued that such a change is reflecting implementation experience, and would not constitute a change warranting a second LC.
>> 
>> Ivan
>> 
>> ---
>> Ivan Herman
>> Tel:+31 641044153
>> http://www.ivan-herman.net
>> 
>> (Written on mobile, sorry for brevity and misspellings...)
>> 
>> 
>> 
>> On 25 Apr 2012, at 17:08, Ivan Herman <ivan@w3.org> wrote:
>> 
>>> The way I read this, and if my understanding is correct, it clarifies a potential ambiguity in the spec. As Michael put it, this is what CR is for, and I would not go to another LC for this.
>>> 
>>> Ivan
>>> 
>>> On Apr 25, 2012, at 15:48 , ashok malhotra wrote:
>>> 
>>>> Ivan:
>>>> We need your guidance on this
>>>> 
>>>> Regarding whether this needs another Last Call, the proposal is to replace
>>>> [[
>>>> If the table has no primary key, the row node is a fresh blank node that is unique to this row
>>>> ]]
>>>> with this wording:
>>>> [[
>>>> If the table has no primary key, the row node is a blank node. Distinct blank nodes must be generated for rows with distinct column values. For duplicate rows with identical values, it is left to the implementation whether to generate distinct blank nodes for each duplicate row.
>>>> ]]
>>>> 
>>>> As I see it, this offers the implementation additional freedom in a corner case.
>>>> Not sure if that constitutes a material change in the semantics.
>>>> All the best, Ashok
>>>> 
>>>> On 4/25/2012 6:05 AM, Juan Sequeda wrote:
>>>>> You got my vote and Marcelo's. So
>>>>> 
>>>>> +2
>>>>> 
>>>>> My question now is... do we have to go back to last call?
>>>>> 
>>>>> In addition to adding this, we would need to make a minor change in the appendix to reflect this change. For the Direct Mapping as Rules section, we would just need to slightly change the definition of the generateRowBlankNode predicate.
>>>>> 
>>>>> For the Denotational semantics, in line 37
>>>>> 
>>>>> [[
>>>>> else
>>>>> a BlankNode unique to r
>>>>> ]]
>>>>> 
>>>>> would need to be changed to reflect the change. Not sure exactly how it would be done. Eric?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Juan Sequeda
>>>>> +1-575-SEQ-UEDA
>>>>> www.juansequeda.com
>>>>> 
>>>>> 
>>>>> On Wed, Apr 25, 2012 at 2:52 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
>>>>> Hi Juan,
>>>>> 
>>>>> This direction works for me. I would reword it slightly. How about replacing the current spec text:
>>>>> 
>>>>> [[
>>>>> If the table has no primary key, the row node is a fresh blank node that is unique to this row
>>>>> ]]
>>>>> 
>>>>> with this wording:
>>>>> 
>>>>> [[
>>>>> If the table has no primary key, the row node is a blank node. Distinct blank nodes must be generated for rows with distinct column values. For duplicate rows with identical values, it is left to the implementation whether to generate distinct blank nodes for each duplicate row.
>>>>> ]]
>>>>> 
>>>>> and adding an informative NOTE:
>>>>> 
>>>>> [[
>>>>> NOTE: In the case of duplicate rows in tables without primary key, if one blank node is generated for each row, then the result is a *non-lean* RDF graph [RDF Semantic]. If one blank node is generated for each distinct set of column values, then the result is a *lean* RDF graph. The lean version is equivalent to the non-lean version under RDF Semantics, but does not maintain the relational table's cardinalities, and hence gives different answers under certain SPARQL queries. The lean version is easily expressible in R2RML [R2RML].
>>>>> ]]
>>>>> 
>>>>> I think this is the same in spirit as your version, but says less about implementation concerns, and motivates the two versions more in terms of compatibility with other specs (SPARQL and R2RML).
>>>>> 
>>>>> Best,
>>>>> Richard
>>>>> 
>>>>> 
>>>>> On 25 Apr 2012, at 09:25, Juan Sequeda wrote:
>>>>>> What caught my attention was: "let implementers choose whether they want to implement the lean or non-lean direct mapping." I like how you phrased that. This would imply that there could be two DMs: a lean one and a non-lean one.
>>>>>> 
>>>>>> I would propose to change
>>>>>> 
>>>>>> "If the table has no primary key, the row node is a fresh blank node that is unique to this row"
>>>>>> 
>>>>>> to
>>>>>> 
>>>>>> "If the table has no primary key, the row node is a blank node. "
>>>>>> 
>>>>> 
>>>>>> And then have a note/warning.
>>>>>> 
>>>>> 
>>>>>> [[
>>>>>> If you generate a fresh blank node that is unique to this row, then the result is a non-lean RDF graph.
>>>>>> 
>>>>>> If you generate the same blank node for repeated tuples, then the result is a lean RDF graph.
>>>>>> 
>>>>>> The non-lean DM preserves the cardinality of the tuples, but it is hard/inefficient to implement in a SPARQL to SQL translator.
>>>>>> 
>>>>>> The lean DM does not preserve the cardinality of the tuples, but the implementation is easier/more efficient in a SPARQL to SQL translator.
>>>>>> 
>>>>>> If you are implementing a dumping tool, the recommendation is to create a non-lean DM in order to maintain the cardinality.
>>>>>> ]]
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Juan Sequeda
>>>>>> +1-575-SEQ-UEDA
>>>>>> www.juansequeda.com
>>>>>> 
>>>>>> 
>>>>>> On Tue, Apr 24, 2012 at 10:15 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
>>>>>> So, Eric challenged me to present an example of a query over a direct-mapped PK-less table that I believe cannot be evaluated in standard SQL without materializing the entire table outside of the DB.
>>>>>> 
>>>>>> First let me say that I've puzzled over this non-PK issue for more than a day, trying to come up with some scheme based on cursors or ROWNUM or local variables to make it work, and failed. Now, making a leap from “I couldn't do it in a day” to “It's impossible” is certainly not quite appropriate, but after that experience I felt justified to send an implementation experience report to the WG, stating my belief that the cost of implementing this scheme is not worth the benefits. Hence my proposal to let implementers choose whether they want to implement the lean or non-lean direct mapping.
>>>>>> 
>>>>>> So here we go.
>>>>>> 
>>>>>>       IOU
>>>>>> BORROWER | AMOUNT
>>>>>> ---------+-------
>>>>>> Alice    |     10
>>>>>> Bob      |      5
>>>>>> Charlie  |     10
>>>>>> Charlie  |     10
>>>>>> 
>>>>>> The equivalent non-lean direct mapping graph (minus rdf:type triples):
>>>>>> 
>>>>>> _:1 <IOU#BORROWER> "Alice".
>>>>>> _:1 <IOU#AMOUNT> 10.
>>>>>> _:2 <IOU#BORROWER> "Bob".
>>>>>> _:2 <IOU#AMOUNT> 5.
>>>>>> _:3 <IOU#BORROWER> "Charlie".
>>>>>> _:3 <IOU#AMOUNT> 10.
>>>>>> _:4 <IOU#BORROWER> "Charlie".
>>>>>> _:4 <IOU#AMOUNT> 10.
>>>>>> 
>>>>>> Now here's a simple SPARQL query:
>>>>>> 
>>>>>> SELECT * {
>>>>>>   {
>>>>>>      ?x <IOU#BORROWER> "Charlie".
>>>>>>      ?x ?property ?value.
>>>>>>   } UNION {
>>>>>>      ?x <IOU#AMOUNT> 10.
>>>>>>   }
>>>>>> }
>>>>>> 
>>>>>> The solution should be:
>>>>>> 
>>>>>> ?x  | ?property      | ?value
>>>>>> ----+----------------+----------
>>>>>> _:3 | <IOU#BORROWER> | "Charlie"
>>>>>> _:4 | <IOU#BORROWER> | "Charlie"
>>>>>> _:3 | <IOU#AMOUNT>   | 10
>>>>>> _:4 | <IOU#AMOUNT>   | 10
>>>>>> _:1 |                |
>>>>>> _:3 |                |
>>>>>> _:4 |                |
>>>>>> 
>>>>>> Can you outline an algorithm that produces this result without materializing the table? (Ordering, the difference between literals/IRIs/bNodes, and the specific labels for the bNodes don't matter.)
>>>>>> 
>>>>>> Bonus points if the algorithm is expressed as an R2RML mapping. We can assume that we already have an algorithm for evaluating any SPARQL query over an R2RML mapping.
>>>>>> 
>>>>>> Here's my non-standard solution using ROWID, which only works on Oracle:
>>>>>> 
>>>>>> SELECT ROWID x, '<IOU#BORROWER>' property, BORROWER value
>>>>>>      FROM IOU
>>>>>>      WHERE BORROWER='Charlie'
>>>>>> UNION
>>>>>> SELECT ROWID x, '<IOU#AMOUNT>' property, AMOUNT value
>>>>>>      FROM IOU
>>>>>>      WHERE BORROWER='Charlie'
>>>>>> UNION
>>>>>> SELECT ROWID x, NULL, NULL
>>>>>>      FROM IOU
>>>>>>      WHERE AMOUNT=10
>>>>>> 
>>>>>> Earning the R2RML bonus points:
>>>>>> 
>>>>>> <#map> a rr:TriplesMap;
>>>>>>   rr:logicalTable [
>>>>>>      rr:sqlQuery "SELECT ROWID, BORROWER, AMOUNT FROM IOU";
>>>>>>   ];
>>>>>>   rr:subjectMap [
>>>>>>      rr:column "ROWID";
>>>>>>      rr:termType rr:BlankNode
>>>>>>   ];
>>>>>>   rr:predicateObjectMap [
>>>>>>      rr:predicate <IOU#BORROWER>;
>>>>>>      rr:objectMap [ rr:column "BORROWER" ];
>>>>>>   ];
>>>>>>   rr:predicateObjectMap [
>>>>>>      rr:predicate <IOU#AMOUNT>;
>>>>>>      rr:objectMap [ rr:column "AMOUNT" ];
>>>>>>   ].
>>>>>> 
>>>>>> Now, how to do this without the ROWID vendor extension???
>>>>>> 
>>>>>> 
>>>>>> ----
>>>>>> 
>>>>>> For the record. With a lean direct mapping, the desired output graph would be:
>>>>>> 
>>>>>> _:1 <IOU#BORROWER> "Alice".
>>>>>> _:1 <IOU#AMOUNT> 10.
>>>>>> _:2 <IOU#BORROWER> "Bob".
>>>>>> _:2 <IOU#AMOUNT> 5.
>>>>>> _:3 <IOU#BORROWER> "Charlie".
>>>>>> _:3 <IOU#AMOUNT> 10.
>>>>>> 
>>>>>> The query result would be:
>>>>>> 
>>>>>> ?x  | ?property      | ?value
>>>>>> ----+----------------+----------
>>>>>> _:3 | <IOU#BORROWER> | "Charlie"
>>>>>> _:3 | <IOU#AMOUNT>   | 10
>>>>>> _:1 |                |
>>>>>> _:3 |                |
>>>>>> 
>>>>>> The standard-compliant SQL query would be as above, but replace ROWID with something like (BORROWER || '@@@separator@@@' || AMOUNT), and add DISTINCT to each SELECT.
>>>>>> 
>>>>>> The R2RML mapping would be the same as above with the following changes:
>>>>>> 
>>>>>>   rr:logicalTable [
>>>>>>      rr:tableName "IOU";
>>>>>>   ];
>>>>>>   rr:subjectMap [
>>>>>>      rr:template "{BORROWER}@@@separator@@@{AMOUNT}";
>>>>>>      rr:termType rr:BlankNode;
>>>>>>   ];
>>>>>> 
>>>>>> So, implementing the lean direct mapping is not hard using just standard SQL.
>>>>>> 
>>>>>> Best,
>>>>>> Richard
>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C Semantic Web Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
> 
> 
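A minimal sketch of the lean versus non-lean distinction in Richard's IOU example quoted above, assuming SQLite (through Python's sqlite3 module) as a stand-in database. SQLite's implicit rowid column plays the role of the Oracle ROWID vendor extension here, which is an assumption of this sketch and not something the thread tested. The names build_iou, solve, QUERY, LEAN_NODE and NON_LEAN_NODE are hypothetical and appear in no spec or earlier message.

```python
import sqlite3

# Row-node expressions. The separator string is the one used in the quoted message.
LEAN_NODE = "BORROWER || '@@@separator@@@' || AMOUNT"  # one node per distinct value set
NON_LEAN_NODE = "rowid"  # one node per row; SQLite-specific stand-in for Oracle's ROWID

# SQL shape of the quoted UNION query, with the row-node expression left open.
QUERY = """
SELECT DISTINCT {node} AS x, '<IOU#BORROWER>' AS property, BORROWER AS value
  FROM IOU WHERE BORROWER = 'Charlie'
UNION
SELECT DISTINCT {node}, '<IOU#AMOUNT>', AMOUNT
  FROM IOU WHERE BORROWER = 'Charlie'
UNION
SELECT DISTINCT {node}, NULL, NULL
  FROM IOU WHERE AMOUNT = 10
"""


def build_iou():
    """Create the PK-less IOU table from the example, including the duplicate row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE IOU (BORROWER TEXT, AMOUNT INTEGER)")
    conn.executemany(
        "INSERT INTO IOU VALUES (?, ?)",
        [("Alice", 10), ("Bob", 5), ("Charlie", 10), ("Charlie", 10)],
    )
    return conn


def solve(conn, node_expr):
    """Run the query with the given row-node expression; return the solutions as a set."""
    return set(conn.execute(QUERY.format(node=node_expr)).fetchall())


if __name__ == "__main__":
    conn = build_iou()
    print(len(solve(conn, NON_LEAN_NODE)), "non-lean solutions")
    print(len(solve(conn, LEAN_NODE)), "lean solutions")
```

Under these assumptions the row-id variant returns the seven solutions of the non-lean result table and the concatenated-key variant the four of the lean one. The concatenated key is only safe as long as the separator cannot occur inside a column value.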


----
Ivan Herman, W3C Semantic Web Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
FOAF: http://www.ivan-herman.net/foaf.rdf
Received on Thursday, 26 April 2012 07:41:24 UTC