Re: Re 2: Brain teaser for non-PK tables

no need to state the effect: lean vs non lean rdf graph

On Thu, Apr 26, 2012 at 12:02 PM, Richard Cyganiak <richard@cyganiak.de>wrote:

> Ivan,
>
> See inline for responses and a PROPOSAL.
>
> On 26 Apr 2012, at 08:43, Ivan Herman wrote:
> >>> [[[
> >>> In general, for duplicate rows with identical values, implementations
> should use fresh blank nodes for each duplicate row. However,  if the
> underlying database system does not provide any means to reliably
> differentiate among the rows via, eg, row ids, it is acceptable to
> implentations to reuse blank nodes.
> >>> ]]]
> >>
> >> I'm ok with that. I would rather remove the mention of ROWIDs, to make
> the hidden translation a bit less obvious (“Oracle should implement it with
> fresh blank nodes; for everyone else, it is acceptable to re-use the same
> blank node for duplicate rows.”)
> >
> > I am fine if you find a suitable technical term there; or simply drop
> the "eg, row ids,"
>
> Let's drop it then.
>
> >>> I wonder wheter we should not add that in such a case a warning should
> also be issued.
> >>
> >> An implementation would either have to always show the warning, or
> never. That's not helpful to anyone. It's also unclear how warnings would
> be delivered and to whom.
> >
> > I am not sure whether warning system is referred to anywhere else in the
> doc. But something with MAY is neutral enough. That being said, this is a
> side issue.
>
> Or we could just say that systems SHOULD document/advertise their choice
> of implementation strategy. Sending warnings at runtime would be one way of
> doing that I suppose ;-)
>
> >> We could specify two different conformance levels or conformance modes
> (lean/non-lean), and make conforming implementations declare explicitly
> which one they support.
> >
> > The original question was whether this would lead to new LC or not. I
> think that if we use the formulation above, it is fine to go ahead to PR.
> Introducing new conformance modes definitely sends back the document to LC.
> I am not sure it is worth it, to be honest.
>
> I agree, not worth it. To put it all together (with minor rewording):
>
> PROPOSAL: In the DM spec, replace the following text:
>
> [[
> If the table has no primary key, the row node is a fresh blank node that
> is unique to this row.
> ]]
>
> with this:
>
> [[
> If the table has no primary key, the row node is a blank node. Distinct
> blank nodes MUST be generated for rows with distinct column values. For
> duplicate rows with identical values, implementations SHOULD generate a
> fresh blank for each duplicate row. However, if the underlying database
> system does not provide any means to reliably differentiate among the rows,
> then implementations MAY re-use the same blank node for multiple duplicate
> rows. Implementations SHOULD document and advertise their chosen behavior.
> ]]
>
> Best,
> Richard
>
>
>
> >
> > Ivan
> >
> >
> >> Best,
> >> Richard
> >>
> >>
> >>
> >>>
> >>> The wording on how to describe the corner case probably needs
> refining, but you get what I mean, I guess.
> >>>
> >>> If that is the only change, I guess it could be argued that such a
> change is reflecting implementation experience, and would not constitute a
> change warranting a second LC.
> >>>
> >>> Ivan
> >>>
> >>> ---
> >>> Ivan Herman
> >>> Tel:+31 641044153
> >>> http://www.ivan-herman.net
> >>>
> >>> (Written on mobile, sorry for brevity and misspellings...)
> >>>
> >>>
> >>>
> >>> On 25 Apr 2012, at 17:08, Ivan Herman <ivan@w3.org> wrote:
> >>>
> >>>> The way I read this, and if my understanding is correct, it clarifies
> a potential ambiguity in the spec. As Michael put it, this is what CR is
> for, and I would not go to another LC for this.
> >>>>
> >>>> Ivan
> >>>>
> >>>> On Apr 25, 2012, at 15:48 , ashok malhotra wrote:
> >>>>
> >>>>> Ivan:
> >>>>> We need your guidance on this
> >>>>>
> >>>>> Re.  Whether this needs another Last Call, the proposal is to replace
> >>>>> [[
> >>>>> If the table has no primary key, the row node is a fresh blank node
> that is unique to this row
> >>>>> ]]
> >>>>> with this wording:
> >>>>> [[
> >>>>> If the table has no primary key, the row node is a blank node.
> Distinct blank nodes must be generated for rows with distinct column
> values. For duplicate rows with identical values, it is left to the
> implementation whether to generate distinct blank nodes for each duplicate
> row.
> >>>>> ]]
> >>>>>
> >>>>> As I see it, this offers the implementation additional freedom in a
> corner case.
> >>>>> Not sure if that constitutes a material change in the semantics.
> >>>>> All the best, Ashok
> >>>>>
> >>>>> On 4/25/2012 6:05 AM, Juan Sequeda wrote:
> >>>>>> You got my vote and Marcelo's. So
> >>>>>>
> >>>>>> +2
> >>>>>>
> >>>>>> My question now is... do we have to go back to last call?
> >>>>>>
> >>>>>> In addition to adding this, we would need to do a minor change in
> the appendix to reflect this change. For the Direct Mapping as Rules
> section, we would just need to change a bit the definition of
> generateRowBlankNode predicate.
> >>>>>>
> >>>>>> For the Denotational semantics, in line 37
> >>>>>>
> >>>>>> [[
> >>>>>> else
> >>>>>> a BlankNode unique to r
> >>>>>> ]]
> >>>>>>
> >>>>>> would need to be changed to reflect the change. Not sure exactly
> how it would be done. Eric?
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Juan Sequeda
> >>>>>> +1-575-SEQ-UEDA
> >>>>>> www.juansequeda.com
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Apr 25, 2012 at 2:52 PM, Richard Cyganiak <
> richard@cyganiak.de> wrote:
> >>>>>> Hi Juan,
> >>>>>>
> >>>>>> This direction works for me. I would reword it slightly. How about
> replacing the current spec text:
> >>>>>>
> >>>>>> [[
> >>>>>> If the table has no primary key, the row node is a fresh blank node
> that is unique to this row
> >>>>>> ]]
> >>>>>>
> >>>>>> with this wording:
> >>>>>>
> >>>>>> [[
> >>>>>> If the table has no primary key, the row node is a blank node.
> Distinct blank nodes must be generated for rows with distinct column
> values. For duplicate rows with identical values, it is left to the
> implementation whether to generate distinct blank nodes for each duplicate
> row.
> >>>>>> ]]
> >>>>>>
> >>>>>> and adding an informative NOTE:
> >>>>>>
> >>>>>> [[
> >>>>>> NOTE: In the case of duplicate rows in tables without primary key,
> if one blank node is generated for each row, then the result is a
> *non-lean* RDF graph [RDF Semantic]. If one blank node is generated for
> each distinct set of column values, then the result is a *lean* RDF graph.
> The lean version is equivalent to the non-lean version under RDF Semantics,
> but does not maintain the relational table's cardinalities, and hence gives
> different answers under certain SPARQL queries. The lean version is easily
> expressible in R2RML [R2RML].
> >>>>>> ]]
> >>>>>>
> >>>>>> I think this is the same in spirit as your version, but says less
> about implementation concerns, and motivates the two versions more in terms
> of compatibility with other specs (SPARQL and R2RML).
> >>>>>>
> >>>>>> Best,
> >>>>>> Richard
> >>>>>>
> >>>>>>
> >>>>>> On 25 Apr 2012, at 09:25, Juan Sequeda wrote:
> >>>>>>> What caught my attention was: "let implementers choose whether
> they want to implement the lean or non-lean direct mapping." I like how you
> phrased that. This would imply that there could be two DM: a lean and
> non-lean.
> >>>>>>>
> >>>>>>> I would propose to change
> >>>>>>>
> >>>>>>> "If the table has no primary key, the row node is a fresh blank
> node that is unique to this row"
> >>>>>>>
> >>>>>>> to
> >>>>>>>
> >>>>>>> "If the table has no primary key, the row node is a blank node. "
> >>>>>>>
> >>>>>>
> >>>>>>> And then have a note/warning.
> >>>>>>>
> >>>>>>
> >>>>>>> [[
> >>>>>>> If you generate a fresh blank node that is unique to this row,
> then the result is a non-lean RDF graph.
> >>>>>>>
> >>>>>>> If you generate the same blank node for repeated tuples, then the
> result is a lean RDF graph.
> >>>>>>>
> >>>>>>> The non-lean DM preserves the cardinality of the tuples, but it
> hard/inefficient to implement in a SPARQL to SQL translator.
> >>>>>>>
> >>>>>>> The lean DM does not preserve the cardinality of the tuples, but
> the implementation is easier/efficient in a SPARQL to SQL translator.
> >>>>>>>
> >>>>>>> If you are implementing a dumping tool, the recommendation is to
> create a non-lean DM in order to maintain the cardinality.
> >>>>>>> ]]
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Juan Sequeda
> >>>>>>> +1-575-SEQ-UEDA
> >>>>>>> www.juansequeda.com
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Apr 24, 2012 at 10:15 PM, Richard Cyganiak <
> richard@cyganiak.de> wrote:
> >>>>>>> So, Eric challenged me to present an example of a query over a
> direct-mapped PK-less table that I believe cannot be evaluated in standard
> SQL without materializing the entire table outside of the DB.
> >>>>>>>
> >>>>>>> First let me say that I've puzzled over this non-PK issue for more
> than a day, trying to come up with some scheme based on cursors or ROWNUM
> or local variables to make it work, and failed. Now, making a leap from “I
> couldn't do it in a day” to “It's impossible” is certainly not quite
> appropriate, but after that experience I felt justified to send an
> implementation experience report to the WG, stating my belief that the cost
> of implementing this scheme are not worth the benefits. Hence my proposal
> to let implementers choose whether they want to implement the lean or
> non-lean direct mapping.
> >>>>>>>
> >>>>>>> So here we go.
> >>>>>>>
> >>>>>>>      IOU
> >>>>>>> BORROWER | AMOUNT
> >>>>>>> ---------+-------
> >>>>>>> Alice    |     10
> >>>>>>> Bob      |      5
> >>>>>>> Charlie  |     10
> >>>>>>> Charlie  |     10
> >>>>>>>
> >>>>>>> The equivalent non-lean direct mapping graph (minus rdf:type
> triples):
> >>>>>>>
> >>>>>>> _:1 <IOU#BORROWER> "Alice".
> >>>>>>> _:1 <IOU#AMOUNT> 10.
> >>>>>>> _:2 <IOU#BORROWER> "Bob".
> >>>>>>> _:2 <IOU#AMOUNT> 5.
> >>>>>>> _:3 <IOU#BORROWER> "Charlie".
> >>>>>>> _:3 <IOU#AMOUNT> 10.
> >>>>>>> _:4 <IOU#BORROWER> "Charlie".
> >>>>>>> _:4 <IOU#AMOUNT> 10.
> >>>>>>>
> >>>>>>> Now here's a simple SPARQL query:
> >>>>>>>
> >>>>>>> SELECT * {
> >>>>>>>  {
> >>>>>>>     ?x <IOU#BORROWER> "Charlie".
> >>>>>>>     ?x ?property ?value.
> >>>>>>>  } UNION {
> >>>>>>>     ?x <IOU#AMOUNT> 10.
> >>>>>>>  }
> >>>>>>> }
> >>>>>>>
> >>>>>>> The solution should be:
> >>>>>>>
> >>>>>>> ?x  | ?property      | ?value
> >>>>>>> ----+----------------+----------
> >>>>>>> _:3 | <IOU#BORROWER> | "Charlie"
> >>>>>>> _:4 | <IOU#BORROWER> | "Charlie"
> >>>>>>> _:3 | <IOU#AMOUNT>   | 10
> >>>>>>> _:4 | <IOU#AMOUNT>   | 10
> >>>>>>> _:1 |                |
> >>>>>>> _:3 |                |
> >>>>>>> _:4 |                |
> >>>>>>>
> >>>>>>> Can you outline an algorithm that produces this result without
> materializing the table? (Ordering, the difference between
> literals/IRIs/bNodes, and the specific labels for the bNodes don't matter.)
> >>>>>>>
> >>>>>>> Bonus points if the algorithm is expressed as an R2RML mapping. We
> can assume that we already have an algorithm for evaluating any SPARQL
> query over an R2RML mapping.
> >>>>>>>
> >>>>>>> Here's my non-standard solution using ROWID, which only works on
> Oracle:
> >>>>>>>
> >>>>>>> SELECT ROWID x, '<IOU#BORROWER>' property, BORROWER value
> >>>>>>>     FROM IOU
> >>>>>>>     WHERE BORROWER='Charlie'
> >>>>>>> UNION
> >>>>>>> SELECT ROWID x, '<IOU#AMOUNT>' property, AMOUNT value
> >>>>>>>     FROM IOU
> >>>>>>>     WHERE BORROWER='Charlie'
> >>>>>>> UNION
> >>>>>>> SELECT ROWID x, NULL, NULL
> >>>>>>>     FROM IOU
> >>>>>>>     WHERE AMOUNT=10
> >>>>>>>
> >>>>>>> Earning the R2RML bonus points:
> >>>>>>>
> >>>>>>> <#map> a rr:TriplesMap;
> >>>>>>>  rr:logicalTable [
> >>>>>>>     rr:sqlQuery "SELECT ROWID, BORROWER, AMOUNT FROM IOU";
> >>>>>>>  ];
> >>>>>>>  rr:subjectMap [
> >>>>>>>     rr:column "ROWID";
> >>>>>>>     rr:termType rr:BlankNode
> >>>>>>>  ];
> >>>>>>>  rr:predicateObjectMap [
> >>>>>>>     rr:predicate <IOU#BORROWER>;
> >>>>>>>     rr:objectMap [ rr:column "BORROWER" ];
> >>>>>>>  ];
> >>>>>>>  rr:predicateObjectMap [
> >>>>>>>     rr:predicate <IOU#AMOUNT>;
> >>>>>>>     rr:objectMap [ rr:column "AMOUNT" ];
> >>>>>>>  ].
> >>>>>>>
> >>>>>>> Now, how to do this without the ROWID vendor extension???
> >>>>>>>
> >>>>>>>
> >>>>>>> ----
> >>>>>>>
> >>>>>>> For the record. With a lean direct mapping, the desired output
> graph would be:
> >>>>>>>
> >>>>>>> _:1 <IOU#BORROWER> "Alice".
> >>>>>>> _:1 <IOU#AMOUNT> 10.
> >>>>>>> _:2 <IOU#BORROWER> "Bob".
> >>>>>>> _:2 <IOU#AMOUNT> 5.
> >>>>>>> _:3 <IOU#BORROWER> "Charlie".
> >>>>>>> _:3 <IOU#AMOUNT> 10.
> >>>>>>>
> >>>>>>> The query result would be:
> >>>>>>>
> >>>>>>> ?x  | ?property      | ?value
> >>>>>>> ----+----------------+----------
> >>>>>>> _:3 | <IOU#BORROWER> | "Charlie"
> >>>>>>> _:3 | <IOU#AMOUNT>   | 10
> >>>>>>> _:1 |                |
> >>>>>>> _:3 |                |
> >>>>>>>
> >>>>>>> The standard-compliant SQL query would be as above, but replace
> ROWID with something like (BORROWER || '@@@separator@@@' || AMOUNT), and
> add DISTINCT to each SELECT.
> >>>>>>>
> >>>>>>> The R2RML query would be the same as above with the following
> changes:
> >>>>>>>
> >>>>>>>  rr:logicalTable [
> >>>>>>>     rr:tableName "IOU";
> >>>>>>>  ];
> >>>>>>>  rr:subjectMap [
> >>>>>>>     rr:template "{BORROWER}@@@separator@@@{AMOUNT}";
> >>>>>>>     rr:termType rr:BlankNode;
> >>>>>>>  ];
> >>>>>>>
> >>>>>>> So, implementing the lean direct mapping is not hard using just
> standard SQL.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Richard
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>> ----
> >>>> Ivan Herman, W3C Semantic Web Activity Lead
> >>>> Home: http://www.w3.org/People/Ivan/
> >>>> mobile: +31-641044153
> >>>> FOAF: http://www.ivan-herman.net/foaf.rdf
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
> >
> > ----
> > Ivan Herman, W3C Semantic Web Activity Lead
> > Home: http://www.w3.org/People/Ivan/
> > mobile: +31-641044153
> > FOAF: http://www.ivan-herman.net/foaf.rdf
> >
> >
> >
> >
> >
>
>
>

Received on Thursday, 26 April 2012 10:15:25 UTC