Re: Brain teaser for non-PK tables from Eric Prud'hommeaux on 2012-05-03 (public-rdb2rdf-wg@w3.org from May 2012)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Thu, 3 May 2012 17:36:57 -0400
To: ashok malhotra <ashok.malhotra@oracle.com>
Cc: Juan Sequeda <juanfederico@gmail.com>, Richard Cyganiak <richard@cyganiak.de>, Michael Hausenblas <michael.hausenblas@deri.org>, Ivan Herman <ivan@w3.org>, W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-ID: <20120503213656.GG24144@w3.org>
* ashok malhotra <ashok.malhotra@oracle.com> [2012-05-03 12:22-0700]
> +1 for option 2.  Seems less onerous.   Eric?

It pains me that folks see me as obstructionist when I may well be
saving us a 3rd LC. In June of 2006, Fred Zemke spotted a similar
problem in the semantics of SPARQL which took us six months to fix
<http://www.w3.org/mid/4488B936.10705@oracle.com>.

Speaking with Sam Madden, this seems like less of a corner case than
we originally thought. He and Zemke asserted that while some base
tables may have no uniques, it's more common for views materialized
for performance to preserve only the information required to perform
some aggregates. Before standardization of SQL, some relational DBs
operated on sets, others on multisets, and some (Zemke worked on one
called Britton Lee) preserved repeated rows until one did a
sort. Customers, particularly those using views, had to be very
careful in what order they performed various operations.

Juan brought up fixing this in v1.1. It's easy for v1.1 to relax rigid
constraints in v1.0, but most charters promise backward compatibility,
so v1.1 can't impose restrictions not present in v1.0.

Another issue is the performance of very common queries. Under
multiset semantics, any query which reports the name of an
unnamed row requires the complex dance that Richard and I discussed.
OTOH, under set semantics, any query which simply restricts or
projects some row attributes requires a DISTINCT subselect, which is
either memory intensive or requires a sort of the table. For example,
a simple join to get the addresses of folks with year-old debts:

  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  SELECT ?name ?city
   WHERE {
     ?debt <IOUs#name> ?name ;
           <IOUs#date> ?date ;
           <IOUs#addr> ?addr .
     ?addr <Addresses#city> ?city
     FILTER (?date < "2011-05-03"^^xsd:date)
   }

multiset SQL translation:
  SELECT name, city
    FROM IOUs INNER JOIN Addresses ON IOUs.addr=Addresses.ID
   WHERE date < '2011-05-03'

set SQL translation:
  SELECT name, city
    FROM (
      SELECT DISTINCT name, date, addr, attr4, attr5
        FROM IOUs
       ) IOUs INNER JOIN Addresses ON IOUs.addr=Addresses.ID
   WHERE date < '2011-05-03'

One could make a pretty good case for preserving the intuitive and
efficient query mapping for such common queries.
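
To make the difference concrete, here is a minimal sketch that runs
both translations above against a toy database. The sample rows, the
use of SQLite, and the omission of attr4/attr5 are assumptions made
for illustration only; nothing here comes from a real deployment. The
IOUs table has no key and holds one repeated row.

```python
import sqlite3

# Hypothetical sample data: IOUs has no primary key or unique constraint,
# and one row appears twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Addresses (ID INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE IOUs (name TEXT, date TEXT, addr INTEGER);
    INSERT INTO Addresses VALUES (18, 'Cambridge');
    INSERT INTO IOUs VALUES ('Bob', '2010-11-01', 18);
    INSERT INTO IOUs VALUES ('Bob', '2010-11-01', 18);  -- repeated row
    INSERT INTO IOUs VALUES ('Sue', '2012-01-15', 18);  -- too recent
""")

# Multiset translation: a plain join, repeated rows survive.
MULTISET_SQL = """
    SELECT name, city
      FROM IOUs INNER JOIN Addresses ON IOUs.addr = Addresses.ID
     WHERE date < '2011-05-03'
"""

# Set translation: the base table is de-duplicated before the join.
SET_SQL = """
    SELECT name, city
      FROM (SELECT DISTINCT name, date, addr FROM IOUs) IOUs
           INNER JOIN Addresses ON IOUs.addr = Addresses.ID
     WHERE date < '2011-05-03'
"""

multiset_rows = conn.execute(MULTISET_SQL).fetchall()
set_rows = conn.execute(SET_SQL).fetchall()

print(multiset_rows)  # [('Bob', 'Cambridge'), ('Bob', 'Cambridge')]
print(set_rows)       # [('Bob', 'Cambridge')]
```

The two answers differ only in cardinality, but the set form has to
de-duplicate the whole of IOUs before it can apply the restriction.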


> All the best, Ashok
> 
> On 5/3/2012 12:10 PM, Juan Sequeda wrote:
> >
> >
> >On Thu, May 3, 2012 at 2:01 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> >
> >    On 3 May 2012, at 17:11, Juan Sequeda wrote:
> >    > Do you accept eric's proposal (which hasn't been stated yet):
> >    >
> >    > 1) Leave DM as-is
> >    > 2) Add the following to R2RML
> >    >
> >    >  rr:subjectMap [
> >    >     rr:termType rr:RowBlankNode
> >    >   ];
> >
> >    (I'd prefer calling it rr:BlankNode. The absence of rr:column/rr:template/rr:constant indicates the new behaviour.)
> >
> >    This is a new feature that was never discussed before. It's not just a tweak. No existing RDB2RDF mapping language has anything comparable. How to sensibly implement it, is a somewhat open question, AFAIK. Had this been proposed a few months ago, everyone would have said, “sounds like an R2RML 1.1 feature” and we would have postponed it without complaints.
> >
> >    The problem at hand is an incompatibility between two specs, let's call them A and B, in a corner case. Now given these choices:
> >
> >    1) Add a new and somewhat risky feature to spec A, at a time when we thought we were just about to enter PR. Send all implementers of A back to the drawing board. Delay the WG for an indefinite amount of time, over a barely relevant corner case.
> >
> >    2) Relax a constraint in spec B to say you SHOULD implement the “correct” behaviour for this corner case, but MAY also implement another not entirely unreasonable behaviour that is compatible with A as it is. Add some alarming language and say: “We expect future versions of A to remove this limitation.” No implementation changes. Go to PR in three weeks.
> >
> >    To me, 2) makes a lot more sense than 1).
> >
> >
> >I agree with Richard. Option 2 seems more reasonable at the moment.
> >
> >We already have other issues to address for an R2RML and DM 1.1 version. This could be part of it. I'm not sure how this works in the standardization process, but as a group, we believe this particular issue is a corner case so it's not imperative to include it in the current version of the standard. However, if users complain about this corner case (and we then realize that it isn't a corner case), we will know we were wrong from the beginning. I'm guessing this sometimes (usually?) happens in standards, right?
> >
> >
> >    Best,
> >    Richard
> >
> >
> >
> >    >
> >    >
> >    > Juan Sequeda
> >    > +1-575-SEQ-UEDA
> >    > www.juansequeda.com <http://www.juansequeda.com>
> >    >
> >    >
> >    > On Thu, May 3, 2012 at 11:08 AM, Michael Hausenblas <michael.hausenblas@deri.org> wrote:
> >    >
> >    > > Were we close to closing R2RML's CR?
> >    >
> >    > This was the last issue; all others were resolved in last week's meeting (see also my comments when I sent out the minutes [1]). Never mind, we're not extending CR but entering a second, rather short LC period.
> >    >
> >    > Ivan, can you prepare a respective PROPOSAL for next week's meeting please?
> >    >
> >    > Cheers,
> >    >           Michael
> >    >
> >    > [1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012May/0005.html
> >    >
> >    > --
> >    > Dr. Michael Hausenblas, Research Fellow
> >    > DERI - Digital Enterprise Research Institute
> >    > NUIG - National University of Ireland, Galway
> >    > Ireland, Europe
> >    > Tel.: +353 91 495730
> >    > WebID: http://sw-app.org/mic.xhtml#i
> >    >
> >    > On 3 May 2012, at 17:04, Eric Prud'hommeaux wrote:
> >    >
> >    > > * Juan Sequeda <juanfederico@gmail.com> [2012-05-03 10:50-0500]
> >    > >> Looks like we have to extend CR till
> >    > >> we have implementations for this corner case.
> >    > >
> >    > > Were we close to closing R2RML's CR?
> >    > >
> >    > >
> >    > >> Juan Sequeda
> >    > >> www.juansequeda.com <http://www.juansequeda.com>
> >    > >>
> >    > >> On May 3, 2012, at 10:42 AM, Richard Cyganiak <richard@cyganiak.de> wrote:
> >    > >>
> >    > >>> On 3 May 2012, at 16:25, Eric Prud'hommeaux wrote:
> >    > >>>> presumes you can create tables, but yeah, conceptually easier query.
> >    > >>>
> >    > >>> (It looks like most databases have a proprietary method of adding the indexes that doesn't require write access to the DB.)
> >    > >>>
> >    > >>>> you can even push the symbol generation down:
> >    > >>>
> >    > >>> Right.
> >    > >>>
> >    > >>>>> The big remaining question is: How to handle this in R2RML?
> >    > >>>>
> >    > >>>> Looking for an analog to:
> >    > >>>> rr:subjectMap [
> >    > >>>>      rr:column "ROWID";
> >    > >>>>      rr:termType rr:BlankNode
> >    > >>>>   ];
> >    > >>>> I'd propose:
> >    > >>>> rr:subjectMap [
> >    > >>>>      rr:termType rr:RowBlankNode
> >    > >>>>   ];
> >    > >>>
> >    > >>> That's an option. Even keeping rr:BlankNode would work — the absence of an rr:column/rr:template/rr:constant might signal that a fresh blank node must be allocated for each row.
> >    > >>>
> >    > >>>> Does that complicate things beyond how much a cardinality requirement necessarily complicates things?
> >    > >>>
> >    > >>> Well, the spec only needs to define the graph generated by the mapping, so in terms of specification it would be a simple enough change.
> >    > >>>
> >    > >>> The implications for implementers are quite significant though. It's a new feature, the implementation costs are not trivial, no existing implementation does this (AFAIK), so there's a certain amount of R&D required to show that it's implementable.
> >    > >>>
> >    > >>> Best,
> >    > >>> Richard
> >    > >
> >    > > --
> >    > > -ericP
> >    > >
> >    >
> >    >
> >    >
> >
> >

-- 
-ericP
Received on Thursday, 3 May 2012 21:37:31 UTC