Re: Brain teaser for non-PK tables from Juan Sequeda on 2012-05-03 (public-rdb2rdf-wg@w3.org from May 2012)

From: Juan Sequeda <juanfederico@gmail.com>
Date: Thu, 3 May 2012 16:45:03 -0500
To: "Eric Prud'hommeaux" <eric@w3.org>
Cc: ashok malhotra <ashok.malhotra@oracle.com>, Richard Cyganiak <richard@cyganiak.de>, Michael Hausenblas <michael.hausenblas@deri.org>, Ivan Herman <ivan@w3.org>, W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-ID: <CAMVTWDxda2_WTGw7q4zZ=4cGELUviYfCDj_yCzneZ3xkJuKzOQ@mail.gmail.com>

Eric,

good point. The corner case we have been talking about is a table without a
primary key. The Direct Mapping spec states:

"The Direct Graph is a formula for creating an RDF graph from the rows of
each table and view in a database schema."

A view does not necessarily have a primary key (actually, I don't know if you
can add a primary key to a view; that must be vendor-dependent).

In that case, this is not a corner case anymore.
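
A minimal sketch of why a keyless view matters, assuming SQLite and made-up
names (the DebtSummary view, the amount column, and the row values are
illustrative only and appear nowhere in the specs). The view drops the key
column, so two of its rows become indistinguishable, and a Direct-Mapping-style
generator has to allocate a fresh blank node per row to keep them apart:

```python
import sqlite3
from itertools import count

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE IOUs (ID INTEGER PRIMARY KEY, name TEXT, amount INTEGER);
    INSERT INTO IOUs (name, amount) VALUES ('Bob', 5), ('Bob', 5), ('Sue', 7);
    -- Hypothetical view: it projects away the key, so rows can repeat.
    CREATE VIEW DebtSummary AS SELECT name, amount FROM IOUs;
""")

rows = conn.execute(
    "SELECT name, amount FROM DebtSummary ORDER BY name"
).fetchall()

# One fresh blank node label per row, since no key identifies the row.
labels = count()
triples = []
for name, amount in rows:
    subject = f"_:row{next(labels)}"
    triples.append((subject, "<DebtSummary#name>", name))
    triples.append((subject, "<DebtSummary#amount>", amount))

subjects = {s for s, _, _ in triples}
print(len(rows), len(set(rows)), len(subjects))
```

The view returns three rows but only two distinct ones; the three blank node
subjects are what preserve the cardinality in the graph.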

Thoughts?



Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com


On Thu, May 3, 2012 at 4:36 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * ashok malhotra <ashok.malhotra@oracle.com> [2012-05-03 12:22-0700]
> > +1 for option 2.  Seems less onerous.   Eric?
>
> It pains me that folks see me as obstructionist when I may well be
> saving us a 3rd LC. In June of 2006, Fred Zemke spotted a similar
> problem in the semantics of SPARQL which took us six months to fix
> <http://www.w3.org/mid/4488B936.10705@oracle.com>.
>
> Speaking with Sam Madden, this seems like less of a corner case than
> we originally thought. He and Zemke asserted that while some base
> tables may have no uniques, it's more common for views materialized
> for performance to preserve only the information required to perform
> some aggregates. Before standardization of SQL, some relational DBs
> operated on sets, others on multisets, and some (Zemke worked on one
> called Britton Lee) preserved repeated rows until one did a
> sort. Customers, particularly those using views, had to be very
> careful in what order they performed various operations.
>
> Juan brought up fixing this in v1. It's easy for v1.1 to relax rigid
> constraints in v1.0, but most charters promise backward compatibility,
> so v1.1 can't impose restrictions not present in v1.0.
>
> Another issue is the performance of very common queries. Under
> multiset semantics, any query which reports the name of an
> unnamed row requires the complex dance that Richard and I discussed.
> OTOH, under set semantics, any query which simply restricts or
> projects some row attributes requires a distinct subselect, which is
> either memory intensive or requires a sort of the table. For example,
> a simple join to get the addresses of folks with year-old debts:
>
>  SELECT ?name ?city
>   WHERE {
>     ?debt <IOUs#name> ?name ;
>           <IOUs#date> ?date ;
>           <IOUs#addr> ?addr .
>     ?addr <Addresses#city> ?city
>     FILTER (?date < "2011-05-03"^^xsd:date)
>   }
>
> multiset SQL translation:
>  SELECT name, city
>    FROM IOUs INNER JOIN Addresses ON IOUs.addr=Addresses.ID
>   WHERE date < '2011-05-03'
>
> set SQL translation:
>  SELECT name, city
>    FROM (
>      SELECT DISTINCT name, date, addr, attr4, attr5
>        FROM IOUs
>       ) IOUs INNER JOIN Addresses ON IOUs.addr=Addresses.ID
>   WHERE date < '2011-05-03'
>
> One could make a pretty good case for preserving the intuitive and
> efficient query mapping for such common queries.
>
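
To make the difference between the two translations concrete, here is a small
sketch that runs both against SQLite. The row values and the city are made up
for illustration, and the attr4/attr5 columns from the query above are left
out; the only point is that a keyless IOUs table with a repeated row gives
different cardinalities under the two readings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Addresses (ID INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE IOUs (name TEXT, date TEXT, addr INTEGER);  -- no key at all
    INSERT INTO Addresses VALUES (18, 'Cambridge');
    -- Illustrative data: Bob's row is repeated, Sue's debt is too recent.
    INSERT INTO IOUs VALUES ('Bob', '2010-11-01', 18);
    INSERT INTO IOUs VALUES ('Bob', '2010-11-01', 18);
    INSERT INTO IOUs VALUES ('Sue', '2012-01-15', 18);
""")

# Multiset reading: repeated rows survive, so a plain join is enough.
MULTISET_SQL = """
    SELECT name, city
      FROM IOUs INNER JOIN Addresses ON IOUs.addr = Addresses.ID
     WHERE date < '2011-05-03'
"""

# Set reading: the table has to be de-duplicated before the join.
SET_SQL = """
    SELECT name, city
      FROM (SELECT DISTINCT name, date, addr FROM IOUs) IOUs
           INNER JOIN Addresses ON IOUs.addr = Addresses.ID
     WHERE date < '2011-05-03'
"""

multiset_rows = conn.execute(MULTISET_SQL).fetchall()
set_rows = conn.execute(SET_SQL).fetchall()
print(len(multiset_rows), len(set_rows))
```

The multiset translation reports Bob twice and the set translation reports him
once, at the cost of the DISTINCT subselect over the whole table.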
>
> > All the best, Ashok
> >
> > On 5/3/2012 12:10 PM, Juan Sequeda wrote:
> > >
> > >
> > >On Thu, May 3, 2012 at 2:01 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> > >
> > >    On 3 May 2012, at 17:11, Juan Sequeda wrote:
> > >    > Do you accept eric's proposal (which hasn't been stated yet):
> > >    >
> > >    > 1) Leave DM as-is
> > >    > 2) Add the following to R2RML
> > >    >
> > >    >  rr:subjectMap [
> > >    >     rr:termType rr:RowBlankNode
> > >    >   ];
> > >
> > >    (I'd prefer calling it rr:BlankNode. The absence of rr:column/rr:template/rr:constant indicates the new behaviour.)
> > >
> > >    This is a new feature that was never discussed before. It's not just a tweak. No existing RDB2RDF mapping language has anything comparable. How to implement it sensibly is a somewhat open question, AFAIK. Had this been proposed a few months ago, everyone would have said, “sounds like an R2RML 1.1 feature” and we would have postponed it without complaints.
> > >
> > >    The problem at hand is an incompatibility between two specs, let's call them A and B, in a corner case. Now given these choices:
> > >
> > >    1) Add a new and somewhat risky feature to spec A, at a time when we thought we were just about to enter PR. Send all implementers of A back to the drawing board. Delay the WG for an indefinite amount of time, over a barely relevant corner case.
> > >
> > >    2) Relax a constraint in spec B to say you SHOULD implement the “correct” behaviour for this corner case, but MAY also implement another not entirely unreasonable behaviour that is compatible with A as it is. Add some alarming language and say: “We expect future versions of A to remove this limitation.” No implementation changes. Go to PR in three weeks.
> > >
> > >    To me, 2) makes a lot more sense than 1).
> > >
> > >
> > >I agree with Richard. Option 2 seems more reasonable at the moment.
> > >
> > >We already have other issues to address for an R2RML and DM 1.1 version. This could be part of it. I'm not sure how this works in the standardization process, but as a group, we believe this particular issue is a corner case, so it's not imperative to include it in the current version of the standard. However, if users complain about this corner case, we will realize that it isn't a corner case and that we were wrong from the beginning. I'm guessing this sometimes (usually?) happens in standards, right?
> > >
> > >
> > >    Best,
> > >    Richard
> > >
> > >
> > >
> > >    >
> > >    >
> > >    > Juan Sequeda
> > >    > +1-575-SEQ-UEDA
> > >    > www.juansequeda.com
> > >    >
> > >    >
> > >    > On Thu, May 3, 2012 at 11:08 AM, Michael Hausenblas <michael.hausenblas@deri.org> wrote:
> > >    >
> > >    > > Were we close to closing R2RML's CR?
> > >    >
> > >    > This was the last issue; all others have been resolved in last week's meeting (see also my comments when I sent out the minutes [1]). Never mind, we're not extending CR but entering a second, rather short LC period.
> > >    >
> > >    > Ivan, can you prepare a respective PROPOSAL for next week's meeting please?
> > >    >
> > >    > Cheers,
> > >    >           Michael
> > >    >
> > >    > [1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012May/0005.html
> > >    >
> > >    > --
> > >    > Dr. Michael Hausenblas, Research Fellow
> > >    > DERI - Digital Enterprise Research Institute
> > >    > NUIG - National University of Ireland, Galway
> > >    > Ireland, Europe
> > >    > Tel.: +353 91 495730
> > >    > WebID: http://sw-app.org/mic.xhtml#i
> > >    >
> > >    > On 3 May 2012, at 17:04, Eric Prud'hommeaux wrote:
> > >    >
> > >    > > * Juan Sequeda <juanfederico@gmail.com> [2012-05-03 10:50-0500]
> > >    > >> Looks like we have to extend CR till
> > >    > >> we have implementations for this corner case.
> > >    > >
> > >    > > Were we close to closing R2RML's CR?
> > >    > >
> > >    > >
> > >    > >> Juan Sequeda
> > >    > >> www.juansequeda.com
> > >    > >>
> > >    > >> On May 3, 2012, at 10:42 AM, Richard Cyganiak <richard@cyganiak.de> wrote:
> > >    > >>
> > >    > >>> On 3 May 2012, at 16:25, Eric Prud'hommeaux wrote:
> > >    > >>>> presumes you can create tables, but yeah, conceptually easier query.
> > >    > >>>
> > >    > >>> (It looks like most databases have a proprietary method of adding the indexes that doesn't require write access to the DB.)
> > >    > >>>
> > >    > >>>> you can even push the symbol generation down:
> > >    > >>>
> > >    > >>> Right.
> > >    > >>>
> > >    > >>>>> The big remaining question is: How to handle this in R2RML?
> > >    > >>>>
> > >    > >>>> Looking for an analog to:
> > >    > >>>> rr:subjectMap [
> > >    > >>>>      rr:column "ROWID";
> > >    > >>>>      rr:termType rr:BlankNode
> > >    > >>>>   ];
> > >    > >>>> I'd propose:
> > >    > >>>> rr:subjectMap [
> > >    > >>>>      rr:termType rr:RowBlankNode
> > >    > >>>>   ];
> > >    > >>>
> > >    > >>> That's an option. Even keeping rr:BlankNode would work: the absence of an rr:column/rr:template/rr:constant might signal that a fresh blank node must be allocated for each row.
> > >    > >>>
> > >    > >>>> Does that complicate things beyond how much a cardinality requirement necessarily complicates things?
> > >    > >>>
> > >    > >>> Well, the spec only needs to define the graph generated by the mapping, so in terms of specification it would be a simple enough change.
> > >    > >>>
> > >    > >>> The implications for implementers are quite significant though. It's a new feature, the implementation costs are not trivial, no existing implementation does this (AFAIK), so there's a certain amount of R&D required to show that it's implementable.
> > >    > >>>
> > >    > >>> Best,
> > >    > >>> Richard
> > >    > >
> > >    > > --
> > >    > > -ericP
> > >    > >
> > >    >
> > >    >
> > >    >
> > >
> > >
>
> --
> -ericP
>
Received on Thursday, 3 May 2012 21:45:55 UTC