- From: Juan Sequeda <juanfederico@gmail.com>
- Date: Thu, 3 May 2012 16:45:03 -0500
- To: "Eric Prud'hommeaux" <eric@w3.org>
- Cc: ashok malhotra <ashok.malhotra@oracle.com>, Richard Cyganiak <richard@cyganiak.de>, Michael Hausenblas <michael.hausenblas@deri.org>, Ivan Herman <ivan@w3.org>, W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
- Message-ID: <CAMVTWDxda2_WTGw7q4zZ=4cGELUviYfCDj_yCzneZ3xkJuKzOQ@mail.gmail.com>
Eric, good point.

The corner case we have been talking about is a table without a primary key. The Direct Mapping spec states:

"The Direct Graph is a formula for creating an RDF graph from the rows of each table and view in a database schema."

A view does not necessarily have a primary key (actually, I don't know if you can add a primary key to a view; that must be vendor-dependent). In that case, this is not a corner case anymore.

Thoughts?

Juan Sequeda
+1-575-SEQ-UEDA
www.juansequeda.com

On Thu, May 3, 2012 at 4:36 PM, Eric Prud'hommeaux <eric@w3.org> wrote:

> * ashok malhotra <ashok.malhotra@oracle.com> [2012-05-03 12:22-0700]
> > +1 for option 2. Seems less onerous. Eric?
>
> It pains me that folks see me as obstructionist when I may well be
> saving us a 3rd LC. In June of 2006, Fred Zemke spotted a similar
> problem in the semantics of SPARQL which took us six months to fix
> <http://www.w3.org/mid/4488B936.10705@oracle.com>.
>
> Speaking with Sam Madden, this seems like less of a corner case than
> we originally thought. He and Zemke asserted that while some base
> tables may have no uniques, it's more common for views materialized
> for performance to preserve only the information required to perform
> some aggregates. Before standardization of SQL, some relational DBs
> operated on sets, others on multisets, and some (Zemke worked on one
> called Britton Lee) preserved repeated rows until one did a
> sort. Customers, particularly those using views, had to be very
> careful in what order they performed various operations.
>
> Juan brought up fixing this in v1. It's easy for v1.1 to relax rigid
> constraints in v1.0, but most charters promise backward compatibility,
> so v1.1 can't impose restrictions not present in v1.0.
>
> Another issue is the performance of very common queries. Under
> multiset semantics, any query which either reports the name of an
> unnamed row requires the complex dance that Richard and I discussed.
> OTOH, under set semantics, any query which simply restricts or
> projects some row attributes requires a distinct subselect, which is
> either memory intensive or requires a sort of the table. For example,
> a simple join to get the addresses of folks with year-old debts:
>
>   SELECT ?name ?city
>    WHERE {
>     ?debt <IOUs#name> ?name ;
>           <IOUs#date> ?date ;
>           <IOUs#addr> ?addr .
>     ?addr <Addresses#city> ?city
>     FILTER (?date < "2011-05-03"^^xsd:date)
>   }
>
> multiset SQL translation:
>   SELECT name, city
>     FROM IOUs INNER JOIN Addresses ON IOUs.addr=Addresses.ID
>    WHERE date < '2011-05-03'
>
> set SQL translation:
>   SELECT name, city
>     FROM (
>       SELECT DISTINCT name, date, addr, attr4, attr5
>         FROM IOUs
>     ) IOUs INNER JOIN Addresses ON IOUs.addr=Addresses.ID
>    WHERE date < '2011-05-03'
>
> One could make a pretty good case for preserving the intuitive and
> efficient query mapping for such common queries.
>
> > All the best, Ashok
> >
> > On 5/3/2012 12:10 PM, Juan Sequeda wrote:
> > >
> > >
> > >On Thu, May 3, 2012 at 2:01 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
> > >
> > > On 3 May 2012, at 17:11, Juan Sequeda wrote:
> > > > Do you accept eric's proposal (which hasn't been stated yet):
> > > >
> > > > 1) Leave DM as-is
> > > > 2) Add the following to R2RML
> > > >
> > > > rr:subjectMap [
> > > >     rr:termType rr:RowBlankNode
> > > > ];
> > >
> > > (I'd prefer calling it rr:BlankNode. The absence of rr:column/rr:template/rr:constant indicates the new behaviour.)
> > >
> > > This is a new feature that was never discussed before. It's not just a tweak. No existing RDB2RDF mapping language has anything comparable. How to sensibly implement it is a somewhat open question, AFAIK. Had this been proposed a few months ago, everyone would have said, “sounds like an R2RML 1.1 feature” and we would have postponed it without complaints.
> > >
> > > The problem at hand is an incompatibility between two specs, let's call them A and B, in a corner case. Now given these choices:
> > >
> > > 1) Add a new and somewhat risky feature to spec A, at a time when we thought we were just about to enter PR. Send all implementers of A back to the drawing board. Delay the WG for an indefinite amount of time, over a barely relevant corner case.
> > >
> > > 2) Relax a constraint in spec B to say you SHOULD implement the “correct” behaviour for this corner case, but MAY also implement another not entirely unreasonable behaviour that is compatible with A as it is. Add some alarming language and say: “We expect future versions of A to remove this limitation.” No implementation changes. Go to PR in three weeks.
> > >
> > > To me, 2) makes a lot more sense than 1).
> > >
> > >
> > >I agree with Richard. Option 2 seems more reasonable at the moment.
> > >
> > >We already have other issues to address for a R2RML and DM 1.1 version. This could be part of it. I'm not sure how this works in the standardization process, but as a group, we believe this particular issue is a corner case so it's not imperative to include it in the current version of the standard. However, if users complain about this corner case (we then realize that it isn't a corner case), we realize we were wrong from the beginning. I'm guessing this sometimes (usually?) happens in standards, right?
> > >
> > >
> > > Best,
> > > Richard
> > >
> > >
> > > > Juan Sequeda
> > > > +1-575-SEQ-UEDA
> > > > www.juansequeda.com <http://www.juansequeda.com>
> > > >
> > > >
> > > > On Thu, May 3, 2012 at 11:08 AM, Michael Hausenblas <michael.hausenblas@deri.org> wrote:
> > > >
> > > > > Were we close to closing R2RML's CR?
> > > >
> > > > This was the last issue, all others have been resolved in last week's meeting (see also my comments when I sent out the minutes [1]).
> > > > Never mind, we're not extending CR but entering a second, rather short LC period.
> > > >
> > > > Ivan, can you prepare a respective PROPOSAL for next week's meeting please?
> > > >
> > > > Cheers,
> > > > Michael
> > > >
> > > > [1] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2012May/0005.html
> > > >
> > > > --
> > > > Dr. Michael Hausenblas, Research Fellow
> > > > DERI - Digital Enterprise Research Institute
> > > > NUIG - National University of Ireland, Galway
> > > > Ireland, Europe
> > > > Tel.: +353 91 495730
> > > > WebID: http://sw-app.org/mic.xhtml#i
> > > >
> > > > On 3 May 2012, at 17:04, Eric Prud'hommeaux wrote:
> > > >
> > > > > * Juan Sequeda <juanfederico@gmail.com> [2012-05-03 10:50-0500]
> > > > >> Looks like we have to extend CR till
> > > > >> we have implementations for this corner case.
> > > > >
> > > > > Were we close to closing R2RML's CR?
> > > > >
> > > > >
> > > > >> Juan Sequeda
> > > > >> www.juansequeda.com <http://www.juansequeda.com>
> > > > >>
> > > > >> On May 3, 2012, at 10:42 AM, Richard Cyganiak <richard@cyganiak.de> wrote:
> > > > >>
> > > > >>> On 3 May 2012, at 16:25, Eric Prud'hommeaux wrote:
> > > > >>>> presumes you can create tables, but yeah, conceptually easier query.
> > > > >>>
> > > > >>> (It looks like most databases have a proprietary method of adding the indexes that doesn't require write access to the DB.)
> > > > >>>
> > > > >>>> you can even push the symbol generation down:
> > > > >>>
> > > > >>> Right.
> > > > >>>
> > > > >>>>> The big remaining question is: How to handle this in R2RML?
> > > > >>>>
> > > > >>>> Looking for an analog to:
> > > > >>>> rr:subjectMap [
> > > > >>>>     rr:column "ROWID";
> > > > >>>>     rr:termType rr:BlankNode
> > > > >>>> ];
> > > > >>>> I'd propose:
> > > > >>>> rr:subjectMap [
> > > > >>>>     rr:termType rr:RowBlankNode
> > > > >>>> ];
> > > > >>>
> > > > >>> That's an option. Even keeping rr:BlankNode would work; the absence of an rr:column/rr:template/rr:constant might signal that a fresh blank node must be allocated for each row.
> > > > >>>
> > > > >>>> Does that complicate things beyond how much a cardinality requirement necessarily complicates things?
> > > > >>>
> > > > >>> Well, the spec only needs to define the graph generated by the mapping, so in terms of specification it would be a simple enough change.
> > > > >>>
> > > > >>> The implications for implementers are quite significant though. It's a new feature, the implementation costs are not trivial, no existing implementation does this (AFAIK), so there's a certain amount of R&D required to show that it's implementable.
> > > > >>>
> > > > >>> Best,
> > > > >>> Richard
> > > > >
> > > > > --
> > > > > -ericP
>
> --
> -ericP
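
The set-versus-multiset difference in the quoted SQL translations can be reproduced with a small sketch. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module, the sample rows are invented, and the IOUs table is cut down to three columns (the attr4 and attr5 columns from the quoted query are omitted). IOUs has no primary key and holds one exact duplicate row.

```python
import sqlite3

# In-memory database; the table contents below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Addresses (ID INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE IOUs (name TEXT, date TEXT, addr INTEGER);  -- no primary key
INSERT INTO Addresses VALUES (1, 'Cambridge');
INSERT INTO IOUs VALUES ('Bob', '2011-01-15', 1);
INSERT INTO IOUs VALUES ('Bob', '2011-01-15', 1);  -- exact duplicate row
INSERT INTO IOUs VALUES ('Sue', '2012-04-01', 1);  -- too recent, filtered out
""")

# Multiset translation: a plain join, so duplicate rows survive.
multiset_rows = conn.execute("""
SELECT name, city
  FROM IOUs INNER JOIN Addresses ON IOUs.addr = Addresses.ID
 WHERE date < '2011-05-03'
""").fetchall()

# Set translation: a DISTINCT subselect collapses duplicate rows first.
set_rows = conn.execute("""
SELECT name, city
  FROM (SELECT DISTINCT name, date, addr FROM IOUs) AS IOUs
       INNER JOIN Addresses ON IOUs.addr = Addresses.ID
 WHERE date < '2011-05-03'
""").fetchall()

print(len(multiset_rows), len(set_rows))
```

With this sample data the multiset query returns Bob's debt twice and the set query returns it once, which is the cardinality difference at issue for tables and views without a key.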
Received on Thursday, 3 May 2012 21:45:55 UTC