Re: R2RML mappings of Non-unique Tables

* David McNeil <dmcneil@revelytix.com> [2012-05-13 09:15-0500]
> On Sat, May 12, 2012 at 11:33 PM, Eric Prud'hommeaux <eric@w3.org> wrote:
> 
> > I wrote up the issue and use cases for R2RML's handling of non-unique
> > tables:
> > <http://www.w3.org/2001/sw/rdb2rdf/wiki/Non-unique_Tables>
> >
> 
> Eric - Thanks for writing this up. It is helpful.
> 
> I have a question for clarification. As a matter of fact, this statement
> does not seem accurate to me:
> 
> "This makes it impossible for one to generate an R2RML mapping to the DM of
> a table with potentially repeated rows."
> 
> An R2RML feature not mentioned in the writeup is the ability to define a
> logical view in an R2RML mapping. Such a view can contain a SQL statement
> (and R2RML has been defined with the idea of using database specific SQL in
> such cases). I believe this capability means that we cannot say it is
> "impossible" for an R2RML mapping to capture the DM of a table with
> repeated rows. We can debate about whether it is practical to write or if
> the runtime costs are too high, etc. but I don't think we can say it is
> impossible.

Good point, this could solve the problem.
Can you think of a way to create a view on:

  ┌┤IOUs├─┬───────┬────────┐
  │ fname │ lname │ amount │
  │ Bob   │ Smith │ 30     │
  │ Sue   │ Jones │ 20     │
  │ Bob   │ Smith │ 30     │

which produces distinct rows like:

  ┌───┬┤IOUsʹ├┬───────┬────────┐
  │ X │ fname │ lname │ amount │
  │ 1 │ Bob   │ Smith │ 30     │
  │ 1 │ Sue   │ Jones │ 20     │
  │ 2 │ Bob   │ Smith │ 30     │

(or any values in X which are different for the two Bob Smith 30
rows)? We could create a table with an auto_increment and populate it
with the old table (which would give us X={1,2,3}, but that implies a
change to the database. We could also join against an on-the-fly table
which contains all of the numbers up the the maximum cardinality:

  SELECT nums.n,IOUs.fname,IOUs.lname,IOUs.amount
    FROM (SELECT fname, lname, amount, count(*) C
            FROM IOUs
           GROUP BY fname, lname, amount) AS IOUs
       INNER JOIN
         (SELECT 1 n UNION
          SELECT 2 n UNION
          SELECT 3 n UNION
          SELECT 4 n) AS nums
       ON nums.n <= IOUs.C;
  ┌───┬───────┬───────┬────────┐
  │ n │ fname │ lname │ amount │
  │---│-------│-------│--------│
  │ 1 │ Bob   │ Smith │     30 │
  │ 1 │ Sue   │ Jones │     20 │
  │ 2 │ Bob   │ Smith │     30 │

but that requires computing a query knowing the max cardinality.

Any idea where to go from here?
-- 
-ericP

Received on Sunday, 13 May 2012 17:20:58 UTC