ISSUE-79 (ROWID): Concerns about implementability of DM for tables w/o primary key [Direct Mapping] from RDB2RDF Working Group Issue Tracker on 2012-03-06 (public-rdb2rdf-wg@w3.org from March 2012)

From: RDB2RDF Working Group Issue Tracker <sysbot+tracker@w3.org>
Date: Tue, 06 Mar 2012 23:03:20 +0000
To: public-rdb2rdf-wg@w3.org
Message-Id: <E1S53Pc-0006QO-Mc@nelson.w3.org>

ISSUE-79 (ROWID): Concerns about implementability of DM for tables w/o primary key [Direct Mapping]

http://www.w3.org/2001/sw/rdb2rdf/track/issues/79

Raised by: Richard Cyganiak
On product: Direct Mapping

For tables without a primary key, the DM requires that a “fresh” blank node be allocated to each row.

This behaviour is encoded in the “IOU” test case:
http://www.w3.org/2001/sw/rdb2rdf/test-cases/#DirectGraphTC0005

Implementing this is easy enough when dumping the direct graph into a file, but apparently impossible when we need to “hold onto” one of the blank nodes and retrieve additional information about it, e.g., in SPARQL-to-SQL rewriters or implementations of RDF APIs such as the Jena API on top of a DM'd database.
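
To make the contrast concrete, here is a minimal Python sketch, assuming an in-memory list of rows stands in for a key-less table. The row values and the names `fresh_bnode` and `dump_rows` are hypothetical and chosen only for illustration; this is not taken from any DM implementation. A single dump pass can hand every row its own label, but a second pass over the same data has no way to arrive at the same labels again.

```python
import itertools

# Hypothetical rows of a table with no primary key: (fname, lname, amount).
IOU_ROWS = [
    ("Bob", "Smith", 30),
    ("Sue", "Jones", 20),
    ("Bob", "Smith", 30),  # exact duplicate of the first row
]

_counter = itertools.count()


def fresh_bnode():
    """Return a blank node label not handed out before in this process."""
    return f"_:r{next(_counter)}"


def dump_rows(rows):
    """One pass over the table: every row gets its own fresh blank node."""
    return [(fresh_bnode(), row) for row in rows]


# A one-shot dump works: three rows, three distinct blank nodes.
first_pass = dump_rows(IOU_ROWS)
labels_first = [label for label, _ in first_pass]

# A later query over the same rows allocates new labels, so a client
# holding a label from the first pass cannot look that row up again.
second_pass = dump_rows(IOU_ROWS)
labels_second = [label for label, _ in second_pass]
```

Persisting a label-to-row table would not help either: without a key or a ROWID there is nothing on the database side to join such a table against.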

The case of concern is when the table has neither a PK nor a Unique Key, and the DB engine doesn't support some sort of internal unique row identifier such as Oracle's ROWID. Note that Core SQL 2008 doesn't require anything like ROWID (as far as I can tell), and many DB engines including MySQL, PostgreSQL, SQL Server and HSQLDB don't have a suitable equivalent.

(Most if not all of these databases have some equivalent to Oracle's ROWNUM, which looks somewhat promising for implementing this, but I cannot work out any way to actually do it.)

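One immediate effect of this is that the DM cannot be implemented in R2RML for such tables, since R2RML generates blank nodes from a row's column values and so cannot tell identically-valued rows apart.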

I'm tempted to argue that the requirement for a “fresh” blank node should be relaxed, and implementations that assign the same blank node to identically-valued rows should be considered conforming too. This would require a change to the DirectGraphTC0005 test case mentioned above – using the same blank node _:a instead of _:a and _:c should be acceptable.
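
A minimal Python sketch of what the relaxed behaviour could look like, assuming the blank node label is derived from a hash of the table name and the row's values. The function name `value_bnode`, the label format and the sample rows are hypothetical; nothing here is prescribed by the DM or by the test case. Because the label depends only on the values, it can be recomputed in any later query, at the cost of merging identically-valued rows into one node.

```python
import hashlib


def value_bnode(table, row):
    """Derive a blank node label from the table name and the row's values.

    Assumption: repr() of the value tuple is a good enough canonical form
    for this illustration; a real implementation would need a careful,
    type-aware serialisation of SQL values.
    """
    key = repr((table, tuple(row))).encode("utf-8")
    return "_:b" + hashlib.sha1(key).hexdigest()[:12]


# Hypothetical rows of a key-less table: (fname, lname, amount).
rows = [
    ("Bob", "Smith", 30),
    ("Sue", "Jones", 20),
    ("Bob", "Smith", 30),  # exact duplicate of the first row
]

labels = [value_bnode("IOU", row) for row in rows]

# The duplicate rows share one blank node; the distinct row gets another.
distinct_labels = set(labels)

# The label is stable: recomputing it later yields the same node, which
# is what a SPARQL-to-SQL rewriter needs in order to "hold onto" a node.
same_again = value_bnode("IOU", ("Bob", "Smith", 30))
```

Under this scheme the resulting graph for the sample rows contains two blank nodes rather than three, which is exactly the change to the test expectation proposed above.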

(I wish all databases had ROWID!)

Received on Tuesday, 6 March 2012 23:03:22 UTC