DM implementation feedback: implementability for tables w/o primary key from Richard Cyganiak on 2012-04-22 (public-rdb2rdf-wg@w3.org from April 2012)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Sun, 22 Apr 2012 23:37:14 +0100
To: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-Id: <07AC62A3-2CAE-4363-BB05-A480AD8A8579@cyganiak.de>

For tables without a primary key, the Direct Mapping (DM) requires that a “fresh” blank node be allocated to each row.

This behaviour is encoded in the “IOU” test case:
http://www.w3.org/2001/sw/rdb2rdf/test-cases/#DirectGraphTC0005

Implementing this is easy enough when dumping the direct graph into a file. I believe it is impossible, however, when we need to “hold onto” one of the blank nodes and later retrieve additional information about it, e.g., in SPARQL-to-SQL rewriters or in implementations of RDF APIs, such as the Jena API, on top of a database exposed through the DM.
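
To make the difficulty concrete, here is a minimal Python sketch that is not tied to any real DM implementation. The rows are illustrative values loosely modelled on the IOU test case, and `fresh_bnodes` and `lookup_by_values` are hypothetical helpers. It suggests that once two rows are identical, the only handle a rewriter has on a row (its column values) matches both rows, so a fresh blank node cannot be mapped back to exactly one row.

```python
from itertools import count

# Illustrative rows for a table with no primary or unique key (assumed values).
ROWS = [
    ("Bob", "Smith", 30),
    ("Sue", "Jones", 20),
    ("Bob", "Smith", 30),  # exact duplicate of the first row
]

def fresh_bnodes(rows):
    """Assign a fresh blank node label to every row, as in a one-off dump."""
    labels = count()
    return [(f"_:b{next(labels)}", row) for row in rows]

def lookup_by_values(rows, values):
    """Emulate SELECT ... WHERE every column equals the given value."""
    return [row for row in rows if row == values]

dump = fresh_bnodes(ROWS)
first_label, first_row = dump[0]

# Without a row identifier, the column values are the only way back to the row,
# and they select two rows rather than the one behind first_label.
matches = lookup_by_values(ROWS, first_row)
print(first_label, "->", len(matches), "candidate rows")  # _:b0 -> 2 candidate rows
```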

The case of concern is when the table has neither a PK nor a Unique Key, and the DB engine doesn't support some sort of internal unique row identifier such as Oracle's ROWID. Note that Core SQL 2008 doesn't require anything like ROWID (as far as I can tell), and many DB engines including MySQL, PostgreSQL, SQL Server and HSQLDB don't have a suitable equivalent.

(Most if not all of these databases have some equivalent to Oracle's ROWNUM, which looks somewhat promising for implementing this, but I cannot work out any way to actually do it.)

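One immediate corollary of the specified behaviour is that the DM cannot be implemented in R2RML, since R2RML derives blank nodes from column values and identical rows therefore yield the same blank node.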

I propose that the requirement for a “fresh” blank node should be relaxed, and implementations that assign the same blank node to identically-valued rows should be considered conforming too. This would require a change to the DirectGraphTC0005 test case mentioned above – using the same blank node _:a instead of _:a and _:c should be acceptable.
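
A minimal sketch of the relaxed behaviour, in Python: derive the blank node label from the table name and the row's values, so that the same label can be recomputed whenever the row is seen again. `value_bnode`, the SHA-1 based label format, the table name and the sample rows are all assumptions for illustration, not something the DM or the test suite prescribes.

```python
import hashlib

def value_bnode(table, row):
    """Derive a blank node label from the table name and the row's values."""
    key = repr((table, tuple(row))).encode("utf-8")
    return "_:" + hashlib.sha1(key).hexdigest()[:12]

# Illustrative rows for a table with no primary or unique key (assumed values).
rows = [
    ("Bob", "Smith", 30),
    ("Sue", "Jones", 20),
    ("Bob", "Smith", 30),  # exact duplicate of the first row
]

labels = [value_bnode("IOUs", row) for row in rows]

# Identical rows collapse onto one blank node; distinct rows stay apart.
print(labels[0] == labels[2], labels[0] == labels[1])  # True False
```

Because an RDF graph is a set of triples, the two duplicate rows would then contribute a single set of triples, which is the outcome the proposed test-case change would accept.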

(I wish all databases had ROWID!)

Best,
Richard

Received on Sunday, 22 April 2012 22:37:48 UTC