Some implementation experience on ISSUE-72 (translation tables)

ISSUE-72 is the one about translation tables:
http://www.w3.org/2001/sw/rdb2rdf/track/issues/72

D2RQ and its obsolete predecessor D2R Map have had this feature since 2003, and it is used in a significant percentage of mappings. When we release R2RML support, we want feature parity between the D2RQ mapping language and R2RML. So we know that we'll have translation tables when we release R2RML support, in a proprietary extension if we have to.

Ironically, the way translation tables are done in D2RQ and in D2R Map at the moment is by using custom vocabulary, not by using SKOS. So it's actually not that far from the simpler incarnations of Souri's proposal. This is because D2RQ predates SKOS by a couple of years.

In D2R Map, translation tables are a straight 1:1 mapping from literals to URIs.

In D2RQ, they are 1:1 mappings from literals to either URIs or literals.

The 1:1 restriction in both products has turned out to be a painful limitation, so I'm keen on relaxing it.

The option to map to literals in D2RQ was added because the design made it easy to generalise this. It was not motivated by user requests or particular use cases. I don't consider it essential. The choice between mapping to URIs or literals is per-table – so you can't map some database values to URIs and some to literals. Allowing a per-value choice would bring some implementation complexities and we didn't really see the use case.
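
To make the D2RQ-style semantics concrete, here is a minimal sketch in Python. It is an illustration only, not D2RQ code: the class name, the field names and the example URIs are all made up for this message. It shows a 1:1 table whose target kind (URI or literal) is fixed once for the whole table.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TranslationTable:
    """Hypothetical model of a 1:1 translation table (not D2RQ's API)."""

    target_kind: str          # "uri" or "literal", chosen once per table
    entries: Dict[str, str]   # database value -> RDF value

    def __post_init__(self) -> None:
        # The kind applies to every entry; there is no per-value choice.
        if self.target_kind not in ("uri", "literal"):
            raise ValueError(f"unknown target kind: {self.target_kind!r}")

    def translate(self, db_value: str) -> Optional[Tuple[str, str]]:
        """Return (kind, rdf_value), or None if the value is not in the table."""
        rdf_value = self.entries.get(db_value)
        if rdf_value is None:
            return None
        return (self.target_kind, rdf_value)


# Example data is invented for illustration.
countries = TranslationTable(
    target_kind="uri",
    entries={
        "DE": "http://example.org/country/Germany",
        "FR": "http://example.org/country/France",
    },
)
```

Because the kind lives on the table rather than on each entry, a per-value choice would mean changing the entry type itself, which hints at where the extra implementation complexity comes from.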

Neither D2R Map nor D2RQ can map *from* URIs to something else. The use case that Souri put forward – generating URIs from the DB using a URI template, and then overriding only *some* of these URIs with a translation table – isn't supported in D2RQ. I've encountered no evidence that anyone needs this, either in interactions with D2RQ users or in reviewing actual published RDF data from other sources.

Also, I think that passing unmapped values through unchanged can't be the default behaviour of translation tables. If there is no mapping for a given database value, then the database value should be treated like a NULL (or one could perhaps argue that it should be treated as an error). Using the database value as-is in the RDF output would likely produce hard-to-detect errors when database values are forgotten or misspelled in the table, or added later on in the DB. I think I'd -1 this as the default behaviour. It might be ok as an optional alternate behaviour that can be chosen through some flag. But given that I don't see the use case, and that there are additional implementation complexities with this approach, we'd not be very likely to implement this behaviour in D2RQ.
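
The three candidate behaviours for an unmapped value can be sketched side by side. Again this is only an illustration in Python under my own assumptions: the function name, the on_missing flag and its three values are hypothetical and do not correspond to anything in D2RQ or in the R2RML drafts.

```python
from typing import Dict, Optional


def translate(table: Dict[str, str], db_value: str,
              on_missing: str = "null") -> Optional[str]:
    """Look up db_value; on_missing selects what happens when it is absent."""
    if db_value in table:
        return table[db_value]
    if on_missing == "null":
        # Treat like a database NULL: no RDF term, so no triple is generated.
        return None
    if on_missing == "error":
        # Fail loudly so a forgotten or misspelled entry is noticed.
        raise KeyError(f"no translation for database value {db_value!r}")
    if on_missing == "passthrough":
        # Use the database value as-is; mistakes end up silently in the output.
        return db_value
    raise ValueError(f"unknown on_missing policy: {on_missing!r}")


# Invented example table; the database contains "M" where "m" was expected.
genders = {
    "m": "http://example.org/gender/male",
    "f": "http://example.org/gender/female",
}
dropped = translate(genders, "M")                            # None
leaked = translate(genders, "M", on_missing="passthrough")   # "M"
```

With the pass-through policy the stray "M" ends up in the RDF output looking like any other value, which is the hard-to-detect kind of error described above. With the NULL policy the triple is simply absent, and with the error policy the problem surfaces immediately.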

Best,
Richard

Received on Monday, 12 December 2011 19:51:44 UTC