Re: New section on translation schemes in R2RML spec (ISSUE-61) from Richard Cyganiak on 2011-08-27 (public-rdb2rdf-wg@w3.org from August 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Sat, 27 Aug 2011 16:51:54 +0100
To: David McNeil <dmcneil@revelytix.com>
Cc: W3C RDB2RDF <public-rdb2rdf-wg@w3.org>
Message-Id: <15EED1A7-4D4E-4867-A50D-FE1A5BB174E9@cyganiak.de>

Thanks for the comments! Much appreciated.

On 26 Aug 2011, at 18:42, David McNeil wrote:
> * "achieded" -> "achieved"

Fixed

> * "When a logical table column is mapped to RDF using a translation scheme, then the scheme is searched for concepts whose skos:notation matches the data value.", for clarity change the final "data value" to "column value".

Fixed

> * "... specified using the rr:translationScheme property, whose values MUST be translation schemes. A translation scheme is a set of one or more string-IRI pairs." - I stumbled over this when I read it because it goes from talking about a translationScheme property to a conceptual description of a translation scheme as a "set of pairs". My immediate question is how I make a property point to a set of pairs. Maybe it is implied, but I think it should more clearly state that the translationScheme property refers to a "translation scheme" resource. Or maybe to a "ConceptScheme"?

Changed the sentence to:

[[
A translation scheme is a resource that represents a set of one or more string-IRI pairs.
]]

> * Add something like this to the spec: If multiple matching concepts are found then it is a mapping error.

Good point. Added a check to the algorithm that builds the <string, IRI> pairs. (You can still get multiple pairs with the same string by using the matching properties.)
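
To illustrate the kind of check meant here, a minimal Python sketch. It assumes the check rejects two distinct concepts in one scheme that carry the same notation; the exact wording in the spec may differ. MappingError, index_by_notation, the tuple representation and the "thai" concept are all made up for the example.

```python
class MappingError(Exception):
    """Hypothetical error type for an invalid translation scheme."""


def index_by_notation(concepts):
    """Map each notation string to the single concept that carries it.

    `concepts` is an assumed in-memory form of a concept scheme:
    a list of (concept_iri, notation_string) tuples.
    """
    by_notation = {}
    for iri, notation in concepts:
        other = by_notation.get(notation)
        # Two different concepts with the same notation: treat as a mapping error.
        if other is not None and other != iri:
            raise MappingError(
                f"notation {notation!r} is used by both {other} and {iri}"
            )
        by_notation[notation] = iri
    return by_notation


cuisines = [
    ("http://chef.example.com/cuisines/indian", "1"),
    ("http://chef.example.com/cuisines/thai", "2"),  # hypothetical concept
]
index = index_by_notation(cuisines)
```

The matching properties are deliberately left out of this sketch; they are the route by which one string can still end up in several pairs.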

> * I don't see why "broadMatch" is required when performing a 1:N translation. On my reading of SKOS, "exactMatch" is transitive and so we can just use "exactMatch" for each of the concepts. The way that the R2RML spec defines the semantics of exactMatch, closeMatch, and broadMatch they are all identical.

In R2RML, exactMatch, closeMatch and broadMatch all behave identically, but in SKOS they are different. For example, of the following three triples, arguably only the last one is correct (assuming we are talking about cuisine concepts).

scheme1:indian skos:exactMatch scheme2:asian .
scheme1:indian skos:closeMatch scheme2:asian .
scheme1:indian skos:broadMatch scheme2:asian .

> * Do we need to address the issue of inconsistent SKOS data models defined in the mapping document?

I don't know. What do you think?

> * What potential issues arise from supporting 1:N translations?

The main issue is that applying a term map to a logical table row can now result in *multiple* RDF terms being generated. That was not the case before.
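
A small Python sketch of what that means in practice, assuming a translation scheme is held as a plain list of (string, IRI) pairs. The translate function and the "asian" and "thai" IRIs are invented for the illustration and are not taken from the spec.

```python
def translate(pairs, column_value):
    """Return every IRI paired with column_value, in scheme order."""
    return [iri for string, iri in pairs if string == column_value]


# Assumed scheme: the string "1" occurs in two pairs, so one row
# with CUISINE = "1" produces two RDF terms instead of one.
pairs = [
    ("1", "http://chef.example.com/cuisines/indian"),
    ("1", "http://food.example.com/cuisines/asian"),  # hypothetical IRI
    ("2", "http://chef.example.com/cuisines/thai"),   # hypothetical IRI
]
terms = translate(pairs, "1")
```

Anything downstream that expected exactly one term per row and term map now has to cope with a list. What should happen when no pair matches is not addressed by this sketch.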

> * I don't know enough about SKOS to know whether we are abusing it or using it correctly for this purpose.

I believe that our use is consistent with the SKOS model, and it is a valid application of SKOS. Still, letting some SKOS experts have a look at this might be a good idea.

> * For the 1:N translation case I think it would be nicer to write the following. Is this considered bad form for SKOS?
> <http://chef.example.com/cuisines/indian> a skos:Concept;
>      skos:inScheme <http://chef.example.com/cuisines>;
>      skos:notation 1;
>      skos:notation 2.

That's a good question. I was under the impression that SKOS actually forbids this, but that's not true:

[[
There are no constraints on the cardinality of the skos:notation property. A concept may have zero, 1 or more notations.

Where a concept has more than 1 notation, these may be from the same or different notation systems. In the case where notations are from different systems, different datatypes may be used to indicate this. It is not common practice to assign more than one notation from the same notation system (i.e., with the same datatype URI).
]]
http://www.w3.org/TR/skos-reference/#L2637

So it's “not common practice”, but possible.

I tweaked the algorithm accordingly:

[[
If s has any skos:notation properties whose values are literals, then for each lexical form l of such a literal: […]
]]
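
As a rough Python sketch of just the "for each lexical form" part (the elided body of the step is not reproduced here): the Literal and IRI classes and the function name are assumptions for the example, and the two notations are the ones from your Turtle snippet, where the bare 1 and 2 are xsd:integer literals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Assumed minimal stand-in for an RDF literal."""
    lexical_form: str
    datatype: Optional[str] = None


@dataclass(frozen=True)
class IRI:
    """Assumed minimal stand-in for an IRI node."""
    value: str


def notation_lexical_forms(notation_values):
    """Return the lexical form of every literal value; skip non-literals."""
    return [v.lexical_form for v in notation_values if isinstance(v, Literal)]


XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# skos:notation 1; skos:notation 2. on the same concept
indian_notations = [Literal("1", XSD_INTEGER), Literal("2", XSD_INTEGER)]
forms = notation_lexical_forms(indian_notations)
```

Each entry in forms would then go through whatever the step does for a single notation.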

Technically speaking, the last step in the inner loop of the algorithm (the one that takes care of the matching properties) could now be removed without loss of expressivity.

At this point, let me explain why this proposal uses SKOS in the first place. Forget about R2RML for a second and just think about database schemas. Database columns that contain opaque codes, like in our cuisine example, are common and are a persistent difficulty when using existing DB data in a new context. They require careful documentation. SKOS is great for expressing structured documentation for such code lists. One can create a concept for each code, use the code as skos:notation, and add labels and notes. Even better, one can use the SKOS matching properties to relate the concepts of that column to equivalent or related concepts in existing global concept schemes, thereby explicitly connecting the “local” codes in a specific DB to curated enterprise concepts. The fact that SKOS is fairly simple and transparent, and has pretty good tool support (graphical editors, including mapping editors, etc.), helps too.

One way of using such concept schemes would be to attach them to Direct Mapping column properties. So, <http://food.example.com/venue_db/VENUE#CUISINE> might be the IRI that identifies a particular column in the database according to the DM. Now we could attach the concept scheme to that IRI using a property (db:codeList or db:schemeDocumentation or whatever), and this would serve as nice structured documentation for the database schema.

Now from here it's just a small step to using such concept schemes in the actual translation from the database to a domain vocabulary.

> * Final questions for the group as a whole, not necessarily directed at you: What alternative approaches to defining translation schemes were considered?

Well, there's the D2RQ approach:
http://www.w3.org/2001/sw/rdb2rdf/wiki/Identifier_re-use#Mapping_tables_in_D2RQ

That one is simple enough and did the job for D2RQ, but having a custom vocabulary for expressing string-IRI pairs, and maintaining those pairs by hand inside the mapping file, just feels wrong to me these days. Surely there must be existing vocabulary and tooling for this purpose that can just be re-used.

The CSV option is handy in practice, but it feels like a “nice to have” extra and would be a bit difficult to specify well, given how underdefined the CSV format is.

> What are the tradeoffs of the various options? Are there examples of SKOS being used in this way, if so what can we learn from them?
> 
> My sense is that our remaining time before last call should be focused on finalizing the features that are currently in the specs rather than trying to add a new section such as this.

Is this a comment on the perceived usefulness of the feature, or a comment on the timing of the introduction of the feature?

Best,
Richard


> 
> -David
> 
> [1] http://www.w3.org/2001/sw/rdb2rdf/r2rml/#translation-schemes
Received on Saturday, 27 August 2011 15:52:34 UTC