Re: Revised SKOS-based translation table proposal from Richard Cyganiak on 2011-12-06 (public-rdb2rdf-wg@w3.org from December 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 6 Dec 2011 17:09:04 +0000
To: Souripriya Das <souripriya.das@oracle.com>
Cc: <public-rdb2rdf-wg@w3.org>
Message-Id: <B7A7AFD1-BA60-4832-9F3A-A6D2E31752DB@cyganiak.de>
Hi Souri,

On 27 Nov 2011, at 17:10, Souripriya Das wrote:
> Let us then go back to the second part of the original SKOS-based proposal and compare it with the alternate "R2RML-native" proposal that we had proposed. 
> 
> 1) The SKOS-based approach is limited to (many-to-one) Literal-DBterm to IRI-RDFterm translation. This limitation comes from use of the following two properties: (an owl:DatatypeProperty) skos:notation ("used to assign a notation as a typed literal") [7] and (an owl:ObjectProperty) skos:*Match [8]. The R2RML-native scheme, on the other hand, has no such limitation and allows (many-to-one) IRI-or-Literal-DBterm to IRI-or-Literal-RDFterm translation.

It is correct that the SKOS-based proposal only supports literal-to-IRI mappings. But this is not a limitation.

Mapping *from* an IRI to something else is nonsensical in the context of R2RML. The values we map from are always SQL data values and hence literals and never IRIs.

Mapping *to* a literal is potentially useful, and I can imagine mapping scenarios that would need it. However, the reason why the WG decided to include translation tables in the first place was the following item in our charter:

[[
The mapping language MUST allow for a mechanism to create identifiers for database entities.
]]
http://www.w3.org/2009/08/rdb2rdf-charter

Addressing this charter item requires only mapping to IRIs.

As I said before, the SKOS-based scheme can be extended to allow mapping to literals if that's what we want, albeit at a cost of one more triple per mapping pair. Mapping "Lo Mein" to <Chinese> would look like this:

    [] skos:inScheme <scheme1>;
       skos:notation "Lo Mein";
       skos:broadMatch <Chinese>.

Mapping "Lo Mein" to "Chinese" would be:

    [] skos:inScheme <scheme1>;
       skos:notation "Lo Mein";
       skos:broadMatch [ skos:notation "Chinese" ].

I think that this is a bit of a corner case and I don't think it needs to be supported in R2RML 1.0. I just want to show that the existing design of SKOS already accommodates this; so future R2RML versions could require support for fancier mappings.
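To make the two cases above concrete, here is a minimal sketch of the lookup a processor might perform over such concepts. It is only an illustration under stated assumptions: the in-memory layout, the function name `translate`, and the second scheme `<scheme2>` are all hypothetical and not part of R2RML or SKOS.

```python
from typing import Optional

# Hypothetical in-memory model of the SKOS concepts shown above.
# IRIs are modelled as "<...>" strings and literals as plain strings.
# A literal target is modelled as a nested concept that carries only a
# notation, mirroring  skos:broadMatch [ skos:notation "Chinese" ].
CONCEPTS = [
    {"scheme": "<scheme1>", "notations": {"Lo Mein"},
     "broad_match": "<Chinese>"},
    # <scheme2> is an invented scheme name used only for this sketch.
    {"scheme": "<scheme2>", "notations": {"Lo Mein"},
     "broad_match": {"notation": "Chinese"}},
]


def translate(concepts, scheme: str, value: str) -> Optional[str]:
    """Return the target for `value` within `scheme`, or None if unmapped."""
    for concept in concepts:
        if concept["scheme"] == scheme and value in concept["notations"]:
            target = concept["broad_match"]
            # One extra hop (one extra triple) when the target is a literal.
            if isinstance(target, dict):
                return target["notation"]
            return target
    return None
```

Looking up "Lo Mein" in `<scheme1>` yields the IRI `<Chinese>`, while the same value in the hypothetical `<scheme2>` yields the literal "Chinese" after the additional hop through the nested concept.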

> 2) Also, use of R2RML-native scheme produces less verbose Turtle documents than does use of the SKOS-based scheme (because, unlike the R2RML scheme, the SKOS-based scheme actually requires a translation scheme IRI to be explicitly specified for every translation in that scheme that goes to a different IRI-RDFterm).

Actually it is not more verbose. The SKOS approach and the custom-vocabulary approach require the *same* number of triples. The example you presented has unnecessary extra rdf:type triples that are not required by the spec. Let's remove them and compare:

------ SKOS-based approach ------

[] skos:inScheme <InternationalCuisineTranslationScheme>;
   skos:notation "Lo Mein", "Fu Chi Fei Pian";
   skos:broadMatch <Chinese>.

[] skos:inScheme <ChineseCuisineTranslationScheme>;
   skos:notation "Lo Mein";
   skos:broadMatch <Chinese>.
[] skos:inScheme <ChineseCuisineTranslationScheme>;
   skos:notation "Fu Chi Fei Pian";
   skos:broadMatch <Sichuan>.

------ Custom vocabulary approach ------

<InternationalCuisineTranslationScheme> rr:translationMap
   [ rr:toTerm  <Chinese> ;
     rr:fromTerm "Lo Mein", "Fu Chi Fei Pian" ;
   ] .

<ChineseCuisineTranslationScheme> rr:translationMap
   [ rr:toTerm <Chinese> ;
     rr:fromTerm "Lo Mein" ;
   ] ,
   [ rr:toTerm <Sichuan> ;
     rr:fromTerm "Fu Chi Fei Pian" ;
   ] .

---------------------------------

Ten triples in both cases. The custom-vocabulary approach can be written in fewer *characters* because the direction of the rr:translationMap property is opposite to that of skos:inScheme, and hence the comma-based Turtle syntactic sugar can be used.
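The count can be checked by writing out every triple that the two Turtle snippets expand to. The sketch below does this by hand as plain Python tuples; the blank node labels (`_:a`, `_:x`, and so on) are arbitrary names chosen for illustration, and the helper `triple_count` is hypothetical, not something from an RDF library.

```python
# Triples that the SKOS-based snippet expands to (blank node labels invented).
SKOS_TRIPLES = [
    ("_:a", "skos:inScheme", "<InternationalCuisineTranslationScheme>"),
    ("_:a", "skos:notation", '"Lo Mein"'),
    ("_:a", "skos:notation", '"Fu Chi Fei Pian"'),
    ("_:a", "skos:broadMatch", "<Chinese>"),
    ("_:b", "skos:inScheme", "<ChineseCuisineTranslationScheme>"),
    ("_:b", "skos:notation", '"Lo Mein"'),
    ("_:b", "skos:broadMatch", "<Chinese>"),
    ("_:c", "skos:inScheme", "<ChineseCuisineTranslationScheme>"),
    ("_:c", "skos:notation", '"Fu Chi Fei Pian"'),
    ("_:c", "skos:broadMatch", "<Sichuan>"),
]

# Triples that the custom-vocabulary snippet expands to.
CUSTOM_TRIPLES = [
    ("<InternationalCuisineTranslationScheme>", "rr:translationMap", "_:x"),
    ("_:x", "rr:toTerm", "<Chinese>"),
    ("_:x", "rr:fromTerm", '"Lo Mein"'),
    ("_:x", "rr:fromTerm", '"Fu Chi Fei Pian"'),
    ("<ChineseCuisineTranslationScheme>", "rr:translationMap", "_:y"),
    ("_:y", "rr:toTerm", "<Chinese>"),
    ("_:y", "rr:fromTerm", '"Lo Mein"'),
    ("<ChineseCuisineTranslationScheme>", "rr:translationMap", "_:z"),
    ("_:z", "rr:toTerm", "<Sichuan>"),
    ("_:z", "rr:fromTerm", '"Fu Chi Fei Pian"'),
]


def triple_count(triples) -> int:
    """Count distinct triples, since an RDF graph is a set of triples."""
    return len(set(triples))
```

Both lists contain ten distinct triples; the only difference is which node the scheme IRI hangs off, which is what lets the custom vocabulary share one subject across several maps with a comma.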

I would like to point out here that

(i) you have argued in the past that syntax doesn't matter and R2RML is all about the model. There's no difference in the model between the two approaches.

(ii) you have in the past expressed strong objections to designs that were supposed to make typical R2RML expressions more compact, so I suppose you agree that other concerns can sometimes override the desire for less verbose R2RML expressions.

As I see it, we have the choice between

(a) adopting an established and popular international standard designed by the W3C, or

(b) saving some keystrokes by exploiting a syntactic idiosyncrasy of Turtle, and in the process re-inventing the wheel.

As I said earlier:

>> SKOS is a W3C Recommendation. It is the third-most used vocabulary on the linked data web. It's used by the Library of Congress, the UK government, the European Commission's Publication Office, the United Nation's Food and Agricultural Organization, and the New York Times.

I learned this week that we can add the Hungarian and German National Libraries and the British Museum to that list.

Finally, it's worth pointing out OpenLink's position again:
http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Sep/0018.html

Are you sure that the advantages of the custom-vocabulary approach (fewer bytes, easier support for use cases that require mapping to literals) outweigh the advantage of using a standard?

Thanks,
Richard
Received on Tuesday, 6 December 2011 17:09:35 UTC