Re: Revised SKOS-based translation table proposal

We think the SKOS-based scheme has a "phantom triples" problem when it represents a many-to-one mapping of DBterms to RDFterms: it may generate triples that imply translations that were never defined. The following example illustrates this, where Translation Scheme B is more discerning than Translation Scheme A:

  <RDFterm1, TranslationSchemeA, DBterm1>
  <RDFterm1, TranslationSchemeA, DBterm2>

  <RDFterm1, TranslationSchemeB, DBterm1>
  <RDFterm2, TranslationSchemeB, DBterm2>

Here is a relational form:

  RDFterm       TranslationScheme (skos:inScheme)     DBterm (skos:notation)
  --------      ---------------------------------     ----------------------
  RDFterm1      TranslationSchemeA                    DBterm1
  RDFterm1      TranslationSchemeA                    DBterm2

  RDFterm1      TranslationSchemeB                    DBterm1
  RDFterm2      TranslationSchemeB                    DBterm2

Since the proposed structuring of the <DBterm> to <RDFterm> mapping in the SKOS-based scheme is of the form:

  <RDFterm>
    skos:inScheme <MappingScheme> ;
    skos:notation <DBterm> .

translation of the above table to RDF, using (the non-unique) RDFterm column as subject (as implied by the SKOS-based scheme), generates the following INCORRECT set of RDF triples (using Turtle syntax):

  # generated from table row numbers 1, 2, and 3:
  <RDFterm1>
    skos:inScheme <TranslationSchemeA>, <TranslationSchemeB>;
    skos:notation "DBterm1", "DBterm2" .

  # generated from table row number 4:
  <RDFterm2>
    skos:inScheme <TranslationSchemeB> ;
    skos:notation "DBterm2" .

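The above set of triples is INCORRECT because, once all rows for <RDFterm1> share one subject, nothing ties each skos:notation value to a particular skos:inScheme value. The set therefore implies the following NON-EXISTENT translation: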
  <RDFterm1>  <TranslationSchemeB>   "DBterm2" .
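
To make the problem concrete, here is a minimal Python sketch (not part of either proposal; the function names to_skos_triples and implied_translations are made up for illustration). It flattens the four table rows into skos:inScheme / skos:notation triples keyed on the RDFterm, then recovers translations by joining the two properties on the shared subject, which is presumably how a consumer of these triples would have to read them:

```python
# Rows of the translation table above: (RDFterm, TranslationScheme, DBterm).
rows = [
    ("RDFterm1", "TranslationSchemeA", "DBterm1"),
    ("RDFterm1", "TranslationSchemeA", "DBterm2"),
    ("RDFterm1", "TranslationSchemeB", "DBterm1"),
    ("RDFterm2", "TranslationSchemeB", "DBterm2"),
]


def to_skos_triples(rows):
    """Flatten each row into two triples that share the RDFterm as subject."""
    triples = set()  # an RDF graph is a set, so duplicate triples collapse
    for rdf_term, scheme, db_term in rows:
        triples.add((rdf_term, "skos:inScheme", scheme))
        triples.add((rdf_term, "skos:notation", db_term))
    return triples


def implied_translations(triples):
    """Join skos:inScheme and skos:notation on the subject (hypothetical reader)."""
    schemes = {(s, o) for s, p, o in triples if p == "skos:inScheme"}
    notations = {(s, o) for s, p, o in triples if p == "skos:notation"}
    return {
        (subject, scheme, db_term)
        for subject, scheme in schemes
        for other, db_term in notations
        if subject == other
    }


triples = to_skos_triples(rows)
# Translations that can be read back from the triples but were never in the table.
phantoms = implied_translations(triples) - set(rows)
print(sorted(phantoms))
```

Under these assumptions the only phantom is the <RDFterm1> / <TranslationSchemeB> / "DBterm2" combination shown above; the four original rows collapse into six triples, and the row boundaries cannot be recovered from them.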

---------------
In the alternate proposal using R2RML-native properties and a class (extended to allow many-to-one mapping), the set of triples would be as follows (exactly as intended):

  <TranslationSchemeA> rr:translationMap
    [ rr:toTerm  <RDFterm1> ;
      rr:fromTerm "DBterm1", "DBterm2" ;
    ] .

  <TranslationSchemeB> rr:translationMap
    [ rr:toTerm <RDFterm1> ;
      rr:fromTerm "DBterm1" ;
    ] ,
    [ rr:toTerm <RDFterm2> ;
      rr:fromTerm "DBterm2" ;
    ] .

Here is a more concrete (Chinese food :-)) version of this example: 
---------------------------------------------------------------------------------------------------------
Suppose we would like to translate as follows:
   1) "Lo Mein" translated 
        to <Chinese> using both "InternationalCuisnineTranslationScheme" and "ChineseCuisineTranslationScheme"
 
   2) "Fu Chi Fei Pian" translated 
        to <Chinese> using "InternationalCuisnineTranslationScheme" and
        to <Sichuan> using "ChineseCuisineTranslationScheme".

----------------------------------------------------------------------------------------------------------
Using the SKOS-based scheme this can be expressed as follows:

  <Chinese>
    skos:inScheme <InternationalCuisnineTranslationScheme> ;
    skos:notation "Lo Mein", "Fu Chi Fei Pian" .

  <Chinese>
    skos:inScheme <ChineseCuisineTranslationScheme> ;
    skos:notation "Lo Mein" .

  <Sichuan>
    skos:inScheme <ChineseCuisineTranslationScheme> ;
    skos:notation "Fu Chi Fei Pian" .

Note that the following triple actually gets repeated if we translate the above Turtle to N-Triples:
  <Chinese> skos:notation "Lo Mein" .

The above Turtle can be compacted to the following equivalent version (after removing the duplicate triple):
  <Chinese>
    skos:inScheme <InternationalCuisnineTranslationScheme>, <ChineseCuisineTranslationScheme> ;
    skos:notation "Lo Mein", "Fu Chi Fei Pian" .

  <Sichuan>
    skos:inScheme <ChineseCuisineTranslationScheme> ;
    skos:notation "Fu Chi Fei Pian" .

The above set of triples is INCORRECT because <Chinese> now belongs to <ChineseCuisineTranslationScheme> and also carries the notation "Fu Chi Fei Pian", so it implies the following NON-EXISTENT translation:
  <Chinese>  <ChineseCuisineTranslationScheme>   "Fu Chi Fei Pian" .

-------------------------------------------------------------------------------------------------------------------
Using the alternate proposal, extended to allow rr:fromTerm to have a cardinality greater than 1 (so that many DBterms can map to one RDFterm), we can express the above situation as follows. The benefits: every translationMap is neatly enclosed within a TranslationScheme boundary, AND no triples represent a NON-EXISTENT translation.

  <InternationalCuisnineTranslationScheme> rr:translationMap
    [ rr:toTerm  <Chinese> ;
      rr:fromTerm "Lo Mein", "Fu Chi Fei Pian" ;
    ] .

  <ChineseCuisineTranslationScheme> rr:translationMap
    [ rr:toTerm <Chinese> ;
      rr:fromTerm "Lo Mein" ;
    ] ,
    [ rr:toTerm <Sichuan> ;
      rr:fromTerm "Fu Chi Fei Pian" ;
    ] .
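
As a rough illustration of why anchoring on the scheme avoids the problem, the structure above can be mirrored in a small Python sketch. This is only a model of the Turtle, not an R2RML implementation; the dictionary layout and the translate function are assumptions made for the example:

```python
# Each scheme owns its translation maps, mirroring the Turtle above.
# Scheme names are copied verbatim from the example.
translation_schemes = {
    "InternationalCuisnineTranslationScheme": [
        {"toTerm": "Chinese", "fromTerm": {"Lo Mein", "Fu Chi Fei Pian"}},
    ],
    "ChineseCuisineTranslationScheme": [
        {"toTerm": "Chinese", "fromTerm": {"Lo Mein"}},
        {"toTerm": "Sichuan", "fromTerm": {"Fu Chi Fei Pian"}},
    ],
}


def translate(scheme, db_term):
    """Return the RDF terms that db_term maps to within one scheme (hypothetical helper)."""
    return {
        translation_map["toTerm"]
        for translation_map in translation_schemes[scheme]
        if db_term in translation_map["fromTerm"]
    }


print(translate("InternationalCuisnineTranslationScheme", "Fu Chi Fei Pian"))
print(translate("ChineseCuisineTranslationScheme", "Fu Chi Fei Pian"))
print(translate("ChineseCuisineTranslationScheme", "Lo Mein"))
```

Because every fromTerm/toTerm pair stays inside the map that belongs to its scheme, looking up "Fu Chi Fei Pian" in ChineseCuisineTranslationScheme yields only Sichuan, never Chinese.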

----------------------------------------------------------------------------------------------------

In summary, the alternate proposal, as extended, now has the following properties:

1) It supports many-to-one mapping from (one or more) DBterms to an RDFterm
2) Both RDFterm and DBterm can be any type of RDF term -- that is, IRI or Literal
3) Using a translation scheme as the anchor (for a group of translation maps) gives it an intuitive organization
4) The new rr: terms (rr:translationMap, rr:TranslationMap class, rr:toTerm, rr:fromTerm) are intuitive as well

Given this, we still believe that the alternate proposal is more expressive and easier to use.

Thanks,
- Souri and Seema

----- Original Message -----
From: richard@cyganiak.de
To: public-rdb2rdf-wg@w3.org
Sent: Tuesday, November 15, 2011 7:03:34 AM GMT -05:00 US/Canada Eastern
Subject: Revised SKOS-based translation table proposal

Regarding ISSUE-72 “Bring back R2RML lookup tables” [1], here's a new proposal:

   http://www.w3.org/2001/sw/rdb2rdf/drafts/translation-tables-DERI2.html

It is still SKOS-based like the first proposal [2], but drops the possibility of using skos:xxxMatch properties, and retains only the ability to use skos:notation, as suggested by David [3].

This makes it simpler than the Oracle proposal [4] in terms of new properties introduced and total triples needed to express a translation table. Since Souri's and Seema's objection to the original proposal [2] was about its complexity [5], I'm confident that the revised proposal is acceptable to them.

(The new proposal retains the ability to express non-bijective mappings, and is limited to mapping to IRIs. This differs from the Oracle proposal, which can only express bijective mappings, but can also map to literals.)

Ivan noted [6] that re-adding this feature would require a second last call.

Best,
Richard


[1] http://www.w3.org/2001/sw/rdb2rdf/track/issues/72
[2] http://www.w3.org/2001/sw/rdb2rdf/drafts/translation-tables-DERI.html
[3] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Aug/0186.html
[4] http://www.w3.org/2001/sw/rdb2rdf/drafts/translation-tables-Oracle.html
[5] http://www.w3.org/2001/sw/rdb2rdf/track/issues/66
[6] http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Nov/0013.html

Received on Monday, 21 November 2011 15:39:47 UTC