Re: Revised SKOS-based translation table proposal from Richard Cyganiak on 2011-12-08 (public-rdb2rdf-wg@w3.org from December 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 8 Dec 2011 22:31:19 +0000
To: Souripriya Das <souripriya.das@oracle.com>
Cc: <public-rdb2rdf-wg@w3.org>
Message-Id: <6934351B-0015-494E-BCF4-F37DEAF82D1F@cyganiak.de>

Hi Souri,

This kind of “partial map” is a new requirement that was never mentioned before. It's a significant change compared to any previous proposal, including your own previous proposals. I don't think that we should talk about such changes after Last Call. That is orthogonal to the question of how to best express translation tables.

That being said, the example is also poor modelling. The subject IRIs should be created according to a single consistent rule. The “official” IRIs can then be connected to the “local” ones using owl:sameAs, or perhaps more appropriately foaf:homepage. (The website of a city is not the same as the city, and using the identifier for one to identify the other is not a practice I would encourage.)
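
A minimal sketch of the resulting triples, assuming a hypothetical example.com pattern for the consistently minted subject IRIs (the pattern is an illustration only; the government URLs are those from the quoted example):

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# Subject IRIs follow one consistent rule; the example.com pattern is hypothetical.
# The "official" URL is attached as the city's homepage rather than used as its identifier.
<http://example.com/city/NY/New%20York>
    foaf:homepage <http://www.nyc.gov> .

<http://example.com/city/MA/Boston>
    foaf:homepage <http://www.cityofboston.gov> .
```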

I have never come across any RDF dataset that uses such “partial mappings” in its identifiers, so I would dispute your assertion that it's a critical requirement.

Partial mappings can be done in R2RML with any of the previous translation table proposals by using two different R2RML views, one for each method of identifier creation. This isn't pretty and it's verbose, but it works, so anyone who really wants to use this modelling can already do so.
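
One possible reading of the two-view workaround is sketched below. It rests on several assumptions: the x: namespace IRI is made up, rr:translationScheme is the proposed property from the quoted message and not part of R2RML, only two of the nine cities are listed, and the predicate-object maps are omitted for brevity.

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix x:  <http://example.com/ns#> .   # hypothetical namespace

# View 1: only the rows whose subject IRIs go through the translation table.
x:TranslatedCities
  rr:logicalTable [ rr:sqlQuery """
    SELECT * FROM "SCOTT"."USA_CITIES"
    WHERE (CITY = 'New York' AND STATE = 'NY')
       OR (CITY = 'Boston'   AND STATE = 'MA')
    """ ] ;
  # rr:translationScheme is proposal-only vocabulary (assumption, see above).
  rr:subjectMap [ rr:template "http://www.city.{CITY}.{STATE}.us" ;
                  rr:translationScheme x:myTranslationScheme ] .

# View 2: all remaining rows; subject IRIs come from the template alone.
x:TemplatedCities
  rr:logicalTable [ rr:sqlQuery """
    SELECT * FROM "SCOTT"."USA_CITIES"
    WHERE NOT ((CITY = 'New York' AND STATE = 'NY')
            OR (CITY = 'Boston'   AND STATE = 'MA'))
    """ ] ;
  rr:subjectMap [ rr:template "http://www.city.{CITY}.{STATE}.us" ] .
```

The list of translated cities appears twice, once in each query, which is where the verbosity comes from.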

As with value-to-literal mappings, the SKOS-based approach could be easily extended to directly cover this, but I don't think that this should be done in R2RML 1.0, and perhaps not at all. The common use case for translation tables is translation from a handful of numeric type codes to a handful of IRIs (often class IRIs, in predicate-object maps with an rdf:type property). If we handle that in R2RML 1.0 then that hits the 80-20 spot.

Best,
Richard


On 8 Dec 2011, at 20:32, Souripriya Das wrote:

> Let us take a concrete example to use for comparing the two proposals, the R2RML-native one and the SKOS-based one:
> 
> A database table SCOTT.USA_CITIES (CITY, STATE, LATITUDE, LONGITUDE) has about 30000 rows.
> Each row has a unique <CITY, STATE> pair and the LATITUDE and LONGITUDE info (for the city located in the specified state).
> 
> We want to map ONLY the following <CITY, STATE> pairs to actual URLs used by their respective city governments:
> - New York, NY => http://www.nyc.gov
> - Boston, MA => http://www.cityofboston.gov
> - Atlanta, GA => http://www.atlantaga.gov
> - Miami, FL => http://www.miamigov.com
> - Dallas, TX => http://www.dallascityhall.com
> - Los Angeles, CA => http://www.lacity.org
> - San Francisco, CA => http://www.sfgov.org
> - Seattle, WA => http://www.seattle.gov
> - Chicago, IL => http://www.cityofchicago.org
> 
> So, using the native proposal (where we will allow a partial map, as used below, which is probably a critical requirement in practice), we can express the R2RML map including the translation table as follows:
> 
> x:CityStateTriplesMap
>  rr:logicalTable  [ rr:tableName "\"SCOTT\".\"USA_CITIES\"" ] ;
>  rr:subjectMap    [ rr:template "http://www.city.{CITY}.{STATE}.us" ; rr:translationScheme x:myTranslationScheme ] ;
> 
>  rr:propertyObjectMap [ rr:predicate cs:city ;        rr:objectMap [ rr:column "CITY" ] ] ;
>  rr:propertyObjectMap [ rr:predicate cs:state ;       rr:objectMap [ rr:column "STATE" ] ] ;
>  rr:propertyObjectMap [ rr:predicate cs:latitude ;    rr:objectMap [ rr:column "LATITUDE" ] ] ;
>  rr:propertyObjectMap [ rr:predicate cs:longitude ;   rr:objectMap [ rr:column "LONGITUDE" ] ]
> .
> 
> x:myTranslationScheme rr:translationMap
>   [ rr:toTerm <http://www.nyc.gov> ;                  rr:fromTerm <http://www.city.New%20York.NY.us> ] ,
>   [ rr:toTerm <http://www.cityofboston.gov> ;         rr:fromTerm <http://www.city.Boston.MA.us> ] ,
>   [ rr:toTerm <http://www.atlantaga.gov> ;            rr:fromTerm <http://www.city.Atlanta.GA.us> ] ,
>   [ rr:toTerm <http://www.miamigov.com> ;             rr:fromTerm <http://www.city.Miami.FL.us> ] ,
>   [ rr:toTerm <http://www.dallascityhall.com> ;       rr:fromTerm <http://www.city.Dallas.TX.us> ] ,
>   [ rr:toTerm <http://www.lacity.org> ;               rr:fromTerm <http://www.city.Los%20Angeles.CA.us> ] ,
>   [ rr:toTerm <http://www.sfgov.org> ;                rr:fromTerm <http://www.city.San%20Francisco.CA.us> ] ,
>   [ rr:toTerm <http://www.seattle.gov> ;              rr:fromTerm <http://www.city.Seattle.WA.us> ] ,
>   [ rr:toTerm <http://www.cityofchicago.org> ;        rr:fromTerm <http://www.city.Chicago.IL.us> ]
> .
> 
> To allow a proper comparison, please express this partial mapping using the SKOS-based approach.
> 
> Thanks,
> - Souri. 
> 
> ----- Original Message -----
> From: richard@cyganiak.de
> To: souripriya.das@oracle.com
> Cc: public-rdb2rdf-wg@w3.org
> Sent: Tuesday, December 6, 2011 12:09:55 PM GMT -05:00 US/Canada Eastern
> Subject: Re: Revised SKOS-based translation table proposal
> 
> Hi Souri,
> 
> On 27 Nov 2011, at 17:10, Souripriya Das wrote:
>> Let us then go back to the second part of the original SKOS-based proposal and compare it with the alternate "R2RML-native" proposal that we had proposed. 
>> 
>> 1) The SKOS-based approach is limited to (many-to-one) Literal-DBterm to IRI-RDFterm translation. This limitation comes from use of the following two properties: (an owl:DatatypeProperty) skos:notation ("used to assign a notation as a typed literal") [7] and (an owl:ObjectProperty) skos:*Match [8]. The R2RML-native scheme, on the other hand, has no such limitation and allows (many-to-one) IRI-or-Literal-DBterm to IRI-or-Literal-RDFterm translation.
> 
> It is correct that the SKOS-based proposal only supports literal-to-IRI mappings. But this is not a limitation.
> 
> Mapping *from* an IRI to something else is nonsensical in the context of R2RML. The values we map from are always SQL data values and hence literals and never IRIs.
> 
> Mapping *to* a literal is potentially useful, and I can imagine mapping scenarios that would need it. However, the reason why the WG decided to include translation tables in the first place was the following item in our charter:
> 
> [[
> The mapping language MUST allow for a mechanism to create identifiers for database entities.
> ]]
> http://www.w3.org/2009/08/rdb2rdf-charter
> 
> Addressing this charter item requires only mapping to IRIs.
> 
> As I said before, the SKOS-based scheme can be extended to allow mapping to literals if that's what we want, albeit at a cost of one more triple per mapping pair. Mapping "Lo Mein" to <Chinese> would look like this:
> 
>    [] skos:inScheme <scheme1>;
>       skos:notation "Lo Mein";
>       skos:broadMatch <Chinese>.
> 
> Mapping "Lo Mein" to "Chinese" would be:
> 
>    [] skos:inScheme <scheme1>;
>       skos:notation "Lo Mein";
>       skos:broadMatch [ skos:notation "Chinese" ].
> 
> I think that this is a bit of a corner case and I don't think it needs to be supported in R2RML 1.0. I just want to show that the existing design of SKOS already accommodates this; so future R2RML versions could require support for fancier mappings.
> 
>> 2) Also, use of R2RML-native scheme produces less verbose Turtle documents than does use of the SKOS-based scheme (because, unlike the R2RML scheme, the SKOS-based scheme actually requires a translation scheme IRI to be explicitly specified for every translation in that scheme that goes to a different IRI-RDFterm).
> 
> Actually it is not more verbose. The SKOS approach and the custom-vocabulary approach require the *same* number of triples. The example you presented has unnecessary extra rdf:type triples that are not required by the spec. Let's remove them and compare:
> 
> ------ SKOS-based approach ------
> 
> [] skos:inScheme <InternationalCuisineTranslationScheme>;
>   skos:notation "Lo Mein", "Fu Chi Fei Pian";
>   skos:broadMatch <Chinese>.
> 
> [] skos:inScheme <ChineseCuisineTranslationScheme>;
>   skos:notation "Lo Mein";
>   skos:broadMatch <Chinese>.
> [] skos:inScheme <ChineseCuisineTranslationScheme>;
>   skos:notation "Fu Chi Fei Pian";
>   skos:broadMatch <Sichuan>.
> 
> ------ Custom vocabulary approach ------
> 
> <InternationalCuisineTranslationScheme> rr:translationMap
>   [ rr:toTerm  <Chinese> ;
>     rr:fromTerm "Lo Mein", "Fu Chi Fei Pian" ;
>   ] .
> 
> <ChineseCuisineTranslationScheme> rr:translationMap
>   [ rr:toTerm <Chinese> ;
>     rr:fromTerm "Lo Mein" ;
>   ] ,
>   [ rr:toTerm <Sichuan> ;
>     rr:fromTerm "Fu Chi Fei Pian" ;
>   ] .
> 
> ---------------------------------
> 
> Ten triples in both cases. We see that the custom-vocabulary approach can be written in fewer *characters* because the direction of the rr:translationMap property is opposite to skos:inScheme, and hence the comma-based Turtle syntactic sugar can be used.
> 
> I would like to point out here that
> 
> (i) you have argued in the past that syntax doesn't matter and R2RML is all about the model. There's no difference in the model between the two approaches.
> 
> (ii) you have in the past expressed strong objections against designs that were supposed to make typical R2RML expressions more compact, so I suppose you agree that other concerns can sometimes override the desire for less verbose R2RML expressions.
> 
> As I see it, we have the choice between
> 
> (a) adopting an established and popular international standard designed by the W3C, or
> 
> (b) saving some keystrokes by exploiting a syntactic idiosyncrasy of Turtle, and in the process re-inventing the wheel.
> 
> As I said earlier:
> 
>>> SKOS is a W3C Recommendation. It is the third-most used vocabulary on the linked data web. It's used by the Library of Congress, the UK government, the European Commission's Publications Office, the United Nations' Food and Agriculture Organization, and the New York Times.
> 
> I learned this week that we can add the Hungarian and German National Libraries and the British Museum to that list.
> 
> Finally, it's worth pointing out OpenLink's position again:
> http://lists.w3.org/Archives/Public/public-rdb2rdf-wg/2011Sep/0018.html
> 
> Are you sure that the advantages of the custom-vocabulary approach (fewer bytes, easier support for use cases that require mapping to literals) outweigh the advantage of using a standard?
> 
> Thanks,
> Richard
>
Received on Thursday, 8 December 2011 22:31:51 UTC