RE: comment: multilingual labelling from Miles, AJ (Alistair) on 2005-10-19 (public-esw-thes@w3.org from October 2005)

From: Miles, AJ (Alistair) <A.J.Miles@rl.ac.uk>
Date: Wed, 19 Oct 2005 13:22:12 +0100
To: "Antonio De Marinis" <antonio.de.marinis@eea.eu.int>, <public-esw-thes@w3.org>
Cc: "Stefan Jensen" <Stefan.Jensen@eea.eu.int>, Søren Roug <soren@roug.org>
Message-ID: <677CE4DD24B12C4B9FA138534E29FB1D0ACE00@exchange11.fed.cclrc.ac.uk>
Hi Antonio,
 
Many thanks for this comment. It draws attention to an area where I would like to improve the SKOS Core Guide for the next (3rd) release.  Below are some initial comments; I'd very much like to know what you think ...
 
When doing the initial SWAD-Europe work on multilingual thesauri, we described two modeling patterns: 'multilingual labeling' and 'interlingual mapping'.  See [1] for a full description of these patterns (beware: [1] is old and its examples use deprecated features of SKOS Core, although the principle still holds).
 
'Multilingual labeling' is designed to support the basic scenario where e.g. an information retrieval system is presented to both English and French users, and both groups of users share a similar conceptualisation.  To support this scenario, for each concept the full set of labels and documentation in each language should lead both sets of users to understand the same meaning (or a meaning that is similar enough for the purposes of the application).  This is the minimal requirement.
 
In this scenario, the needs of the application are met without storing any information about how labels and/or documentation were created (or translated).
 
Also, in this scenario, the labels in French might not correspond to translations of the labels in English.  For example, the preferred label in English could be a translation of one of the alternative labels in French, and the preferred label in French could be a translation of one of the alternative labels in English.  Or, there may only be one label in English, which could be a translation of any of the three labels in French.  All that matters is that the users in the target communities are using the same meanings.  To this end, the labels and documentation should be tailored to those communities.
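
As a minimal sketch of that last case (the concept URI and all of the labels are invented here purely for illustration), a concept with a single English label and three French labels, none of which is marked as a translation of another, might look like:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">

  <!-- Hypothetical concept; the labels are chosen per language community -->
  <skos:Concept rdf:about="http://www.example.com/concepts#shrubs">
    <skos:prefLabel xml:lang="en">shrubs</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">arbustes</skos:prefLabel>
    <skos:altLabel xml:lang="fr">arbrisseaux</skos:altLabel>
    <skos:altLabel xml:lang="fr">buissons</skos:altLabel>
  </skos:Concept>

</rdf:RDF>

Nothing here records which French label (if any) was derived from "shrubs", and the retrieval scenario does not need it to.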
 
The approach to multilingual labeling I envisage, specifically designed to support the information retrieval scenario, is one where the translator examines a concept labeled as yet only in the source language, forms an understanding of meaning by examining all labels, documentation and semantic relations, then creates a set of labels and documentation for that concept in the target language that is appropriate to the users of that language, such that the same (or sufficiently similar) meaning is understood by both language communities.
 
A concept scheme with multilingual labels may have been created by literal translation of each of the labels and each documentation item, but this is not necessarily the only (or the best) approach.
 
These are arguments against grouping labels into translation groups in the general case, although there is no reason why you couldn't define a custom RDF schema that captures this information, and map it to SKOS Core using some rules. I hope to provide examples of doing this type of work in the near future, because it will be necessary for expressing the relationships between SKOS Core and WordNet, and between SKOS Core and the TMF (Terminological Markup Framework) model.  Interestingly, the model you describe is similar to the TMF approach, and I wonder what the ISO TC 37 folks have to say.
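
To make that concrete, here is a rough sketch of what such a custom schema could look like. The "ex" namespace and the names ex:Term, ex:prefTerm, ex:altTerm, ex:literalForm and ex:translationOf are hypothetical, made up for this example, and are not part of SKOS Core or any published vocabulary:

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:ex="http://www.example.com/schema/translation#">

  <!-- Each label is a resource, so label-to-label links can be stated -->
  <ex:Term rdf:about="http://www.example.com/terms#en-2">
    <ex:literalForm xml:lang="en">label 2</ex:literalForm>
  </ex:Term>

  <ex:Term rdf:about="http://www.example.com/terms#it-7">
    <ex:literalForm xml:lang="it">label 7</ex:literalForm>
    <ex:translationOf rdf:resource="http://www.example.com/terms#en-2"/>
  </ex:Term>

  <skos:Concept rdf:about="http://www.example.com/concepts#1">
    <ex:altTerm rdf:resource="http://www.example.com/terms#en-2"/>
    <ex:altTerm rdf:resource="http://www.example.com/terms#it-7"/>
  </skos:Concept>

</rdf:RDF>

A mapping rule would then be along the lines of "if a concept has ex:altTerm T, and T has ex:literalForm L, then the concept has skos:altLabel L" (and likewise from ex:prefTerm to skos:prefLabel), so that plain SKOS Core applications still see ordinary labels.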
 
Regarding your second comment, this is exactly the kind of scenario that SKOS Mapping [2] was intended to support.  SKOS Mapping was designed for the 'interlingual mapping' scenario, where users from different language groups have fundamentally different conceptualisations.  Therefore, distinct concept schemes are created for each language group, with labels only in a single language, and semantic mapping relationships are expressed between the concept schemes.  SKOS Mapping was based on the approach outlined in Doerr's paper 'Semantic Problems of Thesaurus Mapping' [3] which seeks to ground semantic mappings in their consequences for information retrieval.
 
I would like to add a section to the SKOS Core Guide for the 3rd W3C Public Working Draft edition, briefly describing the 'interlingual mapping' approach, and where it is useful/necessary.  The problem is that no work has been done on SKOS Mapping since SWAD-Europe finished in Aug 2004, and SKOS Mapping is a bit messy.  The reason for that is that we've all focused on SKOS Core since then, reasoning that we have to figure out SKOS Core before we can do the same for SKOS Mapping.  I would like to revisit SKOS Mapping, with the help of the ISO TC 37 folks, and expect that we would end up reworking it quite extensively.  Until then, any reference to SKOS Mapping from the SKOS Core Guide has to come with a health warning, indicating the instability of SKOS Mapping.
 
Under the current SKOS Mapping specification, your example from comment 2 would be expressed as:
 
 

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:map="http://www.w3.org/2004/02/skos/mapping#">
 
  <skos:Concept rdf:about="http://www.example.com/concepts/zh#1">
    <skos:prefLabel xml:lang="zh">ta</skos:prefLabel>
    <map:narrowMatch rdf:resource="http://www.example.com/concepts/en#2"/>
    <map:narrowMatch rdf:resource="http://www.example.com/concepts/en#3"/>
    <map:narrowMatch rdf:resource="http://www.example.com/concepts/en#4"/>
  </skos:Concept>
 
  <skos:Concept rdf:about="http://www.example.com/concepts/en#2">
    <skos:prefLabel xml:lang="en">she</skos:prefLabel>
    <map:broadMatch rdf:resource="http://www.example.com/concepts/zh#1"/>
  </skos:Concept>
 
  <skos:Concept rdf:about="http://www.example.com/concepts/en#3">
    <skos:prefLabel xml:lang="en">he</skos:prefLabel>
    <map:broadMatch rdf:resource="http://www.example.com/concepts/zh#1"/>
  </skos:Concept>
 
  <skos:Concept rdf:about="http://www.example.com/concepts/en#4">
    <skos:prefLabel xml:lang="en">it</skos:prefLabel>
    <map:broadMatch rdf:resource="http://www.example.com/concepts/zh#1"/>
  </skos:Concept>
 
</rdf:RDF>

 

 
I'm going to leave it there for now. Again, thanks for the comment, and do tell me what you think of the above.  Should a section be added to the SKOS Core Guide, and if so, what should it say?  Should more work be done on SKOS Mapping, to turn it into a proper specification?  Would it fulfill the needs of your application as it is?  If not, what else do you need?
 
Thanks,

Al.
 
[1] http://www.w3.org/2001/sw/Europe/reports/thes/8.3/
[2] http://www.w3.org/2004/02/skos/mapping/
[3] http://jodi.ecs.soton.ac.uk/Articles/v01/i08/Doerr/
 

-----Original Message-----
From: Antonio De Marinis [mailto:antonio.de.marinis@eea.eu.int]
Sent: 19 October 2005 10:25
To: public-esw-thes@w3.org
Cc: Miles, AJ (Alistair); Stefan Jensen; Søren Roug
Subject: comment: multilingual labelling



Hi,

 

First of all, thanks for the good work in the SKOS area.

 

I have some comments and questions about the use of multilingual labeling in SKOS.

 

Comment 1

Consider the following "abstract example":

 

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">
 
  <skos:Concept rdf:about="http://www.example.com/concepts#1">
    <skos:prefLabel xml:lang="en">label 1</skos:prefLabel>
    <skos:altLabel xml:lang="en">label 2</skos:altLabel>
    <skos:altLabel xml:lang="en">label 3</skos:altLabel>
    <skos:prefLabel xml:lang="fr">label 4</skos:prefLabel>
    <skos:altLabel xml:lang="fr">label 5</skos:altLabel>
    <skos:prefLabel xml:lang="it">label 6</skos:prefLabel>
    <skos:altLabel xml:lang="it">label 7</skos:altLabel>
  </skos:Concept>
 
</rdf:RDF>

 

From the above SKOS concept we see that there are: 

*	1 preferred label each in en, fr and it 

*	2 alternative labels in en 

*	1 alternative label in fr 

*	1 alternative label in it 

 

Now my question is: How do I know if "label 7" is a translation of "label 2" or "label 3"?

SKOS lacks relations between labels in different languages, so we cannot tell which label is the translation of which other label.

 

Maybe this is outside the scope of SKOS. The approach used in SKOS works fine when we have only 1 preferred label and 1 alternative label and their translations. When we have multiple alternative labels in several languages, things get more complicated and we lose the semantic relations among the translated terms.

 

Would it not be better to group the translations together like this:

 

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
 
  <rdf:Description rdf:about="http://www.example.com/concepts_labels#1">
    <dc:title xml:lang="en">English label 1</dc:title>
    <dc:title xml:lang="fr">French label 1</dc:title>  
  </rdf:Description>
 
  <rdf:Description rdf:about="http://www.example.com/concepts_labels#2">
    <dc:title xml:lang="en">English label 2</dc:title>
    <dc:title xml:lang="fr">French label 2</dc:title>  
  </rdf:Description>
 
  <rdf:Description rdf:about="http://www.example.com/concepts_labels#3">
    <dc:title xml:lang="en">English label 3</dc:title>
    <dc:title xml:lang="fr">French label 3</dc:title>  
  </rdf:Description>
 
  <skos:Concept rdf:about="http://www.example.com/concepts#1">
    <skos:prefLabel rdf:resource="http://www.example.com/concepts_labels#1"/>
    <skos:altLabel rdf:resource="http://www.example.com/concepts_labels#2"/>
    <skos:altLabel rdf:resource="http://www.example.com/concepts_labels#3"/>
  </skos:Concept>
 
</rdf:RDF>

 

The above is just an example of how labels in different languages could be grouped together. It can of course be encoded in other ways. The important thing is to be able to find the relations between translated labels. 

 

Comment 2

Another issue arises when the same concept does not exist in every language: a concept has a single label in one language, but in another language it is divided into several sub-concepts, each with its own label.

 

For example, in Chinese there is ONE concept for "he/she/it", with one label (and sign), written in Pinyin (phonetically) as "ta". In English there is no single label for this concept. The concept of "third person or thing" exists in English, but it has no commonly used label in the language; it is expressed by three labels, "he", "she" and "it", depending on the context. 

 

Now if I try to encode this knowledge in SKOS with one concept I have difficulties:

 

 

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">
 
  <skos:Concept rdf:about="http://www.example.com/concepts#1">
    <skos:prefLabel xml:lang="zh">ta</skos:prefLabel>
    <skos:prefLabel xml:lang="en">she?he? or it?</skos:prefLabel>
  </skos:Concept>
 
</rdf:RDF>

 

 

 

What should I do in this case? Should I create three narrower concepts with the English labels "she", "he" and "it", like this:

 

<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#">
 
  <skos:Concept rdf:about="http://www.example.com/concepts#1">
    <skos:prefLabel xml:lang="zh">ta</skos:prefLabel>
    <skos:narrower rdf:resource="http://www.example.com/concepts#2"/>
    <skos:narrower rdf:resource="http://www.example.com/concepts#3"/>
    <skos:narrower rdf:resource="http://www.example.com/concepts#4"/>
  </skos:Concept>
 
  <skos:Concept rdf:about="http://www.example.com/concepts#2">
    <skos:prefLabel xml:lang="zh">ta</skos:prefLabel>
    <skos:prefLabel xml:lang="en">she</skos:prefLabel>
    <skos:broader rdf:resource="http://www.example.com/concepts#1"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://www.example.com/concepts#3">
    <skos:prefLabel xml:lang="zh">ta</skos:prefLabel>
    <skos:prefLabel xml:lang="en">he</skos:prefLabel>
    <skos:broader rdf:resource="http://www.example.com/concepts#1"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://www.example.com/concepts#4">
    <skos:prefLabel xml:lang="zh">ta</skos:prefLabel>
    <skos:prefLabel xml:lang="en">it</skos:prefLabel>
    <skos:broader rdf:resource="http://www.example.com/concepts#1"/>
  </skos:Concept>
 
</rdf:RDF>

 

This seems the correct way, but we then have the same preferred label "ta" on several concepts, which is not recommended according to the SKOS Core Guide working draft: "It is recommended that no two concepts in the same concept scheme be given the same preferred lexical label in any given language." 

What should I do in this case? Any suggestions?

Best regards

Antonio De Marinis

__________________________

Antonio De Marinis

Web development

Multimedia communications

at

European Environment Agency

Kongens Nytorv 6

DK-1050 Copenhagen K

Desk: +45 3336 7236

Cell: +46 739 69 99 39

Skype (Chat/IP tel): demarant

http://www.eea.eu.int/

__________________________
Received on Wednesday, 19 October 2005 12:22:23 UTC