Re: Appendix H: Internationalization from Gerard de Melo on 2012-05-15 (public-swbp-wg@w3.org from May 2012)

From: Gerard de Melo <demelo@icsi.berkeley.edu>
Date: Mon, 14 May 2012 19:40:32 -0700
To: Chris Welty <cawelty@gmail.com>
CC: Alexandre Rademaker <arademaker@gmail.com>, public-swbp-wg@w3.org, Valeria de Paiva <valeria.depaiva@gmail.com>
Message-ID: <4FB1C220.20500@icsi.berkeley.edu>

Dear Chris,

Thanks for bringing up these interesting examples. I am one of the people
working with Alexandre.

I think the cases you describe actually point to the more general problems
we face whenever we attempt to discretize meaning into nicely manageable
and identifiable entries. These same issues appear in the form of
lumping vs. splitting decisions when compiling monolingual dictionaries
or when defining resources on the Semantic Web (is my definition of
Italy really the same as the one used in your dataset?).

There is no simple answer to these problems. Some differences will be
considered irrelevant, and others will be considered important. Maroon,
magenta, fuchsia, and so on indeed all have separate WordNet entries, and
WordNet does distinguish several different senses of "know". For
cross-linguistic work, new resources should indeed be defined whenever
the differences are deemed important.

I agree that RDF often pushes us towards binary yes/no choices.
Weights or probabilities might serve as a (very crude) approximation of
the gradability involved in such decisions.
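
A minimal sketch of that idea, in Python rather than RDF. The synset
identifiers and the weights are invented purely for illustration, and
the "hasTranslation" name is borrowed from Alexandre's message below;
none of this is taken from an actual WordNet or from the W3C draft.
Each mapping carries a weight, and a threshold turns the graded
mappings back into the binary statements that RDF would normally hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedMapping:
    """A graded cross-lingual link between two synsets (hypothetical model)."""
    source: str    # made-up identifier for the source synset
    target: str    # made-up identifier for the target synset
    weight: float  # 0.0 (no overlap) .. 1.0 (full overlap); illustrative only


# Invented example weights; they are not measured from any corpus.
MAPPINGS = [
    WeightedMapping("en:know", "fr:savoir", 0.6),
    WeightedMapping("en:know", "fr:connaitre", 0.4),
    WeightedMapping("en:be_familiar_with", "fr:connaitre", 0.9),
]


def to_binary(mappings, threshold):
    """Keep only mappings at or above the threshold, as plain triples."""
    return [
        (m.source, "hasTranslation", m.target)
        for m in mappings
        if m.weight >= threshold
    ]


if __name__ == "__main__":
    for triple in to_binary(MAPPINGS, threshold=0.5):
        print(triple)
```

The choice of threshold is exactly the yes/no decision discussed above;
keeping the weights around merely postpones it and lets different
applications draw the line in different places.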

Best regards,
Gerard

>
> Alexandre,
>
> One criticism of WordNet synsets is that there is a binary
> classification that must happen: each word must either be a member of
> a synset or not.  In reality, there is a sort of degree to
> which a word may belong to a synset, and this may be useful to capture,
> especially when translating.
>
> One example is "to know" in English and "savoir" vs. "connaitre" in
> French.  In basic French, we learn that savoir is to know something,
> and connaitre is to know a person.  We were taught that what in
> English seems to be a single sense is two senses in French.
>
> If English WordNet had been constructed without knowledge of this
> distinction, there would be only one sense of "to know", which would
> then be translatable to two synsets in French. You would need to
> understand that this mapping is incomplete.
>
> It gets more complicated when you realize that what we learned in
> basic French is not completely true. While we use the word "know" in
> English for knowing people, the best translation from French for
> "connaitre" is "to be familiar with".  Indeed, French uses the word
> that way - you can reconnais a place, a store, etc. It turns out to
> be something of a historical artifact that (American) English uses "to
> know" in this case more commonly.  But "familiar" does not belong to
> this (English) synset as strongly as "know" - it belongs, and would be
> understood, but based on the frequency of usage it would sound a
> little archaic and formal to use "familiar" instead of "know" for a
> person.
>
> So, the point is: how can you capture the fact that subtleties of
> language can create partial mappings between languages?
>
> This is often easier to explain when you use something that has a
> scientific understanding as a range of values, like colors.  Take the
> English word "maroon", which is a color that lies somewhere on the
> spectrum between red and purple.  Would you lump this into the synset
> for red, or for purple?  Where do you draw the line in that synset,
> at a particular point in the spectrum?  What if you found that
> different languages and cultures draw their boundaries differently,
> like maybe Italians "see" red as a darker color than Germans do, and
> the mapping of "maroon" into these languages is partial?
>
> Does that make 'sense' ;) ?
>
> -Chris
>
> On 5/10/2012 4:57 PM, Alexandre Rademaker wrote:
>> I am about to finish the translation of our OpenWordNet-PT to RDF
>> integrating it with the original Princeton WordNet 3.0.
>>
>> In appendix H of http://www.w3.org/TR/wordnet-rdf/:
>>
>> "... Integration of WordNets implies creating mappings between
>> entities in the WordNets to indicate lexico-semantic relationships
>> between them, e.g. a property that signifies that the meanings of two
>> Synsets overlap. The entities that represent language concepts that
>> should be able to map are instances of the classes: Synset, WordSense
>> and Word..."
>>
>> I can easily see the utility of a relation between Synsets and
>> WordSenses like "hasTranslation". But I can't see any use in relating
>> the words... Any idea?
>>
>> Best,
>>
>> Alexandre Rademaker
>> http://arademaker.github.com/


-- 
Gerard de Melo [demelo@icsi.berkeley.edu]
http://www.icsi.berkeley.edu/~demelo/
Received on Tuesday, 15 May 2012 02:42:19 UTC