Re: [WNET, PORT, OEP] Synset's, Classes and multiple languages

McBride, Brian wrote:

> Thinking a little more about this as I cycled to work this morning, it
> occurred to me that one test to apply to a design for a wordnet mapping is
> mergability with other WordNet's, and other ontologies, which leads to a
> specific test case (I love test cases).
> 
> What happens if we merged say a French and English WordNet?

This is a really interesting problem space. One can easily get 
distracted by grand issues such as the relationship between thought and 
language...

Eurowordnet had a good crack at the practicalities, see 
http://www.illc.uva.nl/EuroWordNet/ for docs (sadly not data) explaining 
  their twist on the Princeton wordnet approach. They had to change it 
slightly to give them a model for representing multi-lingual wordnets 
(at least for Euro languages, not sure if they claim universality).

> If we have the synset "dog" sameClassAs the class of dogs, and the synset
> "chien" sameClassAs the class of dogs, then we have synset "dog" sameClassAs
> synset "chien".  My  knowledge of Owl is too weak :(  Does that lead to the
> conclusion that "chien" and "dog" are members of the same synset?  Would
> that be a problem for anyone?

http://www.w3.org/TR/2004/REC-owl-ref-20040210/ reminds me that 
daml:sameClassAs became owl:equivalentClass ie 
http://www.w3.org/TR/2004/REC-owl-ref-20040210/#equivalentClass-def

   The meaning of such a class axiom is that the two class descriptions
   involved have the same class extension (i.e., both class extensions
   contain exactly the same set of individuals).

Remembering that OWL and RDF/S allow for there to be distinct classes 
that have a common extension, this allows us to have classes "Dog" and 
"Chien" which are co-extensive, yet nevertheless distinct (including 
their rdfs:label, rdfs:comment, URI names, etc.).

(I think in DAML+OIL, the claim might have been stronger, ie. that the 
two class descriptions were descriptions of the self-same class.)

FWIW I also see value in creating distinct RDF/OWL classes for each term 
in a wordnet synset, since there are plentiful subtle differences 
between the terms that Wordnet lumps together. For eg see 
http://rdfweb.org/topic/WhyWordnetIsCool for a few examples (I applied 
Wordnet to some photos I took one weekend). Wordnet has "cowboy hat, 
ten-gallon hat" as synonyms. Also "can, tin, tin can". Wordnet is pretty 
scruffy. By breaking out synset parts into their own classes, in 
addition to having a common superclass they all share, we allow people 
to use wordnet in a more precise way, without having to spent the 
time/money cleaning it up into a formal ontology.

> If they are synonyms, we'd need to encode the language somewhere, so that
> software can easily restrict the language of the synoyms it selects.

This is the tricky problem of trying to a map a linguistically oriented 
system into a world-oriented system. I think we need parallel 
representations of wordnets: a (simplified) class hierarchy (ie. 
ignoring many parts of wordnet), and a full wordnet-style web of 
(linguistic) concepts. The former would benefit for exposing RDF/OWL 
classes for each synset, and each term in the synset; the latter might 
be handled with an extension of SKOS, although that is yet to be 
established.

> I guess I'm wondering if it would make sense to model the synsets for the
> same concept in different languages as different resources. 

I think so, at some level. At another level, we want to express the 
commonality.

Motivating use case: Image description... many electronic images are 
language-neutral and hence internationally useful in a way that 
textually-oriented electronic documents aren't. Being able to have an 
image be described by a french speaker, yet discovered by a Japanese 
speaker (or vice-versa), is a nicely practical goal for the semantic web 
effort.

cheers,

Dan

ps. http://www.w3.org/2001/sw/Europe/200404/i18n/jptofu-example1.xml for 
some rough experiments with Japanese dictionary ideas...

Received on Thursday, 6 May 2004 04:57:42 UTC