Re: [WN] comments on draft

Aldo Gangemi wrote:

>
> Hi Jacco, some comments inside
>
> At 10:03 +0100 26-11-2005, Jacco van Ossenbruggen wrote:
>
>> Review of http://www.cs.vu.nl/~mark/wn/wn-conversion.html
>>
>> I agree with the comments posted previously by Jeremy (see below).
>> In addition, as a reader I was a bit confused about the many open 
>> issues. What makes things worse is that the possible solutions to 
>> many of the open issues are insufficiently documented for me, as the 
>> reader, to form an opinion about them.
>> Minor remarks:
>> -Section 3, explains the prolog format of 
>> s(100003009,1,"living_thing",n,1,1):
>>    Please also explain the last three arguments, or state that they 
>> are explained in Appendix A
>>
>> -Section 4, do not forget to resolve [WHY DOES WORD NOT HAVE THESE 
>> SUBCLASSES?].
>> -Figure caption "The clas hierarchy of WordNet:", fix typo in class, 
>> remove ending colon
>> -You do not use subClassOf a la Brickley.  Maybe an example of how to 
>> get the same semantics using
>> RDF meta modeling is in place?
>
>
> The same semantics cannot be got.

Ah, big discussion. What we're doing here is representing Wordnet as a 
lexical database. That's fine, worthy and important (and also a bridge 
to SKOS, where we describe conceptual entities and terms associated with 
them, but don't model natural language so explicitly). What I did was 
build a simple-minded ontology FROM the structures captured by Wordnet 
hypernyms.

I think the semantics are in there. The data is bad, scruffy, sure. But 
the *meaning* of wordnet "hypernym" as defined does carry a semantic 
that can be captured in rdfs:subClassOf. HOWEVER this doesn't mean that 
all RDF representations of wordnet should do this: it is useful, but so 
is the lexical view. A machine-friendly relationship between the two 
approaches (wordnet-as-words vs 
wordnet-noun-hierarchies-as-a-model-of-the-world) would be an interesting 
addition, btw.

http://wordnet.princeton.edu/gloss

[[
hypernym
    The generic term used to designate a whole class of specific 
instances. Y is a hypernym of X if X is a (kind of) Y.
hyponym
    The specific term used to designate a member of a class. X is a 
hyponym of Y if X is a (kind of) Y.
]]

(hmm, I thought there was a more subclassy definition somewhere else in 
the wordnet docs... here it does sound more like rdf:type than 
rdfs:subClassOf...)


> subClassOf formally means set inclusion, while "hypernymOf" is only a 
> property, which is formally equivalent to the existence of an ordered 
> pair across two sets. Moreover, while "set" in the first semantics is 
> the extension of the class of individuals named by a synset, "set" in 
> the second semantics is the extension of the class of all synsets.

It is possible to do it both ways. If we do it with 'hypernym of' being 
a plain property, we are building a representation of the English 
language as seen from Wordnet. If we do it with rdfs:subClassOf, we are 
building a representation of the *world* as seen from the parts of 
English language expressed in Wordnet noun hierarchies (ie. not touching 
on verbs, events, etc).
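
To make the difference between the two readings concrete, here is a minimal Python sketch. It is not taken from the draft under review: the http://example.org/ namespace, the "hypernymOf" property URI and the CowboyHat/Hat terms are placeholders made up for illustration. It only shows that the RDFS subclass rule (rdfs9) produces a new rdf:type statement under the class reading and nothing under the plain-property reading.

```python
# Sketch only: the namespace and the "hypernymOf" property URI are placeholders.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
WN = "http://example.org/wordnet/"   # hypothetical namespace
HYPERNYM_OF = WN + "hypernymOf"      # hypothetical plain property


def lexical_view(hyponym, hypernym):
    """Lexical reading: synsets are individuals linked by an ordinary property."""
    return {(WN + hypernym, HYPERNYM_OF, WN + hyponym)}


def class_view(hyponym, hypernym):
    """Ontology reading: synsets are classes, hypernymy becomes rdfs:subClassOf."""
    return {(WN + hyponym, RDFS_SUBCLASS_OF, WN + hypernym)}


def rdfs9_closure(triples):
    """Apply RDFS rule rdfs9 to a fixpoint:
    (x rdf:type C) and (C rdfs:subClassOf D) entail (x rdf:type D)."""
    inferred = set(triples)
    while True:
        new = {
            (x, RDF_TYPE, d)
            for (x, p, c) in inferred if p == RDF_TYPE
            for (c2, q, d) in inferred if q == RDFS_SUBCLASS_OF and c2 == c
        } - inferred
        if not new:
            return inferred
        inferred |= new


my_hat = "http://example.org/data/myHat"
fact = {(my_hat, RDF_TYPE, WN + "CowboyHat")}

lexical = rdfs9_closure(fact | lexical_view("CowboyHat", "Hat"))
as_classes = rdfs9_closure(fact | class_view("CowboyHat", "Hat"))

print((my_hat, RDF_TYPE, WN + "Hat") in lexical)     # False
print((my_hat, RDF_TYPE, WN + "Hat") in as_classes)  # True
```

In the lexical view the hypernym link is still there as data, so an application can follow it explicitly; it just carries no built-in RDFS entailment.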

> Technically, a mapping could be done between the two semantics, but 
> the interpretation of all synsets as classes and of all hypernymOf 
> relations as subClassOf is untenable wrt intuition, because many 
> synsets refer to individuals, 

...that's a bug in the data, not the metamodel, one might argue.


> many hypernymOf relations refer to instanceOf (rdf:type), and there are 
> other problems. This means that semantic porting needs data 
> reengineering, not just schema translation.

Yes, it wouldn't make a very high quality ontology. But often, RDF users 
know which words "make sense", eg. I might use "Cowboy Hat" but not 
"Paris" as an RDF class in my data, since it is (semi-)obvious that the 
latter isn't a good term to use as a class. So, my approach has been to 
expose all of Wordnet (the old 1.6) as URIs, and people use the ones 
that work as categories, and ignore the ones that should never have been 
classes.
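
As a rough illustration of that usage pattern, here is a small Python sketch. Everything in it is an assumption for the sake of example: the http://example.org/ namespace, the URI-minting rule in term_uri, and the idea of keeping a personal set of terms one is happy to use as categories. It is not a description of how the Wordnet 1.6 URIs were actually generated.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
WN = "http://example.org/wordnet/"  # hypothetical namespace


def term_uri(word):
    """Mint a URI from a word form (assumed rule): "Cowboy Hat" -> .../CowboyHat."""
    return WN + "".join(part.capitalize() for part in word.split())


def typed(subject, word, usable_as_class):
    """Return an rdf:type triple, but only for terms the data author
    has decided make sense as categories."""
    if word not in usable_as_class:
        raise ValueError(f"{word!r} is not on the list of terms used as classes")
    return (subject, RDF_TYPE, term_uri(word))


# Every noun gets a URI, but the author only types things with some of them.
my_categories = {"Cowboy Hat"}
triple = typed("http://example.org/data/myHat", "Cowboy Hat", my_categories)
print(triple[2])  # http://example.org/wordnet/CowboyHat
```

Here "Paris" still has a URI via term_uri, but typed() would refuse it because it is not in my_categories; the judgment about which terms work as classes stays with the person writing the data.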

> Similar problems have been shown for many thesauri in the past and in 
> particular in the SKOS work.

SKOS helps reflect these ambiguous 'broader' structures into RDF, and 
therefore - i hope - helps us articulate a roadmap from the world of 
thesauri into the world of ontologies...

> A second draft (if time permits) should treat the semantic porting of 
> WordNet. Of course, an example can be added also in the current one.
>
>> -The document suggests there has not yet been contact with Princeton 
>> about the namespace. Should this not be
>> done before going public?  If not, has a meeting with Princeton 
>> already been scheduled?
>
>
> Contact was established months ago, and we have just sent a 
> message to Christiane Fellbaum to point her at the material for the 
> port, and eventually to create the namespace.

If you could cc: the Working Group list on that stuff, it'd help with 
transparency, so everyone in the taskforce (and the rest of the group) 
knows where things stand. Eg. there's a question of "what should go at 
the namespace" which is very relevant to both the SKOS/PORT and Vocab 
Management taskforces (Alistair's work in particular...).

cheers,

Dan

>> -How to generate URIs for other languages?  Related to 
>> resolving:[THIS IGNORES LANGUAGE ISSUE! should we append language 
>> indicator?]. Also related: URI vs IRI (How to deal with non-latin1 
>> languages).
>> Do translations use the same Prolog format?  Does the converter 
>> program also work for these translations?
>> -In appendix A, would it make sense to adopt the prolog convention of 
>> writing Variables with a starting capital?
>> As a prolog programmer, it took me a while to realize what was an 
>> atom, literal or variable/placeholder in the prolog code fragments.
>>
>> Jacco
>>
>> Jeremy Carroll wrote:
>>
>>>
>>>
>>> Reviewed document:
>>> http://www.cs.vu.nl/~mark/wn/wn-conversion.html
>>>
>>>
>>> 1. the abstract is not an abstract
>>>
>>> 2. abstract/sotd or intro needs to set expectations about target
>>> audience and contribution of this document, and its non-objectives
>>>
>>> i.e.
>>> [[
>>> The TF should produce guidelines for transforming existing wordnets 
>>> into
>>> an RDF/OWL representation. Guidelines should describe strategies for
>>> converting wordnets-like structures into an RDF representation, as well
>>> as strategies for re-describing in RDF/OWL the content originally
>>> conveyed in the wordnets.
>>> ]]
>>>
>>> 3. URI issue could/should be expanded, highlighted somewhat.
>>> Covering:
>>>  - do the terms like synset etc need a different URI from the terms in
>>> the wordnet itself (e.g. #bank-1)
>>> - different URIs for different versions?
>>> - hash (one huge file) versus slash (303 response? WebArch issue)
>>>
>>> Jeremy
>>
>
>

Received on Saturday, 26 November 2005 14:21:01 UTC