Re: [WN] comments on draft

At 14:21 +0000 26-11-2005, Dan Brickley wrote:
>Aldo Gangemi wrote:
>
>>
>>Hi Jacco, some comments inside
>>
>>At 10:03 +0100 26-11-2005, Jacco van Ossenbruggen wrote:
>>
>>>Review of http://www.cs.vu.nl/~mark/wn/wn-conversion.html
>>>
>>>I agree with the comments posted previously by Jeremy (see below).
>>>In addition, as a reader I was a bit confused by the many open 
>>>issues. What makes things worse is that the possible solutions to 
>>>many of the open issues are insufficiently documented for me, as 
>>>the reader, to form an opinion about them.
>>>Minor remarks:
>>>-Section 3 explains the Prolog format of 
>>>s(100003009,1,"living_thing",n,1,1):
>>>    Please also explain the last three arguments, or state that 
>>>they are explained in Appendix A
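For reference, a minimal sketch of reading such a fact, assuming the field layout documented for the WordNet Prolog release, s(synset_id, w_num, word, ss_type, sense_number, tag_count). The regular expression and the `SenseEntry` / `parse_s_fact` names are illustrative only and are not taken from the converter under review.

```python
import re
from typing import NamedTuple

class SenseEntry(NamedTuple):
    synset_id: int     # nine-digit id; the first digit encodes the syntactic category
    w_num: int         # position of the word within its synset
    word: str          # word form, multiword terms joined with underscores
    ss_type: str       # synset type: n, v, a, s or r
    sense_number: int  # sense number of this word form
    tag_count: int     # times the sense was tagged in the semantic concordances

# Matches an s/6 fact; accepts both '...' and "..." quoting for the word.
S_FACT = re.compile(r"""s\((\d+),(\d+),(['"])(.*?)\3,([nvasr]),(\d+),(\d+)\)""")

def parse_s_fact(line: str) -> SenseEntry:
    m = S_FACT.match(line.strip())
    if m is None:
        raise ValueError(f"not an s/6 fact: {line!r}")
    sid, wnum, _quote, word, sstype, sense, tags = m.groups()
    return SenseEntry(int(sid), int(wnum), word, sstype, int(sense), int(tags))

entry = parse_s_fact('s(100003009,1,"living_thing",n,1,1).')
print(entry.word, entry.ss_type, entry.sense_number)  # living_thing n 1
```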
>>>
>>>-Section 4, do not forget to resolve [WHY DOES WORD NOT HAVE THESE 
>>>SUBCLASSES?].
>>>-Figure caption "The clas hierarchy of WordNet:", fix typo in 
>>>class, remove ending colon
>>>-You do not use subClassOf à la Brickley. Maybe an example of how 
>>>to get the same semantics using RDF metamodeling would be in 
>>>order?
>>
>>
>>The same semantics cannot be obtained.
>
>Ah, big discussion. What we're doing here is representing Wordnet as 
>a lexical database. That's fine, worthy and important (and also a 
>bridge to SKOS, where we describe conceptual entities and terms 
>associated with them, but don't model natural language so 
>explicitly). What I did was build a simple-minded ontology FROM the 
>structures captured by Wordnet hypernyms.

That's fine, Dan. The discussion is not about having one or more 
ports that can be used successfully, but (IMO) about suggesting some 
practices that have been motivated, reviewed, and are possibly shared 
with good arguments.

>I think the semantics are in there. The data is bad, scruffy, sure. 
>But the *meaning* of wordnet "hypernym" as defined does carry a 
>semantic that can be captured in rdfs:subClassOf. HOWEVER this 
>doesn't mean that all RDF representations of wordnet should do this: 
>it is useful, but so is the lexical view.

If we take hyponymy as just a hierarchical relationship, there is a 
formal equivalence in terms of graphs: both hyponymy and 
rdfs:subClassOf appear to be strict partial orders (irreflexive and 
transitive, hence antisymmetric, binary relations).

But the set-theoretic semantics underlying e.g. rdfs:subClassOf in OWL 
implies a more stringent constraint: in order to have set inclusion 
between set A and set B, all elements of A must be elements of B. In 
the case of classes, this is straightforward: e.g. if "Cat" is 
rdfs:subClassOf "Mammal", the elements of the set representing the 
extension of the class "Cat" are also elements of the set 
representing the extension of the class "Mammal".
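To make the contrast concrete, here is a toy interpretation in which each class name maps to its extension (a set of individuals). It is a minimal sketch with made-up individuals: under set semantics, subClassOf is violated as soon as one member of the subclass falls outside the superclass, whereas a plain hypernym property only asserts that an ordered pair belongs to the relation and constrains nothing about members.

```python
# Toy interpretation: class name -> extension. Individuals are hypothetical.
extension: dict[str, set[str]] = {
    "Cat": {"tom", "felix"},
    "Mammal": {"tom", "felix", "rex"},
}

def satisfies_subclass(sub: str, sup: str, ext: dict[str, set[str]]) -> bool:
    """subClassOf under set semantics: ext(sub) must be a subset of ext(sup)."""
    return ext[sub] <= ext[sup]

def satisfies_hypernym_property(sub: str, sup: str, pairs: set[tuple[str, str]]) -> bool:
    """A plain property only asserts that the ordered pair is in the relation."""
    return (sub, sup) in pairs

hypernym_of = {("Cat", "Mammal")}
print(satisfies_subclass("Cat", "Mammal", extension))             # True

extension["Cat"].add("garfield")  # a cat not listed among the mammals
print(satisfies_hypernym_property("Cat", "Mammal", hypernym_of))  # True
print(satisfies_subclass("Cat", "Mammal", extension))             # False
```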

Is this true for hyponymy? Beyond the quick, intuitive answer, 
which especially for WordNet 2.1 could mostly be "yes", we should 
ask: if synsets are classes, what are the elements of the sets 
representing the extensions of synsets? E.g. we should commit to an 
interpretation of the synset "cat, true_cat" such that, for all the 
entities that can reasonably be called either "cat" or "true cat" 
and can be decently characterized by the gloss "feline mammal 
usually having thick soft fur and being unable to roar", there 
exists a single set containing them. And that set must be a subset 
of the set containing all the entities that can reasonably be called 
either "feline" or "felid" and can be decently characterized by the 
gloss "any of the various lithe-bodied round-headed fissiped mammals 
many with retractile claws".

An answer to that question goes well beyond WordNet's commitment. As 
a matter of fact, when domain experts try to reuse WordNet parts, 
they often experience some disillusionment (e.g. in biomedicine, 
geographical systems, law, etc.), because WordNet hierarchies 
sometimes do not reflect the way experts organize their knowledge. 
That's the reason why we decided (at the very beginning of the TF 
activity) to split the work on WordNet porting to SW languages from 
the work on giving a formal semantics to WordNet hierarchies (like in 
the OntoWordNet project).

>A machine-friendly relationship between the two approaches 
>(wordnet-as-words vs wordnet-noun-hierarchies-as-a-model-of-the-world) 
>would be an interesting addition, btw.

Indeed. See Mark's suggestion, and my comments. With due warnings, a 
simple pipeline between the two approaches is desirable.
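A minimal sketch of such a pipeline, using plain (subject, predicate, object) tuples rather than an RDF library. Everything here is an assumption for illustration: the namespaces, the `hyponymOf` property name on the lexical side, and the `derivedFromSynset` link property. It naively turns every synset into a class, so it inherits all the caveats discussed in this thread; the `derivedFromSynset` triples are what keeps the relationship between the two views machine-readable.

```python
# Hypothetical namespaces and property names, for illustration only.
WN = "http://example.org/wn/"
CLS = "http://example.org/wn-class/"
HYPONYM_OF = WN + "hyponymOf"
DERIVED_FROM = CLS + "derivedFromSynset"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

def class_uri(synset_uri: str) -> str:
    """Mint one class URI per synset by swapping the namespace prefix.
    Assumes the synset URI starts with WN."""
    return CLS + synset_uri[len(WN):]

def lexical_to_class_view(triples):
    """Naively read every hyponymOf link between synsets as rdfs:subClassOf
    between derived classes, recording which synset each class came from."""
    out = set()
    for s, p, o in triples:
        if p != HYPONYM_OF:
            continue  # other lexical triples have no counterpart in the class view
        out.add((class_uri(s), RDFS_SUBCLASS_OF, class_uri(o)))
        for synset in (s, o):
            out.add((class_uri(synset), DERIVED_FROM, synset))
    return out

lexical = [(WN + "synset-cat-noun-1", HYPONYM_OF, WN + "synset-feline-noun-1")]
for triple in sorted(lexical_to_class_view(lexical)):
    print(triple)
```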

>http://wordnet.princeton.edu/gloss
>
>[[
>hypernym
>    The generic term used to designate a whole class of specific 
>instances. Y is a hypernym of X if X is a (kind of) Y.
>hyponym
>    The specific term used to designate a member of a class. X is a 
>hyponym of Y if X is a (kind of) Y.
>]]
>
>(hmm, I thought there was a more subclassy definition somewhere else 
>in the wordnet docs.... it does sound more like rdf:type than 
>rdfs:subClassOf here...)

Yes; that wording is formally confusing.

>
>>subClassOf formally means set inclusion, while "hypernymOf" is only 
>>a property, which is formally equivalent to the existence of an 
>>ordered pair across two sets. Moreover, while "set" in the first 
>>semantics is the extension of the class of individuals named by a 
>>synset, "set" in the second semantics is the extension of the class 
>>of all synsets.
>
>It is possible to do it both ways. If we do it with 'hypernym of' 
>being a plain property, we are building a representation of the 
>English language as seen from Wordnet. If we do it with 
>rdfs:subClassOf, we are building a representation of the *world* as 
>seen from the parts of English language expressed in Wordnet noun 
>hierarchies (ie. not touching on verbs, events, etc).

Yes, and the issue is similar with verbs, modulo the underdeveloped 
detail of WordNet verb hierarchies.

>>Technically, a mapping could be done between the two semantics, but 
>>the interpretation of all synsets as classes and of all hypernymOf 
>>relations as subClassOf is untenable wrt intuition, because many 
>>synsets refer to individuals,
>
>...that's a bug in the data, not the metamodel, one might argue.

That would be correct only if WordNet provided an explicit semantic 
metamodel and merely had some buggy data. In fact, originally they 
simply didn't care about that (probably until 2.1), because hyponymy 
for linguists is not usually interpreted on set-theoretic grounds. 
Consequently, that's not a bug in the data. The 2.1 move to instances 
reflects a different commitment, btw.
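Once the data does distinguish instance links from class links (as the 2.1 move to instances starts to do), one piece of the reengineering becomes a simple translation. A minimal sketch, assuming each hypernym link arrives already flagged as an instance link or not; the flag is taken from the input, not inferred, and the labels are illustrative. It addresses only the rdf:type problem, not the other problems mentioned below.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

def reengineer(links):
    """Translate (hyponym, hypernym, is_instance) links into RDF triples.

    is_instance must come from the source data; it is not inferred here."""
    triples = []
    for hyponym, hypernym, is_instance in links:
        # An instance link says the hyponym names an individual: rdf:type.
        # Otherwise both ends are read as classes: rdfs:subClassOf.
        predicate = RDF_TYPE if is_instance else RDFS_SUBCLASS_OF
        triples.append((hyponym, predicate, hypernym))
    return triples

links = [("cat", "feline", False), ("Paris", "national_capital", True)]
for triple in reengineer(links):
    print(triple)
```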

>
>>many hypernymOf relations refer to instanceOf (rdf:type), and there 
>>are other problems. This means that semantic porting needs data 
>>reengineering, not just schema translation.
>
>Yes, it wouldn't make a very high quality ontology. But often, RDF 
>users know which words "make sense", eg. I might use "Cowboy Hat" 
>but not "Paris" as an RDF class in my data, since it is 
>(semi-)obvious that the latter isn't a good term to use as a class. 
>So, my approach has been to expose all of Wordnet (the old 1.6) as 
>URIs, and people use the ones that work as categories, and ignore 
>the ones that should never have been classes.

Again, pragmatically speaking, your work is fine: if something is 
used, that's already a validation for it. Don't take this discussion 
as a criticism of its value :)

>>Similar problems have been shown for many thesauri in the past and 
>>in particular in the SKOS work.
>
>SKOS helps reflect these ambiguous 'broader' structures into RDF, 
>and therefore - i hope - helps us articulate a roadmap from the 
>world of thesauri into the world of ontologies...
>
>>A second draft (if time permits) should treat the semantic porting 
>>of WordNet. Of course, an example can be added also in the current 
>>one.
>>
>>>-The document suggests there has not yet been contact with 
>>>Princeton about the namespace. Should this not be
>>>done before going public?  If not, has a meeting with Princeton 
>>>already been scheduled?
>>
>>
>>Contact was established months ago, and we have just sent a 
>>message to Christiane Fellbaum to point her to the material for the 
>>port, and eventually to create the namespace.
>
>If you could cc: the Working Group list on that stuff, it'd help 
>with transparency, so everyone in the taskforce (and rest of the 
>group) know where things are up to. Eg. there's a question of "what 
>should go at the namespace" which is very relevant to both the 
>SKOS/PORT and Vocab Management taskforces (Alistair's work in 
>particular...).

OK, thanks
Aldo

>cheers,
>
>Dan
>
>>>-How to generate URIs for other languages? Related to 
>>>resolving: [THIS IGNORES LANGUAGE ISSUE! should we append language 
>>>indicator?]. Also related: URI vs IRI (how to deal with non-Latin-1 
>>>languages).
>>>Do translations use the same Prolog format? Does the converter 
>>>program also work for these translations?
>>>-In appendix A, would it make sense to adopt the prolog convention 
>>>of writing Variables with a starting capital?
>>>As a Prolog programmer, it took me a while to realize what was an 
>>>atom, a literal or a variable/placeholder in the Prolog code fragments.
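To illustrate the URI vs IRI question, here is a minimal sketch that appends a language segment and then maps the IRI to a plain URI by percent-encoding the UTF-8 bytes of non-ASCII characters, roughly as RFC 3987 describes. The base namespace and the `wordsense-word-type-number` local-name pattern are hypothetical; where the language indicator goes is exactly the open issue in the draft.

```python
from urllib.parse import quote

BASE = "http://example.org/wn/"  # hypothetical namespace

def sense_iri(word: str, ss_type: str, sense: int, lang: str) -> str:
    """IRI with a language segment; non-ASCII characters are kept as-is,
    which IRIs (RFC 3987) allow but plain URIs do not."""
    return f"{BASE}{lang}/wordsense-{word}-{ss_type}-{sense}"

def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 bytes of non-ASCII characters.

    Simplified: assumes the IRI contains no existing '%' escapes; any
    character outside `safe` and the unreserved set is encoded."""
    return quote(iri, safe=":/#")

print(iri_to_uri(sense_iri("bank", "n", 1, "en")))
print(iri_to_uri(sense_iri("猫", "n", 1, "ja")))
```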
>>>
>>>Jacco
>>>
>>>Jeremy Carroll wrote:
>>>
>>>>
>>>>
>>>>Reviewed document:
>>>>http://www.cs.vu.nl/~mark/wn/wn-conversion.html
>>>>
>>>>
>>>>1. the abstract is not an abstract
>>>>
>>>>2. abstract/sotd or intro needs to set expectations about target
>>>>audience and contribution of this document, and its non-objectives
>>>>
>>>>i.e.
>>>>[[
>>>>The TF should produce guidelines for transforming existing wordnets into
>>>>an RDF/OWL representation. Guidelines should describe strategies for
>>>>converting wordnets-like structures into an RDF representation, as well
>>>>as strategies for re-describing in RDF/OWL the content originally
>>>>conveyed in the wordnets.
>>>>]]
>>>>
>>>>3. URI issue could/should be expanded, highlighted somewhat.
>>>>Covering:
>>>>  - do the terms like synset etc. need a different URI from the terms in
>>>>    the wordnet itself (e.g. #bank-1)?
>>>>  - different URIs for different versions?
>>>>  - hash (one huge file) versus slash (303 response? WebArch issue)
>>>>
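The three URI questions above can be made concrete with a small sketch. The namespaces and the version-in-path scheme are assumptions for illustration only, since the actual namespace was still to be agreed with Princeton. Separate bases keep schema terms (Synset, hyponymOf) apart from the data (#bank-1); the version segment gives each release its own URIs; and the hash style means every term resolves to one (potentially huge) document, while the slash style needs per-term server handling such as the 303 response Jeremy mentions.

```python
# Hypothetical namespaces; the actual WordNet namespace was still to be agreed.
SCHEMA_BASE = "http://example.org/wn/schema"       # terms like Synset, hyponymOf
INSTANCE_BASE = "http://example.org/wn/instances"  # synsets, word senses, words

def term_uri(base: str, version: str, local: str, style: str) -> str:
    """Mint a versioned URI in either hash or slash style."""
    if style == "hash":
        # One document per version; the fragment identifies the term inside it.
        return f"{base}/{version}#{local}"
    if style == "slash":
        # One resource per term; the server must answer per-term requests
        # (e.g. with a 303 redirect to a description).
        return f"{base}/{version}/{local}"
    raise ValueError(f"unknown style: {style!r}")

def document_requested(uri: str) -> str:
    """What an HTTP client actually fetches: the fragment is never sent."""
    return uri.split("#", 1)[0]

for style in ("hash", "slash"):
    uri = term_uri(INSTANCE_BASE, "2.0", "bank-1", style)
    print(style, uri, "->", document_requested(uri))
print(term_uri(SCHEMA_BASE, "2.0", "Synset", "hash"))
```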
>>>>Jeremy


-- 



Aldo Gangemi
Research Scientist
Laboratory for Applied Ontology
Institute for Cognitive Sciences and Technology
National Research Council (ISTC-CNR)
Via Nomentana 56, 00161, Roma, Italy
Tel: +390644161535
Fax: +390644161513
aldo.gangemi@istc.cnr.it
http://www.istc.cnr.it/createhtml.php?nbr=71

Received on Sunday, 27 November 2005 01:51:53 UTC