- From: Aldo Gangemi <aldo.gangemi@istc.cnr.it>
- Date: Sun, 27 Nov 2005 02:51:44 +0100
- To: Dan Brickley <danbri@w3.org>
- Cc: Jacco van Ossenbruggen <Jacco.van.Ossenbruggen@cwi.nl>, public-swbp-wg@w3.org
At 14:21 +0000 26-11-2005, Dan Brickley wrote:
>Aldo Gangemi wrote:
>
>>
>>Hi Jacco, some comments inside
>>
>>At 10:03 +0100 26-11-2005, Jacco van Ossenbruggen wrote:
>>
>>>Review of http://www.cs.vu.nl/~mark/wn/wn-conversion.html
>>>
>>>I agree with the comments posted previously by Jeremy (see below).
>>>In addition, as a reader I was a bit confused about the many open
>>>issues. What makes things worse is that the possible solutions to
>>>many of the open issues are too poorly documented for me, as
>>>the reader, to form an opinion about them.
>>>Minor remarks:
>>>-Section 3 explains the prolog format of
>>>s(100003009,1,"living_thing",n,1,1):
>>> Please also explain the last three arguments, or state that
>>>they are explained in Appendix A
>>>
>>>-Section 4, do not forget to resolve [WHY DOES WORD NOT HAVE THESE
>>>SUBCLASSES?].
>>>-Figure caption "The clas hierarchy of WordNet:", fix typo in
>>>class, remove ending colon
>>>-You do not use subClassOf a la Brickley. Maybe an example of how
>>>to get the same semantics using
>>>RDF meta modeling is in place?
>>
>>
>>The same semantics cannot be got.
>
>Ah, big discussion. What we're doing here is representing Wordnet as
>a lexical database. That's fine, worthy and important (and also a
>bridge to SKOS, where we describe conceptual entities and terms
>associated with them, but don't model natural language so
>explicitly). What I did, was build a simple-minded ontology FROM the
>structures captured by Wordnet hypernyms.

That's fine, Dan. The discussion is not about having one or more ports that can be used successfully, but (IMO) about suggesting some practices that have been motivated, reviewed, and possibly shared with good arguments.

>I think the semantics are in there. The data is bad, scruffy, sure.
>But the *meaning* of wordnet "hypernym" as defined does carry a
>semantic that can be captured in rdfs:subClassOf.
>HOWEVER this doesn't mean that all RDF representations of wordnet should do this:
>it is useful, but so is the lexical view.

If we take hyponymy as just a hierarchical relationship, there is a formal equivalence in terms of graphs: both hyponymy and rdfs:subClassOf appear to be partial orders (transitive, antisymmetric binary relations; hyponymy is normally taken as irreflexive, while under the RDFS semantics every class is trivially a subclass of itself). But the set-theoretic semantics underlying rdfs:subClassOf (as also used in OWL) implies a more stringent constraint: in order to have set inclusion between set A and set B, all elements of A must be elements of B. In the case of classes, this is straightforward: e.g. if "Cat" is rdfs:subClassOf "Mammal", the elements of the set representing the extension of the class "Cat" are also elements of the set representing the extension of the class "Mammal". Is this true for hyponymy? Besides a quick and intuitive answer, which especially for WordNet 2.1 could mostly be "yes", we should ask: if synsets are classes, what are the elements of the sets representing the extensions of synsets? E.g. we should commit to an interpretation of the synset "cat, true_cat" such that, for all the entities that can reasonably be called either "cat" or "true cat" and can be decently characterized by the gloss "feline mammal usually having thick soft fur and being unable to roar", there exists a single set including them. And that same set must be a subset of the set including all the entities that can reasonably be called either "feline" or "felid" and can be decently characterized by the gloss "any of the various lithe-bodied round-headed fissiped mammals many with retractile claws". An answer to that question goes well beyond WordNet's commitment. As a matter of fact, when domain experts try to reuse parts of WordNet, they often experience some disillusion (e.g. in biomedicine, geographical systems, law, etc.), because WordNet hierarchies sometimes do not reflect the way experts organize their knowledge.
That's the reason why we decided (at the very beginning of the TF activity) to separate the work on porting WordNet to SW languages from the work on giving a formal semantics to WordNet hierarchies (as in the OntoWordNet project).

>A machine-friendly relationship between the two approaches
>(wordnet-as-words vs wordnet-noun-hierarchies-as-a-model-of-the-world)
>would be an interesting addition, btw.

Indeed. See Mark's suggestion, and my comments. With due warnings, a simple pipeline between the two approaches is desirable.

>http://wordnet.princeton.edu/gloss
>
>[[
>hypernym
> The generic term used to designate a whole class of specific
>instances. Y is a hypernym of X if X is a (kind of) Y.
>hyponym
> The specific term used to designate a member of a class. X is a
>hyponym of Y if X is a (kind of) Y.
>]]
>
>(hmm thought there was a more subclassy definition somewhere else in
>the wordnet docs somewhere.... it does sound more like rdf:type than
>rdfs:subClassOf here...)

Yes; that wording is formally confusing.

>
>>subClassOf formally means set inclusion, while "hypernymOf" is only
>>a property, which is formally equivalent to the existence of an
>>ordered pair across two sets. Moreover, while "set" in the first
>>semantics is the extension of the class of individuals named by a
>>synset, "set" in the second semantics is the extension of the class
>>of all synsets.
>
>It is possible to do it both ways. If we do it with 'hypernym of'
>being a plain property, we are building a representation of the
>English language as seen from Wordnet. If we do it with
>rdfs:subClassOf, we are building a representation of the *world* as
>seen from the parts of English language expressed in Wordnet noun
>hierarchies (ie. not touching on verbs, events, etc).

Yes, and the issue is similar with verbs, modulo the less developed WordNet verb hierarchies.
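To make the two readings concrete, here is a minimal Python sketch using only the standard library (no RDF toolkit). It assumes a made-up namespace (http://example.org/wn/), made-up synset names, and made-up property names (hyponymOf, instanceHyponymOf); none of these are the URIs proposed in the conversion document. The lexical view keeps hyponymy as ordered pairs of synsets; a naive pipeline rewrites those pairs as rdfs:subClassOf or rdf:type; and the set-inclusion check only makes sense once someone supplies extensions for the synsets, which is exactly the commitment WordNet itself does not make.

```python
# Hypothetical sketch: lexical vs ontological reading of WordNet hyponymy.
# The namespace, synset names and property names are all made up.
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

WN = "http://example.org/wn/"  # hypothetical namespace, not the agreed one
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Lexical view: hyponymy is a plain property, i.e. a set of ordered pairs
# whose members are synsets (individuals of a Synset class).
lexical: List[Triple] = [
    (WN + "cat-n-1", WN + "hyponymOf", WN + "feline-n-1"),
    (WN + "paris-n-1", WN + "instanceHyponymOf", WN + "national_capital-n-1"),
]


def to_ontology(triples: Iterable[Triple]) -> List[Triple]:
    """Naive pipeline to the ontological view.

    Class-level hyponymy becomes rdfs:subClassOf and instance-level
    hyponymy becomes rdf:type; any other triple is dropped. This is
    schema translation only: it does not re-engineer dubious data.
    """
    mapping = {
        WN + "hyponymOf": RDFS_SUBCLASSOF,
        WN + "instanceHyponymOf": RDF_TYPE,
    }
    return [(s, mapping[p], o) for s, p, o in triples if p in mapping]


def subclass_holds(ext: Dict[str, Set[str]], sub: str, sup: str) -> bool:
    """Set-theoretic reading of rdfs:subClassOf: ext(sub) is a subset of ext(sup).

    `ext` is an interpretation supplied by the modeller; WordNet itself
    does not provide one.
    """
    return ext.get(sub, set()) <= ext.get(sup, set())


# A toy interpretation of two synsets as classes (made-up individuals).
ext: Dict[str, Set[str]] = {
    WN + "cat-n-1": {"tom", "felix"},
    WN + "feline-n-1": {"tom", "felix", "simba"},
}

if __name__ == "__main__":
    for triple in to_ontology(lexical):
        print(triple)
    print(subclass_holds(ext, WN + "cat-n-1", WN + "feline-n-1"))  # True
    print(subclass_holds(ext, WN + "feline-n-1", WN + "cat-n-1"))  # False
```

Note that the pipeline turns every hyponymOf pair into rdfs:subClassOf without question; deciding which synsets should really be individuals, or which pairs fail set inclusion under a sensible interpretation, is data re-engineering that this kind of schema translation cannot do on its own.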
>>Technically, a mapping could be done between the two semantics, but
>>the interpretation of all synsets as classes and of all hypernymOf
>>relations as subClassOf is untenable wrt intuition, because many
>>synsets refer to individuals,
>
>...that's a bug in the data, not the metamodel, one might argue.

That would be correct only if WordNet provided an explicit semantic metamodel and merely had some buggy data. On the contrary, originally they simply didn't care about that (probably until 2.1), because linguists do not usually interpret hyponymy on set-theoretic grounds. Consequently, that's not a bug in the data. The 2.1 move to instances reflects a different commitment, btw.

>
>>many hypernymOf relations refer to instanceOf (rdf:type), and there
>>are other problems. This means that semantic porting needs data
>>reengineering, not just schema translation.
>
>Yes, it wouldn't make a very high quality ontology. But often, RDF
>users know which words "make sense", eg. I might use "Cowboy Hat"
>but not "Paris" as an RDF class in my data, since it is
>(semi-)obvious that the latter isn't a good term to use as a class.
>So, my approach has been to expose all of Wordnet (the old 1.6) as
>URIs, and people use the ones that work as categories, and ignore
>the ones that should never have been classes.

Again, pragmatically speaking, your work is fine: if something is used, that's already a validation of it. Don't take this discussion as a criticism of its value :)

>>Similar problems have been shown for many thesauri in the past and
>>in particular in the SKOS work.
>
>SKOS helps reflect these ambiguous 'broader' structures into RDF,
>and therefore - i hope - helps us articulate a roadmap from the
>world of thesauri into the world of ontologies...
>
>>A second draft (if time permits) should treat the semantic porting
>>of WordNet. Of course, an example can be added also in the current
>>one.
>>
>>>-The document suggests there has not yet been contact with
>>>Princeton about the namespace. Should this not be
>>>done before going public? If not, has a meeting with Princeton
>>>already been scheduled?
>>
>>
>>Contact was established months ago, and we have just sent a
>>message to Christiane Fellbaum to point her to the material for the
>>port, and eventually create the namespace.
>
>If you could cc: the Working Group list on that stuff, it'd help
>with transparency, so everyone in the taskforce (and rest of the
>group) know where things are up to. Eg. there's a question of "what
>should go at the namespace" which is very relevant to both the
>SKOS/PORT and Vocab Management taskforces (Alistair's work in
>particular...).

OK, thanks
Aldo

>cheers,
>
>Dan
>
>>>-How to generate URIs for other languages? Related to
>>>resolving: [THIS IGNORES LANGUAGE ISSUE! should we append language
>>>indicator?]. Also related: URI vs IRI (How to deal with non-latin1
>>>languages).
>>>Do translations use the same Prolog format? Does the converter
>>>program also work for these translations?
>>>-In appendix A, would it make sense to adopt the prolog convention
>>>of writing Variables with a starting capital?
>>>As a prolog programmer, it took me a while to realize what was an
>>>atom, literal or variable/placeholder in the prolog code fragments.
>>>
>>>Jacco
>>>
>>>Jeremy Carroll wrote:
>>>
>>>>
>>>>
>>>>Reviewed document:
>>>>http://www.cs.vu.nl/~mark/wn/wn-conversion.html
>>>>
>>>>
>>>>1. the abstract is not an abstract
>>>>
>>>>2. abstract/sotd or intro needs to set expectations about target
>>>>audience and contribution of this document, and its non-objectives
>>>>
>>>>i.e.
>>>>[[
>>>>The TF should produce guidelines for transforming existing wordnets into
>>>>an RDF/OWL representation.
>>>>Guidelines should describe strategies for
>>>>converting wordnet-like structures into an RDF representation, as well
>>>>as strategies for re-describing in RDF/OWL the content originally
>>>>conveyed in the wordnets.
>>>>]]
>>>>
>>>>3. URI issue could/should be expanded, highlighted somewhat.
>>>>Covering:
>>>> - do the terms like synset etc need a different URI from the terms in
>>>>the wordnet itself (e.g. #bank-1)
>>>>- different URIs for different versions?
>>>>- hash (one huge file) versus slash (303 response? WebArch issue)
>>>>
>>>>Jeremy

--
Aldo Gangemi
Research Scientist
Laboratory for Applied Ontology
Institute for Cognitive Sciences and Technology
National Research Council (ISTC-CNR)
Via Nomentana 56, 00161, Roma, Italy
Tel: +390644161535
Fax: +390644161513
aldo.gangemi@istc.cnr.it
http://www.istc.cnr.it/createhtml.php?nbr=71
Received on Sunday, 27 November 2005 01:51:53 UTC