Re: WordNet Task Force - work outline from Philippe Martin on 2004-04-16 (public-swbp-wg@w3.org from June 2004)

From: Philippe Martin <phmartin@meganesia.sci.griffith.edu.au>
Date: Fri, 16 Apr 2004 09:34:46 -0400 (EDT)
To: public-swbp-wg@w3.org
Cc: phmartin@meganesia.int.gu.edu.au
Message-Id: <200404161332.i3GDWlj00834@meganesia.int.gu.edu.au>
Aldo, Jeremy,


> 2) Clarification of lexical relations, ...
> 3) How to refine wordnets' hierarchies? ...
> an OWL ontology to talk about the main wordnet relationships

Here are some kinds of "links" (relations between categories,
i.e. between types or indivisuals) that I had to introduce when
converting the noun-related part of WordNet 1.7 into a genuine
lexical ontology usable for knowledge-based management purposes
(details at http://www.webkb.org/doc/wn/):
- to replace the hyponym links: subtypeOf and instanceOf;
- to replace the antonym links: simpleExclusionOf and
  closedExclusionOf (like simpleExclusionOf but in addition,
  the linked categories form a "closed subtype partition" for
  another type, e.g. wn#chromatic_color and wn#achromatic_color
  form a closed subtype partition for wn#color);
  the complementOf and inverseOf links are not currently useful
  within WordNet but are common (e.g. they exist in OWL);
- to replace certain erroneous uses of the hyponym link: the 
  locationOf link (e.g. wn#Dunkerque is said to be hyponym of both
  wn#city and wn#amphibious_assault; this violates some exclusionOf
  link(s) in my top-level ontology; I have set wn#Dunkerque as 
  instanceOf wn#city and locationOf wn#amphibious_assault).

Other kinds of links are useful to extend WordNet:
instrumentOf, inputOf, objectOf, agentOf, resultOf, etc.




> > > But also look at the file at http://xmlns.com/wordnet/1.6/City:
> > > City is a class introduced with all its taxonomic branch (poor 
> > > practice: if each class is introduced with all its superclasses,
> > > the ontology results unnecessary long) then all hyponyms of "City"
> > > are introduced, ...
> > If all you want to know about is City then the City download is a 
> > good one, if you want to know about more than that, maybe you need
> > the full download (wherever that is). ...
> Got the point, it is an efficiency issue. Is there any tool to 
> convert an OWL (or RDF) ontology into a set of "views files", possibly
> based on customizable properties (e.g., "give me only a superclass 
> specification view", or "give me subclasses and related classes
> specification view"?

> I think this is a very promising area, and could perhaps be 
> generalized into a system that was given the URI of a word or
> document or a Literal (i.e. a node or small graph) and returned
> a graph containing related nodes.
> (Jena + Lucene might provide a lot of of what's required)

Indeed, storing a "view file" for each WordNet category and each
exploration/presentation option of its relation to other categories
is not practical. A knowledge server is needed. To illustrate some 
options, I'll use my own system (WebKB-2) as an example.

http://www.webkb.org/bin/categSearch.cgi?categ=%23city&format=RDF

Value for the "categ" parameter: either a category identifier such as
"#city" (GET encoding: %23city;  "wn#city" and "#city" are equivalent
in WebKB-2: WordNet is the default source/creator) or a word such as
"city" which may have more than one recorded meaning (in which case,
more than one "category and associated relations" is presented).

Value for the "format" parameter: there should be more than one
possible format because (i) RDF/XML is inadequate for vizualizing or 
navigating a large semantic network, (ii) RDF/XML+OWL is inadequate
for many other purposes (natural language representation, interlingua,
development and representation of reasonably complex ontologies, etc.).
WebKB-2 currently proposes four other formats for presenting links
between categories and navigating them. For example, try:
http://www.webkb.org/bin/categSearch.cgi?categ=city&recursLink=%3C&hyperlinks
("%3C" is the GET encoding for '<', i.e. the subtypeOf link; thus,
"recursLink=%3C" asks for the display of the whole subtypeOf hierarchy,
unless other presentation constraints limit that exploration, e.g. an
exploration depth). See the "category search" interface for more
options: http://www.webkb.org/interface/categSearch.html



> For example, anyone creating an OWL or RDFS class might wish to 
> annotate it with its intended meaning using *the* URI for a specific
> sense of an  English word, as classified by wordnet. The main
> requirement from this use case is agreement over what that URI is,
> including the beginning bit (the namespace) and the end bit (the
> mapping from Wordnet's representation of senses) ...

There will be more than one possible URI for a category even if only one
knowledge server is exploited:
- if URLs such as http://www.webkb.org/bin/categSearch.cgi?categ=%23city
  are used, many options may be added (hence leading to different URLs);
  if artificial URIs such as http://www.webkb.org/wn#city are used (this
  URI does not refer to any existing document), the problem for
  knowledge processing tools is then to find the URL of the knowledge
  server to get more information;
- a category may have many identifiers, e.g. in WebKB-2 the following
  are equivalent: #city, wn#city, #city__metropolis__urban_center;
- a category may be linked to other categories by equivalentTo/equalTo
  links, e.g. wn#city = pm#city = some_french_ontology#ville; since
  these categories are said to be equivalent/identical, asking for links
  connected to one of them should lead to the same result as asking for
  links connected to any other one of them. This is not a problem 
  within a knowledge server (e.g. WebKB-2) but if more than one 
  knowledge server is used, there must be some regular 
  mirroring/replication/propagation processes between the servers.


> 7) Wordnets' modularization: how to organize the global wordnet 
> namespace into domains of interest (directory-like subject trees) or
> in formal theories (ontology modules). What semantics should be used?
> 8) Enrichment to domains, possibly in an open-source programme: 
> wordnets usually feature a poor support for specific domains, being
> general purpose structures. How to expand them through open-source ...

Indeed, most knowledge-based uses of WordNet will lead to additions (or 
sometimes corrections) and these extensions should be sharable.

Independently developed extensions stored in static Web documents are 
difficult to merge (manually, and even more automatically) in a 
semantically/logically/ontologically correct way and hence
hard to re-use for genuine knowledge based management purposes.

In my opinion, a more scalable solution (for knowledge based management
purposes and "Semantic Web" purposes) is the use of knowledge servers
where each server
- associates each user's addition (e.g. a category, a link between 
  categories, a statement) with its source/creator (ontology/user),
  removes or warns about introduced redundancies, and 
  rejects the addition if it introduces an inconsistency (with the 
  protocols used in WebKB-2, this does not prevent a user to express
  her own beliefs but this lets her know about a conflict with other
  users' statements and leads her to be more precise/explicit to 
  eliminate the problem; no discussion or agreement between the users
  is necessary);
- periodically checks other servers (similar general servers, or servers
  specialized in the same domains) to complement its knowledge base
  (thus, it does not matter much which servers people use, and this
   is how the centralized and distributed approaches combine their
   advantages; of course, integrating knowledge from other servers
   may not be obvious but it is *much* easier than trying to
   re-use/integrate dozens/hundreds/thousands of poorly inter-connected
   extensions).
More details on this are accessible from 
http://www.webkb.org/doc/papers/wi02/

I do not think that people should have to explore a hierarchy of 
ontologies (modules) or "domains of interest" to decide which 
modules to search, re-use, extend or merge, especially since
these modules may be conflicting, redundant and complementary.
Imagine someone wanting to search or add some knowledge about certain
kinds of "neurons", or "cats" or "feet". There are (too) many domains
this knowledge can be categorized into and (too) many tasks it can be
useful for. People should be able to use a general knowledge server to
find the different meanings of (categories for) "neurons", "cats" or
"feet", and explore their specializations (or other related categories)
until they have found the right category (or the closest).
Ideally, the knowledge statements (or the "data" for a database related
analogy) associated to a category should also be organized via 
categories, e.g. a category #Siamese_cat might be linked to a
category pm#health_issues_of_siamese_cats.
If the URL of a specialized knowledge server is associated to a 
sufficiently adequate category (e.g. via a link "serverFor"), the user
should continue its exploration on this specialized server (if there are
several possible specialized servers, ideally they mirror each other to
share the same knowledge about this category, i.e. the same collection
of statements related to that category). When the creators of a 
specialized server register to have their server associated to a 
category in a general server, they commit to try to collect and 
structure as much knowledge as possible about that category (e.g.
about pm#health_issues_of_siamese_cats, or more ambitiously, about
#Siamese_cat; the server about #Siamese_cat may of course refer to the
server about pm#health_issues_of_siamese_cats). (In a commercial
environment, it is in the interest of the creators of a server to do
some checking and refer only to the most comprehensive of more 
specialized servers, hence easing users' information retrieval).
In addition to permit search by navigation, a server may also permit
querying and hence may exploit more specialized servers for that.

I should also note that since in a knowledge server the source/creator
of each piece of knowledge is stored, knowledge modules can be
re-generated. Actually, arbitrary complex queries on the knowledge
(its content plus its creators and their characteristics) can be used
to generate knowledge modules. However, it is easier and better to make
additions to the knowledge base of a knowledge server by creating a
static module (a Web document), submitting it to the server and refining
it until its content is stable and accepted by the server (and commited
into its knowledge base) than trying to do incremental on-line additions
and removals (an analogy of that would be trying to develop a large 
shell script via the shell command line, i.e. without text editor).
Furthermore, the links to those static modules may serve various
purposes: documentation of the knowledge, regeneration of the knowledge
base in case it gets corrupted, etc. Thus, modules (distributed static 
files) are useful but their content should be developped and exploited
via knowledge servers (i.e. some means of centralization) to permit
knowledge sharing/re-use.


Philippe

______________________________________________________________________
Dr. Philippe Martin 
Address: Griffith Uni, School of I.T., PMB 50 GCMC, QLD 9726 Australia
Email: phmartin@gu.edu.au;  Fax: +61 7 5552 8066
______________________________________________________________________
Received on Friday, 25 June 2004 10:22:29 UTC