Porting Wordnets to the Semantic Web

W3C Working Draft 8 July 2004

This version:: ...
Latest version:: ...
Previous versions:: This is the first public version
Editors:: Aldo Gangemi, National Research Council, Italy; Brian McBride
Co-Editors:: Jeremy Carroll; Guus Schreiber

Abstract

Wordnets are valuable resources both as lexical repositories and as sources of ontological distinctions. This document presents a framework and workplan for porting wordnets to Semantic Web languages such as RDFS and OWL. The work is divided into phases, and preliminary resources are referenced.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document will become part of a larger document, produced by the Semantic Web Best Practices and Deployment Working Group, that will provide an introduction to and overview of the deployment of wordnets on the Semantic Web.

This document is a W3C Working Draft and is expected to change.

This document is the First Public Working Draft. Public comments are encouraged; please send them to public-swbp-wg@w3.org.

Open issues and to-do items:

A complete review of existing SW-compliant ports of wordnets
A related document that proposes an agreed RDFS and OWL datamodel for Princeton WordNet
A comparative list of existing techniques for reengineering wordnets as ontologies

Publication as a draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or made obsolete by other documents at any time. It is inappropriate to cite this document as other than work in progress.

General issue

Wordnets are databases of lexical data, usually including information on hypernyms, synonyms, polysemous terms, relations between terms, and sometimes multilingual equivalents. For example, the Princeton WordNet is described as follows:

"WordNet® is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets."

Within the Semantic Web Best Practices and Deployment Working Group, a WNET Task Force (TF) has been established to agree upon a Semantic Web version of Princeton WordNet and to suggest a common datamodel and some best practices for porting wordnets to the Semantic Web. The TF description can be found here.

Method

The first goal of WNET is to agree upon a schema (initially in RDFS, then in OWL) for translating Princeton WordNet. A preliminary RDFS schema of the Princeton WordNet datamodel is here, and a revised one is here. A preliminary OWL datamodel, which elaborates on the one developed by the knOWLer project and on Decker & Melnik's RDF representation, is here.
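As an illustration of what such an RDFS schema contains, the following minimal Python sketch emits, in N-Triples syntax, class and property declarations for three core notions (synsets, word senses, words). The namespace `http://example.org/wordnet/schema#` and the names `Synset`, `WordSense`, `Word`, `containsWordSense`, `word` and `hyponymOf` are assumptions chosen for this example; they are not claimed to match the schemas referenced above.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical schema namespace; WNET has not yet agreed on an official one.
WN = "http://example.org/wordnet/schema#"

def schema_triples():
    """Return (subject, predicate, object) URI triples for a minimal RDFS schema."""
    triples = []
    for cls in ("Synset", "WordSense", "Word"):
        triples.append((WN + cls, RDF + "type", RDFS + "Class"))
    # property name -> (domain class, range class); names are illustrative only
    properties = {
        "containsWordSense": ("Synset", "WordSense"),
        "word": ("WordSense", "Word"),
        "hyponymOf": ("Synset", "Synset"),
    }
    for prop, (domain, rng) in properties.items():
        triples.append((WN + prop, RDF + "type", RDF + "Property"))
        triples.append((WN + prop, RDFS + "domain", WN + domain))
        triples.append((WN + prop, RDFS + "range", WN + rng))
    return triples

def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

print(to_ntriples(schema_triples()))
```

Modeling word senses as a class of their own, rather than linking synsets directly to words, is one of the choices such a schema has to settle.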

Although this first goal does not aim at an ontological interpretation of WordNet (that is the second goal of WNET), it already raises several design choices, some of which are shared with other attempts to port database schemas (e.g., this project report) or thesauri (cf. the THES TF in this WG) to Semantic Web languages.
In practice, one must decide whether each element of the original datamodel should be translated as a class or as a property, or whether it can be left out of the translation. Further decisions concern the cardinality restrictions, if any, on the properties that link the translated classes. Examples of these decisions can be found in the schemas mentioned above.
Once the basic RDFS datamodel of WordNet has been agreed upon, an OWL version that also includes such restrictions can be produced.
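The kind of restriction an OWL version would add on top of the RDFS schema can be sketched as follows. This minimal Python example builds the triples for an OWL `owl:cardinality` restriction stating that every word sense belongs to exactly one synset, and prints them as N-Triples. The namespace and the `inSynset` property are hypothetical, and whether such a constraint belongs in the agreed datamodel is one of the decisions still to be taken.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"
WN = "http://example.org/wordnet/schema#"  # hypothetical namespace, not an agreed one

def cardinality_restriction(cls, prop, n, bnode="_:r1"):
    """Triples saying that every instance of `cls` has exactly `n` values for `prop`."""
    literal = f'"{n}"^^<{XSD}nonNegativeInteger>'
    return [
        (cls, RDFS + "subClassOf", bnode),
        (bnode, RDF + "type", OWL + "Restriction"),
        (bnode, OWL + "onProperty", prop),
        (bnode, OWL + "cardinality", literal),
    ]

def term(t):
    """Write blank nodes and typed literals verbatim; wrap everything else as a URI."""
    return t if t.startswith(("_:", '"')) else f"<{t}>"

# Hypothetical constraint: each word sense belongs to exactly one synset.
for s, p, o in cardinality_restriction(WN + "WordSense", WN + "inSynset", 1):
    print(term(s), term(p), term(o), ".")
```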

The OWL (and RDFS) versions will be submitted to the interested parties mentioned in the WNET description, and a revision cycle will be started.
A crucial step is obtaining a namespace from Princeton's developers and trying to include the port in the next official distribution.
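Once a namespace is available, every synset and word needs a URI minted inside it. The sketch below shows one possible naming scheme in Python; the base URI is a placeholder standing in for the namespace still to be obtained from Princeton, and the `synset-<lemma>-<pos>-<n>` and `word-<lemma>` patterns are hypothetical, not an agreed convention.

```python
from urllib.parse import quote

# Placeholder base URI: the real namespace would have to be obtained from Princeton's developers.
# Including the WordNet version keeps URIs from different releases apart.
BASE = "http://example.org/wordnet/2.0/instances/"

# Part-of-speech codes used in the WordNet database files.
POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective", "s": "adjectivesatellite", "r": "adverb"}

def synset_uri(lemma, pos, sense_number, base=BASE):
    """Hypothetical scheme: name a synset after one of its lemmas, its POS, and a sense number."""
    return f"{base}synset-{quote(lemma)}-{POS_NAMES[pos]}-{sense_number}"

def word_uri(lemma, base=BASE):
    """Hypothetical scheme: one URI per word form, percent-encoding unsafe characters."""
    return f"{base}word-{quote(lemma)}"

print(synset_uri("dog", "n", 1))
print(word_uri("jack-o'-lantern"))
```

Putting the WordNet version into the base URI is one way to keep ports of different releases from clashing, since sense numbering is not guaranteed to be stable across versions.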

Based on the datamodel, the original database (or part of it) will then be ported to an RDF model. The port should preferably start directly from WordNet version 2.0, although possible conflicts with the needs of implementors who use earlier versions (e.g., 1.7 in the knOWLer port) should be considered. For each version of WordNet, version-specific mappings have usually been provided by different researchers; these resources can be exploited if needed.
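To give an idea of what the porting step involves, the following Python sketch parses one synset line laid out like the WordNet `data.*` files, as documented in the `wndb` manual page (synset offset, lexicographer file number, synset type, hexadecimal word count, word/lex_id pairs, decimal pointer count, pointers, and a gloss after `|`), and turns it into triples. The sample line is made up, the namespaces and the `hyponymOf` property are hypothetical, and only hypernym pointers (`@`) are mapped; a real port would follow the agreed datamodel and handle all pointer types.

```python
# Hypothetical namespaces; the agreed ones are still to be decided by WNET.
WN = "http://example.org/wordnet/schema#"
INST = "http://example.org/wordnet/2.0/instances/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def parse_data_line(line):
    """Parse one synset line of a WordNet data file: offset, POS, words, pointers, gloss."""
    head, _, gloss = line.partition(" | ")
    fields = head.split()
    offset, ss_type = fields[0], fields[2]              # fields[1] is the lexicographer file number
    w_cnt = int(fields[3], 16)                          # word count: two-digit hexadecimal
    words = [fields[4 + 2 * i] for i in range(w_cnt)]   # each word is followed by a lex_id
    pos = 4 + 2 * w_cnt
    p_cnt = int(fields[pos])                            # pointer count: three-digit decimal
    pointers = []
    for i in range(p_cnt):
        symbol, target, target_pos, _src_tgt = fields[pos + 1 + 4 * i : pos + 5 + 4 * i]
        pointers.append((symbol, target, target_pos))
    # Any remaining fields (e.g. verb frames) are ignored in this sketch.
    return {"offset": offset, "pos": ss_type, "words": words,
            "pointers": pointers, "gloss": gloss.strip()}

def synset_uri(pos, offset):
    """Hypothetical URI scheme based on part of speech and byte offset."""
    return f"{INST}synset-{pos}-{offset}"

def synset_to_triples(rec):
    """Map a parsed synset to triples; literal objects are plain Python strings here."""
    s = synset_uri(rec["pos"], rec["offset"])
    triples = [(s, RDF_TYPE, WN + "Synset")]
    for w in rec["words"]:
        triples.append((s, RDFS_LABEL, w.replace("_", " ")))  # underscores join multiword lemmas
    for symbol, target, target_pos in rec["pointers"]:
        if symbol == "@":  # '@' marks a hypernym pointer, so this synset is a hyponym of the target
            triples.append((s, WN + "hyponymOf", synset_uri(target_pos, target)))
    return triples

# A made-up line in the data-file layout (not a real WordNet entry).
line = "00000010 05 n 02 dog 0 domestic_dog 0 001 @ 00000020 n 0000 | a domesticated canine"
for t in synset_to_triples(parse_data_line(line)):
    print(t)
```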

Once this first goal has been reached, suggestions and existing work on reengineering (as opposed to translating) WordNet as an ontology will be collected, and a cycle of drafts and revisions will be started involving the Semantic Web Best Practices and Deployment Working Group and external users. An extended example of how knowledge engineering techniques can be used to reengineer wordnets and thesauri as Semantic Web ontologies can be found here.
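The difference between translating and reengineering can be illustrated with hyponymy. A naive reading maps every hyponymy link to `rdfs:subClassOf`; a reengineered ontology instead has to decide, among other things, which synsets denote individuals rather than classes. The Python sketch below is a hypothetical illustration of that single decision, not a method proposed by WNET: the set of individuals is assumed to come from manual curation, and the identifiers are made up.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

def reengineer(hyponym_of, individuals=frozenset()):
    """Turn hyponymOf links into OWL axioms.

    hyponym_of maps a synset id to the ids of its more general synsets.
    Synsets listed in `individuals` are typed as instances of their
    hypernym instead of being declared subclasses of it.
    """
    triples = set()
    for child, parents in hyponym_of.items():
        for parent in parents:
            triples.add((parent, RDF_TYPE, OWL_CLASS))
            if child in individuals:
                triples.add((child, RDF_TYPE, parent))          # instance-of
            else:
                triples.add((child, RDF_TYPE, OWL_CLASS))
                triples.add((child, RDFS_SUBCLASS_OF, parent))  # subclass-of
    return triples

# Made-up identifiers: a named dog is an individual, not a subclass of dog.
links = {"ex:Dog": ["ex:Canine"], "ex:Lassie": ["ex:Dog"]}
for t in sorted(reengineer(links, individuals={"ex:Lassie"})):
    print(t)
```

Calling `reengineer(links)` without the `individuals` argument reproduces the naive translation, in which `ex:Lassie` would wrongly become a class.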