Wordnets are valuable resources both as lexical repositories and as
sources of ontological distinctions. This document presents a
framework and workplan for porting wordnets to Semantic Web languages,
like RDFS and OWL. Some phases are distinguished, and preliminary
resources are referenced.
Status of this Document
This section describes the status of this document at the time
of its publication. Other documents may supersede this document. A list
of current W3C publications and the latest revision of this technical
report can be found in the W3C
technical reports index at http://www.w3.org/TR/.
This document will be part of a larger document, produced by the
Semantic Web Best Practices and Deployment Working Group, that will
provide an introduction to and overview of the deployment of wordnets
over the Semantic Web.
This document is a W3C Working Draft and is expected to change.
This document is the First Public Working Draft. We encourage public
comments. Please send comments to public-swbp-wg@w3.org.
Open issues, todo items:
- A complete review of existing SW-compliant ports of wordnets
- A related document that proposes an agreed RDFS and OWL datamodel
for Princeton WordNet
- A comparative list of existing techniques to reengineer wordnets as
ontologies
Publication as a draft does not imply endorsement by the W3C
Membership. This is a draft document and may be updated, replaced or
made obsolete by other documents at any time. It is inappropriate to
cite this document as other than work in progress.
General issue
Wordnets are databases of lexical data, usually including
information on hypernyms, synonyms, polysemous terms, relations between
terms, and sometimes multilingual equivalents. For example, the
Princeton WordNet is described as follows:
"WordNet® is an online lexical reference system whose design
is inspired by current psycholinguistic theories of human lexical
memory. English nouns, verbs, adjectives and adverbs are organized into
synonym sets, each representing one underlying lexical concept.
Different relations link the synonym sets."
Within the Semantic Web Best Practices and Deployment Working Group, a
WNET Task Force has been established to agree upon a Semantic Web
version of Princeton WordNet, and to suggest a common datamodel and
some best practices for porting wordnets to the Semantic Web. The TF
description can be found here.
Method
The first goal of WNET is to agree upon a schema (initially in RDFS,
then in OWL) to translate Princeton WordNet. A preliminary RDFS version
of the Princeton WordNet datamodel is here, and a revised one is here.
A preliminary datamodel in OWL that elaborates on the one developed by
the knOWLer project and on Decker & Melnik's RDF representation is here.
Although this first goal does not aim at an ontological interpretation
of WordNet (that is the second goal of WNET), it already presents us
with some choices, which are partly common to other attempts to port
database schemas (e.g., this project report) or thesauri (cf. the THES
TF in this WG) to Semantic Web languages.
In practice, decisions must be made on whether each element of the
original datamodel should be translated as a class or as a property,
or whether it can be ignored in the translation. Further decisions
concern the cardinality of possible restrictions on the use of
properties between the translated classes. Examples of these decisions
are contained in the abovementioned schemas.
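The three kinds of decision described above (class, property, or
ignored) can be illustrated with a minimal sketch. It is not the
datamodel proposed by the Task Force: the namespace
http://example.org/wordnet/schema# and the names Synset, WordSense,
Word, containsWordSense, word and hyponymOf are assumptions chosen only
for illustration. The sketch uses plain Python and emits N-Triples.

```python
# Hypothetical schema namespace (an assumption); a real namespace would
# still have to be obtained from Princeton's developers.
WN = "http://example.org/wordnet/schema#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def schema_triples():
    """Return RDFS triples for one possible (assumed) translation."""
    triples = []
    # Decision 1: these datamodel elements are translated as classes.
    for name in ("Synset", "WordSense", "Word"):
        triples.append((WN + name, RDF + "type", RDFS + "Class"))
    # Decision 2: these elements are translated as properties,
    # each with a domain and a range among the classes above.
    properties = {
        "containsWordSense": ("Synset", "WordSense"),
        "word": ("WordSense", "Word"),
        "hyponymOf": ("Synset", "Synset"),
    }
    for name, (domain, range_) in properties.items():
        triples.append((WN + name, RDF + "type", RDF + "Property"))
        triples.append((WN + name, RDFS + "domain", WN + domain))
        triples.append((WN + name, RDFS + "range", WN + range_))
    # Decision 3: elements judged irrelevant (for example, purely
    # database-internal bookkeeping fields) are simply not emitted.
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(schema_triples()))
```

Whether word senses deserve a class of their own, as assumed here, or
should be collapsed into a property of synsets is exactly the kind of
choice the schemas referenced above have to settle.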
Once the basic RDFS datamodel of WordNet has been agreed upon, an OWL
version that also includes restrictions can be produced.
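As a hedged illustration of what such a restriction might look like,
the sketch below generates the N-Triples for an OWL cardinality
restriction stating that every word sense belongs to exactly one
synset. The class WordSense, the property inSynset and the example.org
namespace are assumptions made for this example, not part of any agreed
datamodel; only the RDF, RDFS, OWL and XSD terms are standard.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical schema namespace (an assumption).
WN = "http://example.org/wordnet/schema#"


def cardinality_restriction(cls, prop, n, bnode="r1"):
    """Return N-Triples lines making `cls` a subclass of an anonymous
    owl:Restriction that requires exactly `n` values for `prop`."""
    return [
        f"<{cls}> <{RDFS}subClassOf> _:{bnode} .",
        f"_:{bnode} <{RDF}type> <{OWL}Restriction> .",
        f"_:{bnode} <{OWL}onProperty> <{prop}> .",
        f'_:{bnode} <{OWL}cardinality> "{n}"^^<{XSD}nonNegativeInteger> .',
    ]


if __name__ == "__main__":
    # Assumed constraint: each WordSense is in exactly one Synset.
    for line in cardinality_restriction(WN + "WordSense", WN + "inSynset", 1):
        print(line)
```

The RDFS version cannot express this constraint, which is one reason
for keeping the OWL version as a separate, later step.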
The OWL (and RDFS) versions will be submitted to the interested parties
mentioned in the WNET description, and a revision cycle will be started.
A crucial point is obtaining a namespace from Princeton's developers
and trying to have the port included in the next official distribution.
Based on the datamodel, the original database (or part of it) will be
ported to an RDF model. We suggest starting directly from WordNet
version 2.0, but possible conflicts with the needs of implementors
using earlier versions (e.g. 1.7 in the knOWLer port) should be
considered. For each version of WordNet, specific mappings are usually
provided by different researchers; these resources can be exploited if
needed.
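The porting step can be sketched as follows, assuming the database
records have already been read into simple Python dictionaries. The
record layout, the identifiers n1 and n2, the example.org namespaces
and the properties synsetWord and hyponymOf are all placeholders
invented for this illustration. They do not reflect the actual WordNet
file format or an agreed datamodel, and words are attached to synsets
directly as literals to keep the example short.

```python
# Hypothetical namespaces (assumptions), with the WordNet version in
# the instance base so that ports of different versions do not clash.
BASE = "http://example.org/wordnet/2.0/"
SCHEMA = "http://example.org/wordnet/schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def synset_to_ntriples(record):
    """Translate one synset record into a list of N-Triples lines."""
    subject = f"<{BASE}synset-{record['id']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SCHEMA}Synset> ."]
    # One literal per word form in the synset.
    for word in record["words"]:
        lines.append(
            f'{subject} <{SCHEMA}synsetWord> "{escape_literal(word)}"@en .'
        )
    # One link per hypernym pointer, targeting another synset resource.
    for target in record["hypernyms"]:
        lines.append(
            f"{subject} <{SCHEMA}hyponymOf> <{BASE}synset-{target}> ."
        )
    return lines


if __name__ == "__main__":
    # Placeholder records, not real WordNet identifiers.
    records = [
        {"id": "n1", "words": ["canine", "canid"], "hypernyms": []},
        {"id": "n2", "words": ["dog", "domestic dog"], "hypernyms": ["n1"]},
    ]
    for record in records:
        print("\n".join(synset_to_ntriples(record)))
```

Keeping the version number in the instance namespace, as assumed here,
is one way to let ports of versions 1.7 and 2.0 coexist; mappings
between versions could then be expressed as links between the two sets
of resources.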
Once this first goal has been reached, suggestions and existing work on
reengineering (not translating) WordNet as an ontology will be
collected, and a cycle of drafts and revisions will be started,
involving the Semantic Web Best Practices and Deployment Working Group
and external users. An extended example of how to use knowledge
engineering techniques to reengineer wordnets and thesauri as Semantic
Web ontologies can be found here.