WordNet in RDF/XML: 50,000+ RDF class vocabulary...


I've been trying and failing to find time to write up my WordNet/RDF
experiments. Instead, I thought I'd post as-is what I currently have
working. Code to follow after minor cleanup.

WordNet is a large lexical database, consisting of tens of thousands of
commonsense English concepts. The WordNet site has much more
information, including links to WordNet's use in the information
retrieval and digital library communities, as well as to spin-offs like
EuroWordNet, which maps the WordNet vocabulary to non-English languages.
See http://cogsci.princeton.edu/~wn/

I'm very interested in the potential of WordNet for 'semantic web'
applications, not least because the data is available unencumbered for
commercial and noncommercial use. So, I spent a little time thinking about
how WordNet can be mapped into RDF. There appears to be a trivial mapping
from the 'noun' portion of the WordNet database to a hierarchy of RDF
classes. I've not investigated models for representing the other aspects
of WordNet.

Here's an example of the output from a commandline version:

	[danbri]% wn tree -hypen|more
	Synonyms/Hypernyms (Ordered by Frequency) of noun tree

	2 senses of tree

	Sense 1
	       => woody plant, ligneous plant
	           => vascular plant, tracheophyte
	               => plant, flora, plant life
	                   => life form, organism, being, living thing
	                       => entity, something

	Sense 2
	tree, tree diagram
	       => plane figure, two-dimensional figure
	           => figure
	               => shape, form
	                   => attribute
	                       => abstraction
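
To make the shape of that output concrete, here is a minimal Python sketch
that reads the '=>' lines into one hypernym chain per sense. It is not the
code behind the demo; the function name parse_hypernym_chains is made up
for illustration, and it assumes each sense is a single unbranched chain,
which real wn output does not always guarantee.

```python
def parse_hypernym_chains(text):
    """Return {sense_number: [[synonym, ...], ...]}, most specific synset first.

    Assumption: every non-blank line after a 'Sense N' header is one synset,
    optionally prefixed with '=>', and the chain never branches.
    """
    chains = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Sense "):
            current = int(line.split()[1])
            chains[current] = []
        elif current is not None and line:
            if line.startswith("=>"):
                line = line[2:]
            # Synonyms within one synset are comma-separated.
            chains[current].append([w.strip() for w in line.split(",")])
    return chains


# Abbreviated copy of the output shown above.
SAMPLE = """\
2 senses of tree

Sense 1
       => woody plant, ligneous plant
           => vascular plant, tracheophyte

Sense 2
tree, tree diagram
       => plane figure, two-dimensional figure
"""

chains = parse_hypernym_chains(SAMPLE)
print(chains[1][0])
print(chains[2][0])
```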

Each 'word sense' in WordNet's collection of nouns can, I believe,
simply be mapped into RDF's notion of a class. For example, 'tree' in sense
one above would be the class of all trees (i.e. a subset of all the woody

If we give URIs to these classes, eg.


we can use them as an RDF vocabulary, and represent the WordNet hierarchy
as sub-class relationships.
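
As a rough illustration of that mapping, the following Python sketch turns
one hypernym chain into RDF/XML with rdfs:subClassOf links. Several things
here are assumptions rather than facts about the demo: the base URI is
borrowed from the test-server example URL in this post, multi-word terms
are joined with underscores, the sense numbers given for the parent
classes are placeholders, and the RDF Schema namespace is the one from the
later W3C Recommendation.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
# Assumed base URI, following the pattern of the test-server URLs in this post.
BASE = "http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/"


def class_uri(word, sense):
    """Build a class URI of the form <base><word>~<sense> (underscores are an assumption)."""
    return f"{BASE}{word.replace(' ', '_')}~{sense}"


def chain_to_rdf(chain):
    """Serialise [(word, sense), ...], most specific first, as RDF/XML.

    Each class is declared as a subclass of the next one in the chain.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rdfs", RDFS_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    for i, (word, sense) in enumerate(chain):
        cls = ET.SubElement(
            root, f"{{{RDFS_NS}}}Class",
            {f"{{{RDF_NS}}}about": class_uri(word, sense)},
        )
        if i + 1 < len(chain):
            parent_word, parent_sense = chain[i + 1]
            ET.SubElement(
                cls, f"{{{RDFS_NS}}}subClassOf",
                {f"{{{RDF_NS}}}resource": class_uri(parent_word, parent_sense)},
            )
    return ET.tostring(root, encoding="unicode")


# Sense numbers other than tree~1 are placeholders, not taken from WordNet.
print(chain_to_rdf([("tree", 1), ("woody plant", 1), ("vascular plant", 1)]))
```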

I've rigged up a simple prototype (a tiny Perl CGI script) which wraps
WordNet in a WWW interface such that, given a term and a sense number
(eg. 'tree' sense '1') it returns an RDF description of that part of the
WordNet type hierarchy. The particular strategy I adopted (which you can
see if you look at
or other URIs on my test server) is for a class URI to dereference to a
sparse description of the superclasses and a verbose description of the
immediate subclasses. I suspect this is back to front.
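
The dereferencing strategy can be sketched in Python as below. This is a
toy, not the Perl CGI script: the hierarchy is a hand-written fragment of
the 'tree' sense 1 chain, the describe function and the 'term~sense' keys
with underscores are illustrative assumptions, and the parent sense
numbers are placeholders. Superclasses are returned as bare identifiers
(sparse), immediate subclasses with a label (verbose), and an unknown term
yields an empty description, in the spirit of the empty RDF graph the
server returns for words not in the database.

```python
# Toy hierarchy, child -> parent. Parent sense numbers are placeholders.
PARENT = {
    "tree~1": "woody_plant~1",
    "woody_plant~1": "vascular_plant~1",
    "vascular_plant~1": "plant~1",
}
KNOWN = set(PARENT) | set(PARENT.values())


def label_of(key):
    """Recover a human-readable label from a 'term~sense' key."""
    return key.split("~")[0].replace("_", " ")


def describe(term, sense):
    """Describe one class: sparse superclasses, verbose immediate subclasses."""
    key = f"{term.replace(' ', '_')}~{sense}"
    if key not in KNOWN:
        return {}  # unknown word or sense: empty description
    supers = []
    node = key
    while node in PARENT:  # walk up to the root of the toy hierarchy
        node = PARENT[node]
        supers.append(node)
    subs = [
        {"id": child, "label": label_of(child)}
        for child, parent in PARENT.items()
        if parent == key
    ]
    return {"id": key, "superclasses": supers, "subclasses": subs}


print(describe("woody plant", 1))
print(describe("unicorn", 1))
```

Swapping which side is sparse and which is verbose only means changing
which list carries the labels.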

Anyway, comments welcome. See the official WordNet site for a
human-oriented HTML forms interface to the dataset, or simply guess URLs
for my server (if you guess a word not in the database, you get an empty
RDF graph).

More examples:
http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/cat~2 (ie. sense 2 of cat)

If there were an agreed URI for WordNet, instance data could look like

<!-- using rdf, dublin core and wordnet namespaces -->
	<WordNet:bitmap~1 rdf:about="">
		<dc:subject>
			<WordNet:geek~1 rdf:about="http://purl.org/people/danbri"/>
		</dc:subject>
	</WordNet:bitmap~1>

This says: this object is a member of the class of bitmaps; it has as its
subject another object of type 'geek', whose URI is (etc...). So we might
immediately think about using WordNet inside multimedia content,
PNG/JPEG/GIF etc to improve accessibility and searchability of the

The RDF type hierarchy I exposed tells us in RDF that bitmaps are a kind
of picture, which is a kind of representation, etc., and gives simple
definitions for each (eg. "an image represented as a two dimensional array
of brightness values for pixels"). Similarly for geeks being kinds of
persons etc...
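
To show what an application could do with those sub-class links, here is
a small Python sketch that walks them to answer "is a bitmap a kind of
representation?". The three links are simplified from the description
above and treated as direct parent links, which is an assumption; the
names SUBCLASS_OF, superclasses and is_a are illustrative only.

```python
# Simplified child -> parent links, taken from the prose above.
# Treating each as a direct link is an assumption.
SUBCLASS_OF = {
    "bitmap": "picture",
    "picture": "representation",
    "geek": "person",
}


def superclasses(cls, parents=SUBCLASS_OF):
    """Return every ancestor of cls, nearest first."""
    found = []
    seen = {cls}
    while cls in parents:
        cls = parents[cls]
        if cls in seen:  # guard against accidental cycles in the data
            break
        seen.add(cls)
        found.append(cls)
    return found


def is_a(cls, ancestor, parents=SUBCLASS_OF):
    """True if cls equals ancestor or is a direct or indirect subclass of it."""
    return cls == ancestor or ancestor in superclasses(cls, parents)


print(superclasses("bitmap"))
print(is_a("bitmap", "representation"))
print(is_a("geek", "picture"))
```

A search for 'representation' could then also return things typed only
as 'bitmap'.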

I think there are a few glitches in my online demo, but it should be
enough to give a flavour of the possibilities.

Comments, suggestions etc welcomed,



Received on Thursday, 2 December 1999 20:20:57 UTC