WordNet in RDF/XML: 50,000+ RDF class vocabulary...

RDF IG,

I've been trying and failing to find time to write up my WordNet/RDF
experiments. Instead, I thought I'd post as-is what I currently have
working. Code to follow after minor cleanup.

Context:
WordNet is a large lexical database, consisting of tens of thousands of
commonsense English concepts. The WordNet site has a wealth of further
information, including links to WordNet's use in the information
retrieval and digital library community, as well as to spin-offs like
EuroWordNet, which maps the WordNet vocabulary to non-English languages.
See http://cogsci.princeton.edu/~wn/


I'm very interested in the potential of WordNet for 'semantic web'
applications, not least because the data is available unencumbered for
commercial and noncommercial use. So, I spent a little time thinking about
how WordNet can be mapped into RDF. There appears to be a trivial mapping
from the 'noun' portion of the WordNet database to a hierarchy of RDF
classes. I've not investigated models for representing the other aspects
of WordNet yet.

Here's an example of the output from the command-line version of WordNet:


	[danbri]% wn tree -hypen|more
	
	Synonyms/Hypernyms (Ordered by Frequency) of noun tree

	2 senses of tree

	Sense 1
	tree
	       => woody plant, ligneous plant
	           => vascular plant, tracheophyte
	               => plant, flora, plant life
	                   => life form, organism, being, living thing
	                       => entity, something

	Sense 2
	tree, tree diagram
	       => plane figure, two-dimensional figure
	           => figure
	               => shape, form
	                   => attribute
	                       => abstraction
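
As a rough sketch of how this output could be turned into data, the
following Python reads one sense from the listing and returns the chain of
synonym sets, most specific first. The function name hypernym_chain is
made up for illustration, and it assumes a single unbranched chain like
the ones shown here; it is not the code behind the prototype.

```python
# Sample text copied from the first sense of 'tree' in the wn output.
SAMPLE = """\
Sense 1
tree
       => woody plant, ligneous plant
           => vascular plant, tracheophyte
               => plant, flora, plant life
                   => life form, organism, being, living thing
                       => entity, something
"""


def hypernym_chain(text):
    """Return a list of synonym lists, ordered from specific to general.

    Assumes one sense with a single, unbranched hypernym chain.
    """
    chain = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the 'Sense N' header.
        if not line or line.startswith("Sense"):
            continue
        # Drop the '=>' marker that introduces each hypernym level.
        if line.startswith("=>"):
            line = line[2:]
        chain.append([word.strip() for word in line.split(",")])
    return chain


for level in hypernym_chain(SAMPLE):
    print(level)
```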
	


Each 'word sense' in WordNet's collection of nouns can, I believe,
simply be mapped into RDF's notion of a class. For example, 'tree' in
sense one above would be the class of all trees (i.e. a subset of all the
woody plants).

If we give URIs to these classes, eg.

	http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/tree~1

we can use them as an RDF vocabulary, and represent the WordNet hierarchy
as sub-class relationships.
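
To make the URI scheme concrete, here is a minimal Python sketch that
builds class URIs from a term and sense number and lists the resulting
sub-class statements. The function names are hypothetical, and replacing
spaces with underscores is an assumption based on the woody_plant~1 URI
on the test server. Only tree~1 and woody_plant~1 are used, since the wn
output does not show sense numbers for the other hypernyms.

```python
from urllib.parse import quote

BASE = "http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/"


def class_uri(term, sense):
    """Build a class URI such as .../noun/tree~1 (hypothetical helper)."""
    # Assumption: spaces in multi-word terms become underscores.
    name = quote(term.replace(" ", "_"), safe="")
    return BASE + name + "~" + str(sense)


def subclass_triples(chain):
    """Turn [(term, sense), ...], ordered specific to general, into
    (subclass, 'rdfs:subClassOf', superclass) triples."""
    uris = [class_uri(term, sense) for term, sense in chain]
    return [(sub, "rdfs:subClassOf", sup) for sub, sup in zip(uris, uris[1:])]


chain = [("tree", 1), ("woody plant", 1)]
for triple in subclass_triples(chain):
    print(" ".join(triple))
```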

I've rigged up a simple prototype (a tiny Perl CGI script) which wraps
WordNet in a WWW interface such that, given a term and a sense number
(eg. 'tree' sense '1') it returns an RDF description of that part of the
WordNet type hierarchy. The particular strategy I adopted (which you can
see if you look at
http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/woody_plant~1
or other URIs on my test server) is for a class URI to dereference to a
sparse description of the superclasses and a verbose description of the
immediate subclasses. I suspect this is back to front.
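
For illustration only, the dereferencing strategy might be sketched in
Python along these lines. This is not the Perl prototype: the describe
function, the two-entry lookup table standing in for the WordNet database,
and the choice of the current RDF Schema namespace URI are all
assumptions. Asking for a name that is not in the table returns an empty
RDF graph.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the current RDF Schema namespace; the prototype may differ.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
BASE = "http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/"

ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)

# Toy stand-in for the database: class name -> direct superclass names.
SUPERCLASSES = {"tree~1": ["woody_plant~1"], "woody_plant~1": []}


def add_class(root, name, supers):
    """Append an rdfs:Class node with its rdfs:subClassOf links."""
    node = ET.SubElement(root, f"{{{RDFS}}}Class",
                         {f"{{{RDF}}}about": BASE + name})
    for sup in supers:
        ET.SubElement(node, f"{{{RDFS}}}subClassOf",
                      {f"{{{RDF}}}resource": BASE + sup})


def describe(name):
    """Return RDF/XML for a class, its superclasses and direct subclasses."""
    root = ET.Element(f"{{{RDF}}}RDF")
    if name in SUPERCLASSES:
        # The class itself, pointing at its superclasses.
        add_class(root, name, SUPERCLASSES[name])
        # Each immediate subclass, pointing back at this class.
        for sub, supers in SUPERCLASSES.items():
            if name in supers:
                add_class(root, sub, [name])
    # Unknown names fall through to an empty graph.
    return ET.tostring(root, encoding="unicode")


print(describe("woody_plant~1"))
```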

Anyway, comments welcomed. See the official WordNet site for a
human-oriented HTML forms interface to the dataset, or simply guess URLs
for my server (if you guess a word not in the database, you get an empty
RDF graph).

more examples:
http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/cat~1
http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/cat~2 (ie. sense 2 of cat)
http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/geek~1

If there were an agreed URI for WordNet, instance data could look like
this... 

<!-- using rdf, dublin core and wordnet namespaces -->
<rdf:Description>
	<WordNet:bitmap~1 rdf:about="">  
		<dc:subject>
		<WordNet:geek~1 rdf:about="http://purl.org/people/danbri"/>
		</dc:subject>	
	</WordNet:bitmap~1>
</rdf:Description>

This says, 'this object is a member of the class of bitmaps; it has as its
subject another object of type 'geek', whose URI is (etc...)'. So we might
immediately think about using WordNet inside multimedia content,
PNG/JPEG/GIF etc to improve accessibility and searchability of the
content.

The RDF type hierarchy I exposed tells us in RDF that bitmaps are a kind
of picture, which is a kind of representation, etc., and gives simple
definitions for each (eg. "an image represented as a two dimensional array
of brightness values for pixels"). Similarly for geeks being kinds of
persons etc...
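
A small Python sketch of why the hierarchy helps with searching: given the
sub-class links, something typed as a bitmap can also be found by a query
for pictures or representations. The table below holds only the
relationships mentioned in this message, with sense numbers left off
because they are not given here, and the function names are made up for
illustration.

```python
# Direct superclass of each class, as described in the text above.
SUBCLASS_OF = {
    "bitmap": "picture",
    "picture": "representation",
    "geek": "person",
}


def ancestors(cls):
    """Follow sub-class links upwards and return every superclass."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def matches(instance_type, query_class):
    """True if an instance of instance_type is also a query_class."""
    return instance_type == query_class or query_class in ancestors(instance_type)


print(ancestors("bitmap"))            # ['picture', 'representation']
print(matches("geek", "person"))      # True
print(matches("geek", "picture"))     # False
```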

I think there are a few glitches in my online demo, but it should be
enough to give a flavour of the possibilities.


Comments, suggestions etc welcomed,

Dan


--
danbri@w3.org

Received on Thursday, 2 December 1999 20:20:57 UTC