- From: Dan Brickley <danbri@w3.org>
- Date: Thu, 2 Dec 1999 20:20:55 -0500 (EST)
- To: www-rdf-interest@w3.org
- cc: wordnet@princeton.edu
RDF IG,
I've been trying and failing to find time to write up my WordNet/RDF
experiments. Instead, I thought I'd post as-is what I currently have
working. Code to follow after minor cleanup.
Context:
WordNet is a large lexical database, consisting of tens of thousands of
commonsense English concepts. The WordNet site contains much more
information, including links to WordNet's use in the information
retrieval and digital library communities, as well as to spin-offs like
EuroWordNet, which maps the WordNet vocabulary to non-English languages.
See http://cogsci.princeton.edu/~wn/
I'm very interested in the potential of WordNet for 'semantic web'
applications, not least because the data is available unencumbered for
commercial and noncommercial use. So, I spent a little time thinking about
how WordNet can be mapped into RDF. There appears to be a trivial mapping
from the 'noun' portion of the WordNet database to a hierarchy of RDF
classes. I've not investigated models for representing the other aspects
of WordNet yet.
Here's an example of the output from a commandline version:
[danbri]% wn tree -hypen|more
Synonyms/Hypernyms (Ordered by Frequency) of noun tree
2 senses of tree
Sense 1
tree
   => woody plant, ligneous plant
       => vascular plant, tracheophyte
           => plant, flora, plant life
               => life form, organism, being, living thing
                   => entity, something
Sense 2
tree, tree diagram
   => plane figure, two-dimensional figure
       => figure
           => shape, form
               => attribute
                   => abstraction
Each 'word sense' in WordNet's collection of nouns can, I believe,
simply be mapped into RDF's notion of a class. For example, 'tree' in sense
one above would be the class of all trees (i.e. a subset of all the woody
plants).
If we give URIs to these classes, eg.
http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/tree~1
we can use them as an RDF vocabulary, and represent the wordnet hierarchy
as sub-class relationships.
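
As a rough illustration of the mapping just described, here is a minimal
Python sketch (not the Perl prototype) that turns a hypernym chain into
class URIs and sub-class statements. The helper names sense_uri and
subclass_triples are hypothetical, the rdfs:subClassOf URI is the one from
the later RDF Schema namespace, and the sense number given for 'vascular
plant' is an assumed placeholder rather than something checked against
WordNet.

```python
# Base URI taken from the example class URIs in this message.
BASE = "http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/"
# Assumption: the property URI from the later RDF Schema namespace.
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"


def sense_uri(word, sense):
    """Build a class URI such as .../noun/woody_plant~1 (hypothetical helper)."""
    return f"{BASE}{word.replace(' ', '_')}~{sense}"


def subclass_triples(chain):
    """Given (word, sense) pairs ordered from specific to general,
    return one (subclass, property, superclass) triple per adjacent pair."""
    uris = [sense_uri(word, sense) for word, sense in chain]
    return [(sub, RDFS_SUBCLASS_OF, sup) for sub, sup in zip(uris, uris[1:])]


# First steps of the 'tree' sense 1 hierarchy shown above.
# The sense number for 'vascular plant' is an assumed placeholder.
chain = [("tree", 1), ("woody plant", 1), ("vascular plant", 1)]

for triple in subclass_triples(chain):
    print(triple)
```

Each adjacent pair in the chain yields one statement, so a chain of n
senses gives n-1 sub-class triples.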
I've rigged up a simple prototype (a tiny Perl CGI script) which wraps
WordNet in a WWW interface such that, given a term and a sense number
(eg. 'tree' sense '1') it returns an RDF description of that part of the
WordNet type hierarchy. The particular strategy I adopted (which you can
see if you look at
http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/woody_plant~1
or other URIs on my test server) is for a class URI to dereference to a
sparse description of the superclasses and a verbose description of the
immediate subclasses. I suspect this is back to front.
Anyway, comments welcomed. See the official wordnet site for a
human-oriented HTML forms interface to the dataset, or simply guess URLs
for my server (if you guess a word not in the database, you get an empty
RDF graph).
More examples:
http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/cat~1
http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/cat~2 (ie. sense 2 of cat)
http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/geek~1
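
The lookup behaviour described above (term plus sense number in, RDF
description out, empty graph for unknown words) might be sketched in Python
roughly as follows. This is not the Perl CGI script itself: the HYPERNYMS
table is a hypothetical stand-in for a real WordNet query, the describe
function is an invented name, the rdfs namespace is the later RDF Schema
one, and the output shows only the direct superclass rather than the
sparse/verbose mix the demo server returns.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Assumption: the later RDF Schema namespace.
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"
BASE = "http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/"

# Hypothetical stand-in for a WordNet lookup:
# sense key -> key of its direct superclass.
HYPERNYMS = {"tree~1": "woody_plant~1"}


def describe(term, sense):
    """Return RDF/XML describing one noun sense, or an empty rdf:RDF
    element if the sense is not in the table."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("rdfs", RDFS_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    key = f"{term.replace(' ', '_')}~{sense}"
    if key in HYPERNYMS:
        cls = ET.SubElement(
            root, f"{{{RDFS_NS}}}Class", {f"{{{RDF_NS}}}about": BASE + key}
        )
        ET.SubElement(
            cls,
            f"{{{RDFS_NS}}}subClassOf",
            {f"{{{RDF_NS}}}resource": BASE + HYPERNYMS[key]},
        )
    return ET.tostring(root, encoding="unicode")


print(describe("tree", 1))   # one class with one superclass
print(describe("xyzzy", 1))  # unknown word: empty rdf:RDF element
```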
If there were an agreed URI for WordNet, instance data could look like
this...
<!-- using rdf, dublin core and wordnet namespaces -->
<rdf:Description>
<WordNet:bitmap~1 rdf:about="">
<dc:subject>
<WordNet:geek~1 rdf:about="http://purl.org/people/danbri"/>
</dc:subject>
</WordNet:bitmap~1>
</rdf:Description>
This says: 'this object is a member of the class of bitmaps; it has as its
subject another object of type "geek", whose URI is (etc...)'. So we might
immediately think about using WordNet inside multimedia content,
PNG/JPEG/GIF etc to improve accessibility and searchability of the
content.
The RDF type hierarchy I exposed tells us in RDF that bitmaps are a kind
of picture which are a kind of representation etc., and gives simple
definitions for each (eg. "an image represented as a two dimensional array
of brightness values for pixels"). Similarly for geeks being kinds of
persons etc...
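
To make the 'kind of' reasoning concrete, here is a small Python sketch of
how an application might walk such a hierarchy. The SUPERCLASS table holds
only the relationships mentioned above, simplified to direct links and
without sense numbers; the function names superclasses and is_a are
hypothetical, and a real application would fetch these links from the RDF
descriptions rather than hard-code them.

```python
# Direct superclass links as mentioned in the text, simplified:
# real WordNet may have intermediate classes between these.
SUPERCLASS = {
    "bitmap": "picture",
    "picture": "representation",
    "geek": "person",
}


def superclasses(cls):
    """Return every ancestor of cls, nearest first, by following
    the single-parent links in SUPERCLASS."""
    chain = []
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        chain.append(cls)
    return chain


def is_a(cls, ancestor):
    """True if cls is ancestor itself or a (transitive) subclass of it."""
    return cls == ancestor or ancestor in superclasses(cls)


print(superclasses("bitmap"))
print(is_a("bitmap", "representation"))
print(is_a("geek", "picture"))
```

A search for 'representation' could then also match content typed only as
'bitmap', which is the searchability gain suggested above.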
I think there are a few glitches in my online demo, but it should be
enough to give a flavour of the possibilities.
Comments, suggestions etc welcomed,
Dan
--
danbri@w3.org
Received on Thursday, 2 December 1999 20:20:57 UTC