- From: Dan Brickley <danbri@w3.org>
- Date: Thu, 2 Dec 1999 20:20:55 -0500 (EST)
- To: www-rdf-interest@w3.org
- cc: wordnet@princeton.edu
RDF IG,

I've been trying and failing to find time to write up my WordNet/RDF experiments. Instead, I thought I'd post as-is what I currently have working. Code to follow after minor cleanup.

Context: WordNet is a large lexical database covering tens of thousands of commonsense English concepts. The WordNet site has a wealth of further information, including links to WordNet's use in the information retrieval and digital library communities, as well as to spin-offs like EuroWordNet, which maps the WordNet vocabulary to non-English languages. See http://cogsci.princeton.edu/~wn/

I'm very interested in the potential of WordNet for 'semantic web' applications, not least because the data is available unencumbered for commercial and noncommercial use. So I spent a little time thinking about how WordNet can be mapped into RDF. There appears to be a trivial mapping from the 'noun' portion of the WordNet database to a hierarchy of RDF classes. I've not yet investigated models for representing the other aspects of WordNet.

Here's an example of the output from a commandline version:

    [danbri]% wn tree -hypen|more

    Synonyms/Hypernyms (Ordered by Frequency) of noun tree

    2 senses of tree

    Sense 1
    tree
        => woody plant, ligneous plant
            => vascular plant, tracheophyte
                => plant, flora, plant life
                    => life form, organism, being, living thing
                        => entity, something

    Sense 2
    tree, tree diagram
        => plane figure, two-dimensional figure
            => figure
                => shape, form
                    => attribute
                        => abstraction

Each 'word sense' in WordNet's collection of nouns can, I believe, simply be mapped into RDF's notion of a class. For example, 'tree' in sense 1 above would be the class of all trees (i.e. a subset of all the woody plants). If we give URIs to these classes, e.g.

    http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/tree~1

we can use them as an RDF vocabulary, and represent the WordNet hierarchy as sub-class relationships.
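A minimal sketch of that mapping, in Python purely for illustration. The helper names (class_uri, subclass_triples) are hypothetical, the "rdfs:subClassOf" string is just a stand-in label for the predicate, and the sense number given for 'vascular plant' is an assumption; only tree~1 and woody_plant~1 appear as URIs in this message.

```python
# Base URI of the test server's noun vocabulary, taken from the message.
BASE = "http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/"


def class_uri(term, sense):
    """Build a class URI of the form <BASE><term>~<sense>.

    Spaces become underscores, matching the woody_plant~1 example.
    """
    return f"{BASE}{term.replace(' ', '_')}~{sense}"


def subclass_triples(chain):
    """Turn a hypernym chain (most specific first) into subclass triples.

    Each (term, sense) pair is declared a subclass of the pair after it.
    """
    uris = [class_uri(term, sense) for term, sense in chain]
    return [
        (child, "rdfs:subClassOf", parent)
        for child, parent in zip(uris, uris[1:])
    ]


# First steps of 'tree' sense 1 from the wn output.
# The sense number for 'vascular plant' is assumed, not checked.
chain = [("tree", 1), ("woody plant", 1), ("vascular plant", 1)]

for subject, predicate, obj in subclass_triples(chain):
    print(subject, predicate, obj)
```

Each adjacent pair in the chain yields one triple, so a chain of n senses gives n-1 sub-class statements.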
I've rigged up a simple prototype (a tiny Perl CGI script) which wraps WordNet in a WWW interface such that, given a term and a sense number (e.g. 'tree' sense '1'), it returns an RDF description of that part of the WordNet type hierarchy.

The particular strategy I adopted (which you can see if you look at http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/woody_plant~1 or other URIs on my test server) is for a class URI to dereference to a sparse description of the superclasses and a verbose description of the immediate subclasses. I suspect this is back to front. Anyway, comments welcomed.

See the official WordNet site for a human-oriented HTML forms interface to the dataset, or simply guess URLs for my server (if you guess a word not in the database, you get an empty RDF graph).

More examples:

    http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/cat~1
    http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/cat~2   (i.e. sense 2 of cat)
    http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/geek~1

If there were an agreed URI for WordNet, instance data could look like this...

    <!-- using rdf, dublin core and wordnet namespaces -->
    <rdf:Description>
      <WordNet:bitmap~1 rdf:about="">
        <dc:subject>
          <WordNet:geek~1 rdf:about="http://purl.org/people/danbri"/>
        </dc:subject>
      </WordNet:bitmap~1>
    </rdf:Description>

This says, 'this object is a member of the class of bitmaps; it has as its subject another object of type 'geek', whose URI is (etc...)'.

So we might immediately think about using WordNet inside multimedia content (PNG/JPEG/GIF etc.) to improve accessibility and searchability of the content. The RDF type hierarchy I exposed tells us in RDF that bitmaps are a kind of picture, which are a kind of representation etc., and gives simple definitions for each (e.g. "an image represented as a two dimensional array of brightness values for pixels"). Similarly for geeks being kinds of persons etc...
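To make the "bitmaps are a kind of picture, which are a kind of representation" point concrete, here is a small Python sketch of walking such a sub-class hierarchy. It is only an illustration: the SUBCLASS_OF table is hand-written rather than fetched from the test server, the sense numbers on picture, representation and person are assumed, and the direct geek-to-person link is a simplification of "geeks being kinds of persons".

```python
# Hand-written fragment of the class hierarchy (child -> parent).
# Sense numbers other than bitmap~1 and geek~1 are assumptions.
SUBCLASS_OF = {
    "bitmap~1": "picture~1",
    "picture~1": "representation~1",
    "geek~1": "person~1",
}


def superclasses(cls):
    """Return every ancestor of cls, nearest first, by following parents."""
    found = []
    seen = {cls}
    current = SUBCLASS_OF.get(cls)
    while current is not None and current not in seen:
        found.append(current)
        seen.add(current)  # guards against accidental cycles in the table
        current = SUBCLASS_OF.get(current)
    return found


def is_a(cls, candidate):
    """True if cls is candidate itself or one of its subclasses."""
    return cls == candidate or candidate in superclasses(cls)


# A resource typed as bitmap~1 can also be found by a search for
# pictures or representations.
print(superclasses("bitmap~1"))
print(is_a("bitmap~1", "representation~1"))
print(is_a("geek~1", "picture~1"))
```

A search tool indexing images described this way could match a query for 'representation' against anything typed with a more specific class, which is the searchability benefit suggested above.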
I think there are a few glitches in my online demo, but it should be enough to give a flavour of the possibilities.

Comments, suggestions etc. welcomed,

Dan

--
danbri@w3.org
Received on Thursday, 2 December 1999 20:20:57 UTC