RE: WordNet in RDF/XML: 50,000+ RDF class vocabulary... from Dan Brickley on 1999-12-10 (www-rdf-interest@w3.org from December 1999)

From: Dan Brickley <danbri@w3.org>
Date: Thu, 9 Dec 1999 19:39:04 -0500 (EST)
To: www-rdf-interest@w3.org
Message-ID: <Pine.LNX.4.20.9912091911230.22235-100000@tux.w3.org>
On Thu, 9 Dec 1999, Richard Humpleman - SISA wrote:
> This seems to have stopped working but it worked great before. Any chance
> to get it going for a little while looonger?
> 
> Reference to undeclared namespace prefix: 'r'. Line 14, Position 51 
> <Class r:about="http://xmlns.com/wordnet/1.6/cat

Ooops, thanks. I've fixed it.
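For anyone hitting the same error: namespace-aware XML parsers reject a prefix such as `r:` when no matching `xmlns:` declaration is in scope. The sketch below reproduces the check with Python's standard `xml.etree.ElementTree`. It assumes that declaring the `rdf` prefix is the relevant fix, which may not match the change actually made to the script. The URI is a placeholder because the one in the quoted error line is truncated.

```python
import xml.etree.ElementTree as ET

# RDF Model & Syntax namespace (1999 Recommendation).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def namespace_error(xml_text):
    """Return the parser's error message, or None if the text parses cleanly.

    ElementTree parses with namespace processing switched on, so a prefix
    with no matching xmlns declaration is reported as a ParseError.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)
    return None

# Placeholder URI for illustration only.
ABOUT = "http://example.org/wordnet/cat"

broken = f'<Class r:about="{ABOUT}"/>'                    # 'r' never declared
fixed = f'<Class xmlns:rdf="{RDF_NS}" rdf:about="{ABOUT}"/>'

print(namespace_error(broken))  # an unbound-prefix error
print(namespace_error(fixed))   # None
```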

I had a number of comments about the unwieldy long URLs I was using, and
the multiple levels of namespace management they depend upon. So I moved
it to a http://xmlns.com/wordnet/1.6/* address instead, which is a domain
explicitly committed to (eventual) persistence and non-cheesiness. In the
process, I broke the script and a bunch of URLs -- so much for persistence!


Ideally, Princeton would host the canonical URL for WordNet since they
manage the vocabulary. Failing that, I think there's a lot to be gained by
agreeing within the RDF community on a single namespace URI for WordNet 1.6
concepts. Using purl.org or something under desire.org are other
possibilities I've been toying with.

<aside>
I have no personal desire to manage xmlns.com for all time, so I am
looking for ways of offloading the domain to a committee of do-gooder
RDF/XML enthusiasts who want short reliable names for 'semantic web'
namespaces. Neither ILRT, University of Bristol nor W3C have any
commitment to maintaining names in that namespace. So - I'd like to hand
off responsibility for xmlns.com to some entity more reliable than
myself. Suggestions (in separate thread) welcomed :-)
</aside>  

Anyway, regarding WordNet I need to flag up a major issue: my current demo
conflates 'word senses' with the words associated with those senses. My
rather dusty knowledge of WordNet is that 'senses' are clusters of broadly
equivalent terms, eg. as shown comma-separated here:

> >>> 	tree
> >>> 	       => woody plant, ligneous plant
> >>> 	           => vascular plant, tracheophyte
> >>> 	               => plant, flora, plant life

So the sub-class relation is between clusters of terms. This leaves us
with a dilemma: do we assign URIs to senses, terms or both? I'd expect
both, but with most classifications happening in terms of senses. In which
case the question is -- what identifier is appropriate to identify a word sense?
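One way to picture the "both" option is to mint one URI per sense (synset) and one per word form. Words then point at senses, and sub-class links run only between senses. The Python sketch below takes the 'tree' synsets shown above and applies that split. It is not an answer to the identifier question. The base URI, the `sense/` and `word/` paths, the `ex:hasSense` property and the sense numbers (all set to 1) are hypothetical, chosen only for illustration.

```python
# Hypothetical namespace for illustration only; no agreed URI exists yet.
BASE = "http://example.org/wordnet/1.6/"

# Synsets from the 'tree' example, most specific first, as
# (head word, sense number, member words). Sense numbers are ASSUMED to be 1
# throughout; real sense numbers would come from WordNet itself.
CHAIN = [
    ("tree", 1, ["tree"]),
    ("woody_plant", 1, ["woody plant", "ligneous plant"]),
    ("vascular_plant", 1, ["vascular plant", "tracheophyte"]),
    ("plant", 1, ["plant", "flora", "plant life"]),
]

def sense_uri(head, n):
    """Mint a URI for a sense (synset) from its head word and sense number."""
    return f"{BASE}sense/{head}~{n}"

def word_uri(word):
    """Mint a URI for a word form, independent of any particular sense."""
    return f"{BASE}word/{word.replace(' ', '_')}"

def triples(chain):
    """Emit (subject, predicate, object) triples for senses, words and sub-classing."""
    out = []
    for i, (head, n, words) in enumerate(chain):
        sense = sense_uri(head, n)
        out.append((sense, "rdf:type", "rdfs:Class"))
        for w in words:
            # 'ex:hasSense' is a made-up property linking a word to one of its senses.
            out.append((word_uri(w), "ex:hasSense", sense))
        if i + 1 < len(chain):
            parent_head, parent_n, _ = chain[i + 1]
            out.append((sense, "rdfs:subClassOf", sense_uri(parent_head, parent_n)))
    return out

for t in triples(CHAIN):
    print(t)
```

Keeping the class hierarchy on senses means a word like "plant" can later gain a second, unrelated sense without disturbing the existing sub-class links.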

I'm hoping that someone more familiar with details of WordNet will step
in and tell us how best to do this... Suggestions anyone?

Dan

> 
> Regards,
> Richard Humpleman.
> 
> >>> -----Original Message-----
> >>> From: Dan Brickley [mailto:danbri@w3.org]
> >>> Sent: Thursday, December 02, 1999 5:21 PM
> >>> To: www-rdf-interest@w3.org
> >>> Cc: wordnet@princeton.edu
> >>> Subject: WordNet in RDF/XML: 50,000+ RDF class vocabulary...
> >>> 
> >>> 
> >>> 
> >>> 
> >>> RDF IG,
> >>> 
> >>> I've been trying and failing to find time to write up my WordNet/RDF
> >>> experiments. Instead, I thought I'd post as-is what I currently have
> >>> working. Code to follow after minor cleanup.
> >>> 
> >>> Context:
> >>> WordNet is a large lexical database, consisting of tens of thousands of
> >>> commonsense English concepts. The WordNet site contains a wealth of
> >>> further information, including links to WordNet's use in the information
> >>> retrieval and digital library community, as well as to spin-offs like
> >>> EuroWordNet, which maps the WordNet vocabulary to non-English languages.
> >>> See http://cogsci.princeton.edu/~wn/
> >>> 
> >>> 
> >>> I'm very interested in the potential of WordNet for 'semantic web'
> >>> applications, not least because the data is available unencumbered for
> >>> commercial and noncommercial use. So, I spent a little time thinking about
> >>> how WordNet can be mapped into RDF. There appears to be a trivial mapping
> >>> from the 'noun' portion of the WordNet database to a hierarchy of RDF
> >>> classes. I've not investigated models for representing the other aspects
> >>> of WordNet yet.
> >>> 
> >>> Here's an example of the output from a commandline version:
> >>> 
> >>> 
> >>> 	[danbri]% wn tree -hypen|more
> >>> 	
> >>> 	Synonyms/Hypernyms (Ordered by Frequency) of noun tree
> >>> 
> >>> 	2 senses of tree
> >>> 
> >>> 	Sense 1
> >>> 	tree
> >>> 	       => woody plant, ligneous plant
> >>> 	           => vascular plant, tracheophyte
> >>> 	               => plant, flora, plant life
> >>> 	                   => life form, organism, being, living thing
> >>> 	                       => entity, something
> >>> 
> >>> 	Sense 2
> >>> 	tree, tree diagram
> >>> 	       => plane figure, two-dimensional figure
> >>> 	           => figure
> >>> 	               => shape, form
> >>> 	                   => attribute
> >>> 	                       => abstraction
> >>> 	
> >>> 
> >>> 
> >>> Each 'word sense' in WordNet's collection of nouns can, I believe,
> >>> simply be mapped into RDF's notion of a class. For example, 'tree' in sense
> >>> one above would be the class of all trees (i.e. a subset of all the woody
> >>> plants).
> >>> 
> >>> If we give URIs to these classes, eg.
> >>> 
> >>> 	http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/tree~1
> >>> 
> >>> we can use them as an RDF vocabulary, and represent the WordNet hierarchy
> >>> as sub-class relationships.
> >>> 
> >>> I've rigged up a simple prototype (a tiny Perl CGI script) which wraps
> >>> WordNet in a WWW interface such that, given a term and a sense number
> >>> (e.g. 'tree' sense '1'), it returns an RDF description of that part of the
> >>> WordNet type hierarchy. The particular strategy I adopted (which you can
> >>> see if you look at
> >>> http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/woody_plant~1
> >>> or other URIs on my test server) is for a class URI to dereference to a
> >>> sparse description of the superclasses and a verbose description of the
> >>> immediate subclasses. I suspect this is back to front.
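The dereferencing strategy just described can be modelled in a few lines of Python. Superclasses are listed sparsely, as bare identifiers. Immediate subclasses are listed verbosely, with their synonym labels. This is a sketch under stated assumptions, not the Perl CGI script. The short keys stand for URIs under the prototype's `.../xmlns/wordnet/noun/` base. Sense numbers other than `tree~1` and `woody_plant~1` are assumed. Each class is given a single parent, which is a simplification, and "verbose" is taken to mean "with labels".

```python
# Child -> parent, from the 'tree' sense 1 output above (single-parent simplification).
# Sense numbers other than tree~1 and woody_plant~1 are ASSUMED for illustration.
PARENT = {
    "tree~1": "woody_plant~1",
    "woody_plant~1": "vascular_plant~1",
    "vascular_plant~1": "plant~1",
    "plant~1": "life_form~1",
    "life_form~1": "entity~1",
}
# Synonym lists as printed by `wn tree -hypen`.
LABELS = {
    "tree~1": ["tree"],
    "woody_plant~1": ["woody plant", "ligneous plant"],
    "vascular_plant~1": ["vascular plant", "tracheophyte"],
    "plant~1": ["plant", "flora", "plant life"],
    "life_form~1": ["life form", "organism", "being", "living thing"],
    "entity~1": ["entity", "something"],
}

def superclasses(cls):
    """All ancestors of cls, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain

def immediate_subclasses(cls):
    """Classes whose direct parent is cls."""
    return sorted(child for child, parent in PARENT.items() if parent == cls)

def describe(cls):
    """Sparse superclasses (identifiers only), verbose immediate subclasses (with labels)."""
    return {
        "about": cls,
        "superclasses": superclasses(cls),
        "subclasses": [{"uri": c, "labels": LABELS[c]} for c in immediate_subclasses(cls)],
    }

print(describe("woody_plant~1"))
```

Reversing the strategy, so that ancestors are described verbosely and subclasses sparsely, would only mean swapping which list carries the labels.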
> >>> 
> >>> Anyway, comments welcomed. See the official WordNet site for a
> >>> human-oriented HTML forms interface to the dataset, or simply guess URLs
> >>> for my server (if you guess a word not in the database, you get an empty
> >>> RDF graph).
> >>> 
> >>> More examples:
> >>> http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/cat~1
> >>> http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/cat~2 (i.e. sense 2 of cat)
> >>> http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/geek~1
> >>> 
> >>> If there were an agreed URI for WordNet, instance data could look like
> >>> this...
> >>> 
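> >>> <!-- using rdf and dublin core namespaces; WordNet classes are given as
> >>>      full URIs because '~' is not allowed in an XML element name -->
> >>> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> >>>          xmlns:dc="http://purl.org/dc/elements/1.1/">
> >>> <rdf:Description rdf:about="">
> >>> 	<rdf:type rdf:resource="http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/bitmap~1"/>
> >>> 	<dc:subject>
> >>> 		<rdf:Description rdf:about="http://purl.org/people/danbri">
> >>> 			<rdf:type rdf:resource="http://snowball.ilrt.bris.ac.uk/xmlns/wordnet/noun/geek~1"/>
> >>> 		</rdf:Description>
> >>> 	</dc:subject>
> >>> </rdf:Description>
> >>> </rdf:RDF>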
> >>> 
> >>> This says: 'this object is a member of the class of bitmaps; it has as its
> >>> subject another object of type geek, whose URI is (etc...)'. So we might
> >>> immediately think about using WordNet inside multimedia content
> >>> (PNG/JPEG/GIF etc.) to improve the accessibility and searchability of that
> >>> content.
> >>> 
> >>> The RDF type hierarchy I exposed tells us in RDF that bitmaps are a kind
> >>> of picture, which is a kind of representation etc., and gives simple
> >>> definitions for each (e.g. "an image represented as a two dimensional array
> >>> of brightness values for pixels"). Similarly for geeks being kinds of
> >>> persons etc...
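The payoff of exposing the hierarchy is that a consumer can apply the RDFS rule that a member of a class is also a member of every superclass. An image typed only as a bitmap can then be found by a search for pictures or representations. A minimal Python sketch of that closure follows. It uses only the sub-class edges named in the message. Sense suffixes are omitted because the message does not give them, and the dictionary layout is an assumption for illustration.

```python
# Sub-class edges named in the message; sense suffixes omitted (not given).
SUBCLASS_OF = {
    "bitmap": {"picture"},
    "picture": {"representation"},
    "geek": {"person"},
}

def all_types(direct_types):
    """Close a set of rdf:type values under rdfs:subClassOf (transitively)."""
    seen = set()
    stack = list(direct_types)
    while stack:
        t = stack.pop()
        if t in seen:
            continue
        seen.add(t)
        stack.extend(SUBCLASS_OF.get(t, ()))
    return seen

print(sorted(all_types({"bitmap"})))  # ['bitmap', 'picture', 'representation']
```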
> >>> 
> >>> I think there are a few glitches in my online demo, but it should be
> >>> enough to give a flavour of the possibilities.
> >>> 
> >>> 
> >>> Comments, suggestions etc welcomed,
> >>> 
> >>> Dan
> >>> 
> >>> 
> >>> --
> >>> danbri@w3.org
> >>> 
> >>> 
>
Received on Thursday, 9 December 1999 19:39:05 UTC