Re: [WN] Fwd: WordNet Namespace

From: Jacco van Ossenbruggen <Jacco.van.Ossenbruggen@cwi.nl>
Date: Tue, 13 Dec 2005 14:07:56 +0100
Message-ID: <439EC7AC.6030006@cwi.nl>
To: Mark van Assem <mark@cs.vu.nl>
CC: Aldo Gangemi <aldo.gangemi@istc.cnr.it>, public-swbp-wg@w3.org, schreiber@cs.vu.nl, jjc@hpl.hp.com, Benjamin.Nguyen@inria.fr

Mark van Assem wrote:

> The "convenience" requirement might be satisfied better by (a) 
> removing the inverses like you and Jan argued before; and (b) separate 
> the files into e.g. separate ones for the noun and verb hierarchies.
> Maybe this already gives enough reduction?

That would certainly help.  Right now, wordnet-synset.rdf is a single
file of more than 100MB.  Removing inverses and splitting it up would
help a lot.
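
To make the two reductions concrete, here is a minimal sketch in plain
Python, with triples modelled as (subject, property, object) tuples.  The
property names (hyponymOf, hypernymOf) and the synset id pattern
(synset-WORD-POS-N) are assumptions made for illustration only and may
not match the actual conversion.

```python
from collections import defaultdict

# Hypothetical inverse pairs: each key is the redundant direction,
# each value is the direction we keep.
INVERSE_OF = {"hypernymOf": "hyponymOf"}


def drop_inverses(triples, inverse_of=INVERSE_OF):
    """Keep only one direction of every inverse property pair."""
    kept = set()
    for s, p, o in triples:
        if p in inverse_of:
            # Re-express the statement in the kept direction, so no
            # information is lost if only the inverse was asserted.
            kept.add((o, inverse_of[p], s))
        else:
            kept.add((s, p, o))
    return kept


def split_by_pos(triples):
    """Group triples by the part of speech encoded in the subject id.

    Assumes ids shaped like 'synset-dog-noun-1' (an illustrative
    pattern, not necessarily the one used in the real files).
    """
    files = defaultdict(set)
    for s, p, o in triples:
        pos = s.split("-")[-2]
        files[pos].add((s, p, o))
    return dict(files)
```

Each group returned by split_by_pos could then be serialised to its own
file, so that a user who only needs the noun hierarchy never has to load
the verbs.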

> Something that I would like your input for is the question what the 
> relation between size and convenience is. It is not very fair to 
> compare this conversion to e.g. one that does not have all hierarchies 
> or does not have all relationships. Note that I already put each 
> relation in a separate file, so that's configurable and allows for a 
> more fair size comparison.

I agree.  My judgement was mainly based on the big single synset file.  
If that can be split up, size comparisons would be more fair and more 
informative.
The main point is that size is an issue; ignoring it will not make the
problem go away.  Part of the problem is not yours: many users will only
have encountered the SemWeb through (toy) applications, and going to a
million-triple dataset is just part of the culture shock they will have
to go through.
But the other part of the problem is that you have made design choices
that make it much larger (the word sense as URI, see below).  This is
fine, but should be explicitly discussed.

> One point where we might gain a lot (reduce size) is by representing 
> word(senses) directly as labels on synsets. But then you lose the 
> ability to annotate with WordSenses. So my concrete question is: is it
> desirable to lose this ability in trade for a size reduction?

Well, first a procedural comment: this is exactly the type of question
I'd like to have discussed in the document (and perhaps also on this
list at an earlier stage).
Second, to make an informed decision, I will need to know what the
facts are.  Most important:
- How much do I gain in size by removing the word sense URIs and how
much do I gain in functionality by keeping them?
- What are the options for doing both? Would this only result in two 
versions of the data that is now in wordnet-synset.rdf?  Then doing both 
would be an option, provided that you give the reader sufficient 
information to make a good choice.
Can you do a single word-sense-less (sorry if this ain't English :-) 
version of the synsets and provide two separate word-sense-as-labels and 
word-sense-as-URI files?
Would this also make future translations to other languages easier?
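
As a rough illustration of the trade-off, the sketch below builds both
encodings for one synset and counts the triples.  The property and class
names (containsWordSense, WordSense) and the sense URI pattern are
hypothetical, and the real ratio depends entirely on the schema used; the
point is only that every word sense with its own URI costs several
triples where a label costs one.

```python
def senses_as_labels(synset, words):
    """One triple per word: the word is a plain label on the synset."""
    return [(synset, "rdfs:label", w) for w in words]


def senses_as_uris(synset, words):
    """Each word sense gets its own URI, so it can be annotated later."""
    triples = []
    for i, w in enumerate(words, start=1):
        sense = f"{synset}/sense-{i}"  # hypothetical URI pattern
        triples.append((synset, "containsWordSense", sense))
        triples.append((sense, "rdf:type", "WordSense"))
        triples.append((sense, "rdfs:label", w))
    return triples


words = ["dog", "domestic dog", "Canis familiaris"]
as_labels = senses_as_labels("synset-dog-noun-1", words)
as_uris = senses_as_uris("synset-dog-noun-1", words)
print(len(as_labels), "triples with labels,", len(as_uris), "with URIs")
```

With word senses as URIs, a statement about one specific sense has a
subject to attach to; with labels only, the same statement could only be
made about the synset as a whole.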

Jacco
Received on Tuesday, 13 December 2005 13:15:42 UTC
