- From: Jacco van Ossenbruggen <Jacco.van.Ossenbruggen@cwi.nl>
- Date: Tue, 13 Dec 2005 14:07:56 +0100
- To: Mark van Assem <mark@cs.vu.nl>
- CC: Aldo Gangemi <aldo.gangemi@istc.cnr.it>, public-swbp-wg@w3.org, schreiber@cs.vu.nl, jjc@hpl.hp.com, Benjamin.Nguyen@inria.fr
Mark van Assem wrote:

> The "convenience" requirement might be satisfied better by (a)
> removing the inverses like you and Jan argued before; and (b) separate
> the files into e.g. separate ones for the noun and verb hierarchies.
> Maybe this already gives enough reduction?

That would certainly help. Right now, wordnet-synset.rdf is a single file of over 100MB. Removing inverses and splitting it up would help a lot.

> Something that I would like your input for is the question what the
> relation between size and convenience is. It is not very fair to
> compare this conversion to e.g. one that does not have all hierarchies
> or does not have all relationships. Note that I already put each
> relation in a separate file, so that's configurable and allows for a
> more fair size comparison.

I agree. My judgement was mainly based on the big single synset file. If that can be split up, size comparisons would be fairer and more informative.

The main point is that size is an issue, and ignoring it will not make the problem go away. Part of the problem is not yours: many users will only have encountered the SemWeb through (toy) applications, and going to a million-triple dataset is just part of the culture shock they will have to go through. But the other part of the problem is that you have made design choices that make it much worse (the word sense as URL, see below). This is fine, but should be explicitly discussed.

> One point where we might gain a lot (reduce size) is by representing
> word(senses) directly as labels on synsets. But then you lose the
> ability to annotate with WordSenses. So my concrete question is: is it
> desirable to lose this ability in trade for a size reduction?

Well, first a procedural comment: this is exactly the type of question I'd like to have discussed in the document (and perhaps also on this list at an earlier stage). Second, to make an informed decision, I will need to know what the facts are.
Most important:

- How much do I gain in size by removing the word sense URLs, and how much do I gain in functionality by keeping them?
- What are the options for doing both?

Would this only result in two versions of the data that is now in wordnet-synset.rdf? Then doing both would be an option, provided that you give the reader sufficient information to make a good choice. Can you do a single word-sense-less (sorry if this ain't English :-) version of the synsets and provide two separate word-sense-as-labels and word-sense-as-URI files? Would this also make future translations to other languages easier?

Jacco
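
To make the size question above concrete, here is a rough sketch that counts triples for the two modelling options on invented toy data. Everything in it is an assumption made for illustration: the synset identifiers, the URI pattern for senses, and the property names (ex:containsWordSense, ex:word, ex:lexicalForm) are hypothetical and are not taken from the actual conversion. Only the shape of the comparison matters.

```python
# Toy data: synset id -> words it contains (invented for illustration).
SYNSETS = {
    "synset-bank-noun-1": ["bank", "depository"],
    "synset-bank-noun-2": ["bank", "riverbank"],
}


def triples_with_sense_uris(synsets):
    """Model every word sense as its own resource (hypothetical schema).

    Each sense can then be the subject of further annotations, at the
    cost of extra triples per word in a synset.
    """
    triples = set()
    for synset, words in synsets.items():
        for i, word in enumerate(words, start=1):
            sense = f"{synset}/sense-{i}"      # hypothetical URI pattern
            word_res = f"word-{word}"          # one resource per distinct word
            triples.add((synset, "ex:containsWordSense", sense))
            triples.add((sense, "ex:word", word_res))
            triples.add((word_res, "ex:lexicalForm", word))
    return triples


def triples_with_labels(synsets):
    """Attach each word directly to the synset as a plain label.

    Smaller, but there is no sense resource left to annotate.
    """
    triples = set()
    for synset, words in synsets.items():
        for word in words:
            triples.add((synset, "rdfs:label", word))
    return triples


if __name__ == "__main__":
    print("sense-as-URI triples:  ", len(triples_with_sense_uris(SYNSETS)))
    print("sense-as-label triples:", len(triples_with_labels(SYNSETS)))
```

On this toy input the URI variant needs 11 triples and the label variant 4. The real ratio depends on the actual schema and data, so it would still have to be measured on the converted files.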
Received on Tuesday, 13 December 2005 13:15:42 UTC