- From: Ralph R. Swick <swick@w3.org>
- Date: Wed, 24 May 2006 10:36:21 -0400
- To: Mark van Assem <mark@few.vu.nl>
- Cc: SWBPD list <public-swbp-wg@w3.org>
At 04:00 PM 5/24/2006 +0300, Mark van Assem wrote:
>I will rerun my script, do a new cvs commit tonight, and check the schemas again so that everything is in order tomorrow morning.

OK. Looking at the changes you made to convertwn20.pl, the content of the output won't change, just the names.

>wordnet-synset is the correct filename, wordnet-senselabels is correct (last one was an error in the conversion program, corrected that too).

I don't understand why one is singular and the other plural, but OK :)

>>However, I wonder if the senseLabel statements should be
>>included in the CBDs for Full as well? (I did not include them
>>on my first pass). One interpretation of [1] is that Full contains
>>all of the files listed, including senseLabels.
>
>I see it like this: all the files together form the "complete" WordNet. The Full and Basic are two (different) partitions. Notice that in the "complete" WordNet the senseLabels are actually redundant.

The editor's draft [1], 24 May revision, does not clearly (to me) specify how the senseLabel value is computed. I interpret the text "The property value is filled with the lexical forms that are attached to Words in the Full version." to mean that there is a senseLabel statement with the value of the lexicalForm property of the word in each wordsense in the synset. That is, if there are multiple wordsenses in a synset there should be multiple senseLabel statements for that synset, and one of those senseLabels should match the rdfs:label of the synset. Am I correct?

Duplicate elimination might be done when the words in different wordsenses have the same lexicalForm. If this duplicate elimination happens to reduce the data to just one senseLabel statement for each synset, then the senseLabel and rdfs:label statements for the synsets are redundant. We could simply declare wn20:senseLabel to be rdfs:subPropertyOf rdfs:label and use only the senseLabel property in wordnet-synset.rdf.
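A minimal sketch of the interpretation above, in Python, assuming the data is held as plain (subject, predicate, object) tuples rather than loaded with any RDF toolkit. The synset, wordsense and word identifiers are hypothetical, and wn20:word is an assumed name for the WordSense-to-Word link. It computes the senseLabel values per synset (a set does the duplicate elimination), reports whether each synset's rdfs:label is among them and whether only one distinct value remains, and shows what declaring wn20:senseLabel rdfs:subPropertyOf rdfs:label would entail under RDFS rule rdfs7.

```python
from collections import defaultdict

# Hypothetical sample data as (subject, predicate, object) tuples.
# "wn20:word" is an ASSUMED name for the WordSense -> Word link.
TRIPLES = [
    ("synset-bank-1", "wn20:containsWordSense", "ws-bank-1"),
    ("ws-bank-1", "wn20:word", "word-bank"),
    ("word-bank", "wn20:lexicalForm", "bank"),
    ("synset-bank-1", "rdfs:label", "bank"),
    ("synset-car-1", "wn20:containsWordSense", "ws-car-1"),
    ("synset-car-1", "wn20:containsWordSense", "ws-auto-1"),
    ("ws-car-1", "wn20:word", "word-car"),
    ("ws-auto-1", "wn20:word", "word-auto"),
    ("word-car", "wn20:lexicalForm", "car"),
    ("word-auto", "wn20:lexicalForm", "auto"),
    ("synset-car-1", "rdfs:label", "car"),
]


def objects(triples, subject, predicate):
    """Return every o such that (subject, predicate, o) is in triples."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def sense_labels(triples):
    """Map each synset to the lexicalForms of the words in its wordsenses.

    Collecting into a set performs the duplicate elimination.
    """
    labels = defaultdict(set)
    for synset, p, wordsense in triples:
        if p != "wn20:containsWordSense":
            continue
        for word in objects(triples, wordsense, "wn20:word"):
            labels[synset].update(objects(triples, word, "wn20:lexicalForm"))
    return dict(labels)


def redundancy_report(triples):
    """Per synset: (rdfs:label matches some senseLabel, exactly one distinct senseLabel)."""
    report = {}
    for synset, forms in sense_labels(triples).items():
        rdfs_labels = set(objects(triples, synset, "rdfs:label"))
        report[synset] = (bool(rdfs_labels & forms), len(forms) == 1)
    return report


def entail_subproperty(triples, sub, sup):
    """RDFS rule rdfs7: from (sub rdfs:subPropertyOf sup) and (s sub o), infer (s sup o)."""
    return set(triples) | {(s, sup, o) for s, p, o in triples if p == sub}


if __name__ == "__main__":
    print(redundancy_report(TRIPLES))
    # Emit senseLabel statements, then list what the subPropertyOf declaration entails.
    sense_label_triples = [
        (synset, "wn20:senseLabel", form)
        for synset, forms in sense_labels(TRIPLES).items()
        for form in sorted(forms)
    ]
    for triple in sorted(entail_subproperty(sense_label_triples, "wn20:senseLabel", "rdfs:label")):
        print(triple)
```

With this sample data, synset-bank-1 reports (True, True), i.e. its rdfs:label and its single senseLabel carry the same information, while synset-car-1 reports (True, False) because it has two distinct senseLabels.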
The words in "Use of rdfs:label" in Appendix D do suggest that there are synsets with multiple distinct senseLabels. I'll try to do a query to find such a case. BTW, there's another typo: that paragraph in Appendix D refers to lexicalLabel when it must mean lexicalForm.

[1] http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion-20062304
$Id: wn-conversion-20062304.html,v 1.9 2006/05/24 12:14:37 mvanasse Exp $

>I think we should provide only the Full version online. Those who want the labels can get them by additional queries for the appropriate Words and their lexicalForms.

Yes, they can be computed by clients, but it seems that for a preponderance of the data each synset will have an rdfs:label that exactly matches its only senseLabel, so we could simply collapse the senselabels.rdf data into synset.rdf.

>>I do think that senseLabel should be added somewhere in Figure 2.
>
>That might suggest to Full users (who are the target of that part of the document) that there are senseLabels in Full. I'd rather present Full in the main doc and present the Basic in the separate section that explains their differences.

And I'm thinking that, if I am correctly understanding the use of senseLabel and rdfs:label on synsets, we could make Basic a proper subset of Full and eliminate one file.

While researching this I ran across another difference between [1] and the output of convertwn20.pl revision 1.9. The example CBD in Section 3, "Selecting and Querying the appropriate WN version", shows an inSynset property which, per Appendix D, is the inverse of containsWordSense, though inSynset is missing from Figure 2. I think the inSynset statements will be very useful and I'd like to see them in the data. (I also think a wordOf property with domain Word and range WordSense would be quite useful. This can wait for a future version.)
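A minimal sketch of materializing the inverse statements in Python, assuming the data is held as plain (subject, predicate, object) tuples. Since Appendix D describes inSynset as the inverse of containsWordSense, each containsWordSense statement can simply be flipped; the same flip would produce the suggested wordOf statements. The identifiers are hypothetical, wn20:word is an assumed name for the WordSense-to-Word link, and wn20:wordOf is only the property suggested in this message, not something in the draft.

```python
# Hypothetical sample data as (subject, predicate, object) tuples.
# "wn20:word" (WordSense -> Word) is an ASSUMED property name, and
# "wn20:wordOf" is only the property suggested in this message.
TRIPLES = [
    ("synset-car-1", "wn20:containsWordSense", "ws-car-1"),
    ("synset-car-1", "wn20:containsWordSense", "ws-auto-1"),
    ("ws-car-1", "wn20:word", "word-car"),
    ("ws-auto-1", "wn20:word", "word-auto"),
]

# Each forward property mapped to its inverse (domain and range swapped).
INVERSES = {
    "wn20:containsWordSense": "wn20:inSynset",  # Synset -> WordSense becomes WordSense -> Synset
    "wn20:word": "wn20:wordOf",                 # WordSense -> Word becomes Word -> WordSense
}


def add_inverse_statements(triples, inverses):
    """For every (s, p, o) whose p has a declared inverse q, add (o, q, s)."""
    derived = {(o, inverses[p], s) for s, p, o in triples if p in inverses}
    return set(triples) | derived


if __name__ == "__main__":
    for triple in sorted(add_inverse_statements(TRIPLES, INVERSES)):
        print(triple)
```

Materializing the statements in the converter output, rather than leaving them to a reasoner that understands owl:inverseOf, means clients get inSynset without any inference support.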
Received on Wednesday, 24 May 2006 14:37:15 UTC