Re: [WNET] new data and schema files

At 04:00 PM 5/24/2006 +0300, Mark van Assem wrote:
>I will rerun my script, do a new cvs commit tonight, and check the schemas again so that everything is in order tomorrow morning.

OK. Looking at the changes you made to convertwn20.pl,
the content of the output won't change, just the names.

>wordnet-synset is the correct filename, wordnet-senselabels is correct (last one was an error in the conversion program, corrected that too).

I don't understand why one is singular and the other plural, but OK :)

>>However, I wonder if the senseLabel statements should be
>>included in the CBDs for Full as well?  (I did not include them
>>on my first pass).  One interpretation of [1] is that Full contains
>>all of the files listed, including senseLabels.
>
>I see it like this: all the files together form the "complete" WordNet. The Full and Basic are two (different) partitions. Notice that in the "complete" WordNet the senseLabels are actually redundant.

The 24 May revision of the editor's draft [1] does not clearly (to me)
specify how the senseLabel value is computed. I interpret the text

   "The property value is filled with the lexical forms that are attached
   to Words in the Full version."

to mean that there is a senseLabel statement with the value
of the lexicalForm property of the word in each wordsense
in the synset.  That is, if there are multiple wordsenses in a
synset there should be multiple senseLabel statements for
that synset.  And one of those senseLabels should match the
rdfs:label of the synset.

Am I correct?
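
To make that reading concrete, here is a minimal Python sketch of it.
It is only an illustration of my interpretation, not of what
convertwn20.pl does. Triples are plain (subject, predicate, object)
tuples, all identifiers are made up, the property name "word" linking
a WordSense to its Word is an assumption, and each Word is assumed to
carry a single lexicalForm.

```python
# Sketch only: triples are (subject, predicate, object) tuples with made-up IDs.
def sense_labels(triples, synset):
    """Return one senseLabel value per WordSense in the synset, in input order."""
    # WordSense -> Word (the property name "word" is an assumption)
    word_of = {s: o for s, p, o in triples if p == "word"}
    # Word -> lexical form (assumes one lexicalForm per Word)
    form_of = {s: o for s, p, o in triples if p == "lexicalForm"}
    senses = [o for s, p, o in triples
              if s == synset and p == "containsWordSense"]
    return [form_of[word_of[ws]] for ws in senses]


triples = [
    ("synset-bank-1", "containsWordSense", "ws-bank-1"),
    ("synset-bank-1", "containsWordSense", "ws-depository-1"),
    ("ws-bank-1", "word", "word-bank"),
    ("ws-depository-1", "word", "word-depository"),
    ("word-bank", "lexicalForm", "bank"),
    ("word-depository", "lexicalForm", "depository"),
]
labels = sense_labels(triples, "synset-bank-1")
```

Under this reading the synset above gets two senseLabel statements,
one per wordsense.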

Duplicate elimination might be done when the words in different
wordsenses have the same lexicalForm.  If this duplicate
elimination happens to reduce the data to just one senseLabel
statement for each synset then the senseLabel and rdfs:label
statements for the synsets are redundant.  We could simply
declare wn20:senseLabel to be rdfs:subPropertyOf rdfs:label
and use only the senseLabel property in wordnet-synset.rdf.

The wording of "Use of rdfs:label" in Appendix D does suggest that
there are synsets with multiple distinct senseLabels.  I'll try to
do a query to find such a case.  BTW -- there's another typo;
that paragraph in Appendix D refers to lexicalLabel when it
must mean lexicalForm.
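
The kind of check I have in mind could look roughly like the Python
sketch below. It assumes the senseLabel statements have already been
loaded as (subject, predicate, object) tuples; the synset identifiers
and the sample data are invented for illustration.

```python
from collections import defaultdict


def synsets_with_multiple_labels(triples):
    """Return {synset: sorted distinct senseLabels} for synsets with more than one."""
    labels = defaultdict(set)
    for s, p, o in triples:
        if p == "senseLabel":
            labels[s].add(o)  # the set performs the duplicate elimination
    return {s: sorted(v) for s, v in labels.items() if len(v) > 1}


sample = [
    ("synset-a", "senseLabel", "bank"),
    ("synset-a", "senseLabel", "depository"),
    ("synset-b", "senseLabel", "car"),
    ("synset-b", "senseLabel", "car"),  # duplicate, collapses to one value
]
multi = synsets_with_multiple_labels(sample)
```

Only synset-a is reported here, because the two statements on
synset-b collapse to a single value.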

[1] http://www.w3.org/2001/sw/BestPractices/WNET/wn-conversion-20062304
    $Id: wn-conversion-20062304.html,v 1.9 2006/05/24 12:14:37 mvanasse Exp $

>I think we should provide only the Full version online. Those who want the labels can get them by additional queries for the appropriate Words and their lexicalForms.

Yes, it can be computed by clients, but it seems that most
synsets will have an rdfs:label property that exactly matches
the only senseLabel property for that synset, so we could
simply collapse the wordnet-senselabels.rdf data into
wordnet-synset.rdf.
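
Whether that really holds for most synsets is something the data can
answer. A rough Python sketch of the measurement, again over
(subject, predicate, object) tuples with invented identifiers and
sample values, and assuming at most one rdfs:label per synset:

```python
from collections import defaultdict


def label_is_only_sense_label(triples):
    """Map each labelled synset to True when its rdfs:label equals
    its single distinct senseLabel."""
    sense = defaultdict(set)
    label = {}
    for s, p, o in triples:
        if p == "senseLabel":
            sense[s].add(o)
        elif p == "rdfs:label":
            label[s] = o  # assumes one rdfs:label per synset
    return {s: sense.get(s, set()) == {label[s]} for s in label}


sample = [
    ("synset-a", "rdfs:label", "bank"),
    ("synset-a", "senseLabel", "bank"),
    ("synset-a", "senseLabel", "depository"),
    ("synset-b", "rdfs:label", "car"),
    ("synset-b", "senseLabel", "car"),
]
result = label_is_only_sense_label(sample)
redundant_share = sum(result.values()) / len(result)
```

A share close to 1.0 on the real data would support collapsing the
two files; the sample above is only there to exercise the function.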

>>I do think that senseLabel should be added somewhere in Figure 2.
>
>That might suggest to Full users (who are the target of that part of the document) that there are senseLabels in Full. I'd rather present Full in the main doc and present the Basic in the separate section that explains their differences.

And I'm thinking that, if I understand the use of senseLabel
and rdfs:label on synsets correctly, we could make Basic
a proper subset of Full and eliminate one file.

While researching this I ran across another difference between
[1] and the output of convertwn20.pl revision 1.9.  The example CBD
in Section 3 "Selecting and Querying the appropriate WN version"
shows an inSynset property which, per Appendix D, is the inverse
of containsWordSense though it's missing from Figure 2.  I think
the inSynset statements will be very useful and I'd like to see them
in the data.  (I also think a wordOf property with Domain Word and
Range WordSense would be quite useful.  This can wait for a future
version.)
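
Since inSynset is just the inverse of containsWordSense, the extra
statements could be derived mechanically. A minimal Python sketch,
assuming (subject, predicate, object) tuples and made-up identifiers;
it does not reflect how convertwn20.pl is structured:

```python
def add_in_synset(triples):
    """Return the triples plus an inSynset statement for every
    containsWordSense statement, skipping ones already present."""
    inverse = [(o, "inSynset", s)
               for s, p, o in triples if p == "containsWordSense"]
    existing = set(triples)
    return triples + [t for t in inverse if t not in existing]


data = [
    ("synset-bank-1", "containsWordSense", "ws-bank-1"),
    ("synset-bank-1", "containsWordSense", "ws-depository-1"),
    ("ws-bank-1", "inSynset", "synset-bank-1"),  # already in the data
]
with_inverse = add_in_synset(data)
```

The same pattern would cover a wordOf property, should one be added
in a later version.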

Received on Wednesday, 24 May 2006 14:37:15 UTC