
wordnet breakout: raw notes

From: Dan Brickley <danbri@w3.org>
Date: Tue, 2 Nov 2004 08:34:33 -0500
To: public-swbp-wg@w3.org
Message-ID: <20041102133433.GA14689@homer.w3.org>


danbri's raw notes from Wordnet breakout, day 2 of SWBP f2f.

=================

Redundant grepped summary:

grep ACTION SWBP_Wordnet_TF_Breakout.txt 
ACTION: danbri propose some URI formatting rules for word senses and
synsets
ACTION: guus make tests on URI labels once suggested
ACTION: andreas investigate Wordnet maintenance policy re synset IDs
ACTION: brian ask Aldo to review meronym superproperty decision
ACTION: andreas investigate Sentence Frame / Fr relationship in the
Prolog, find examples etc.
ACTION: guus ask Jan W to write Prolog transformation into RDF/XML

grep RESOLVED SWBP_Wordnet_TF_Breakout.txt 
RESOLVED: we'll define a custom property wn:lexicalForm that
subproperties rdfs:label; it has a cardinality of precisely-1.
RESOLVED: to consult w/ Princeton team (thru brian) re requirements on
glossary entry, does it need xml literals?
RESOLVED: remove meronym superproperty [pending review from aldo]
RESOLVED: not to add a verb group for now
RESOLVED: to include some basic OWL assertions in schema before 1st WD

=================

present: guus, brian, andreas, danbri

guus: if we can find some external people, eg. students, to progress
this...

brian: yesterday I thought we decided it was my problem

guus: ok

brian: useful to walkthrough...

starting w/ diagram in the document 

guus: my main problem is with notion of wn:WordSense
...in mine I'd made it a bnode, without a URI. 

danbri: whether to use bnodes is orthogonal to vocab design

guus: wordnet has IDs for some things, but not for word senses
...a compound key approach

...so if you want URIs for/from word senses,  you need to compose one 
somehow.

brian: we have a general issue, against all of this, which is "What URIs 
to use?". Why special here?

guus: generally you'd assume some princeton based identifier

...dan's concerned that numeric IDs not v usable


guus: could assume URIs for synsets are composed of princeton base URI
then first word from db-ordered synset, then '-' then identifier.

brian: ...

danbri: If we expect this design to morph into one that does
nouns-as-classes, we need to think about prettiness in the rdf/xml
syntax

guus: re synsets... re Bank...

andreas: direct link from word to synset, or indirectly?

...

brian: we set out with a goal of just representing the lexical form

....

discussion of metamodel based approach

guus: we've said synset is subclass of class; hypernym is subproperty of 
rdfs:subClassOf. 

brian:
Core of structure is synsets, collections of words w/ similar meaning.
they can be typed (noun, verb, ...). 'bank' has many senses;
bank-as-financial-institution is 1 word sense. bank-at-side-of-river is
another word sense for 'bank'.

eg. 'cat'
a word written as 'cat'
there's a sense of word as cat; there's another which is an abbrev for
caterpillar truck.

there are relationships between word senses, synonym, antonym etc.
and between synsets, like hyponym etc.

guus: it would help, for the lexical rep, if we could generate URIs 
with some human readability.

danbri's proposal: "take the word('s lex form), and the sense number,
joined by '-'".

guus: 'for every word sense, you have a sense number'

each word sense in a synset has a sense number; they're also ordered.

Film-5, isn't 5th word in a synset but the 5th sense of 'film'.
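
A minimal sketch of the word-sense URI rule proposed above (lexical form
plus sense number, joined by '-'), in Python. The base URI, the function
name and the handling of spaces and unsafe characters are assumptions made
for illustration; the notes do not settle any of them.

```python
from urllib.parse import quote

# Hypothetical base URI; the group has not agreed on one.
BASE = "http://example.org/wordnet/"


def word_sense_uri(lexical_form: str, sense_number: int, base: str = BASE) -> str:
    """Join a word's lexical form and its sense number with '-'."""
    # Assumption: spaces become underscores, and any remaining character
    # that is unsafe in a URI path segment is percent-encoded.
    local = quote(lexical_form.replace(" ", "_"), safe="")
    return f"{base}{local}-{sense_number}"


# The 5th sense of 'film', as in the Film-5 example.
print(word_sense_uri("film", 5))
```

Synset URIs would need a separate rule, since synsets carry numeric ids
rather than a single lexical form.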

guus: main problem is identifying synsets, which have ugly numeric ids
...the things in a synset are ordered; some people take the first one.

guus: we are prepared to compromise the purity of having a lexical
representation by taking into account usability of URI structures we
invent.


guus: tricky bit w/ rdf ... is ordering information

...neither my current rep nor this one handles ordering

brian: one issue i have on my list.... there's a backbone structure,
...given this info, you can generate a whole load of other stuff, eg.
inverse relations. It'd be useful to be able to define inverse
properties, but that doesn't mean that we populate the triples in
princeton_rdf.tar.gz or whatever
...ie which triples do we want as base vs inferred?
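
The base-vs-inferred question can be pictured with a small Python sketch:
ship only the base triples, and let consumers derive the inverse ones from
a declared table of inverse properties. The property names and identifiers
below are hypothetical, chosen only for illustration.

```python
# Hypothetical inverse-property table; these names are not from the schema.
INVERSE_OF = {"wn:hyponymOf": "wn:hypernymOf"}


def inferred_inverses(base_triples):
    """Return triples derivable via INVERSE_OF that are not in the base set."""
    base = set(base_triples)
    derived = set()
    for s, p, o in base:
        if p in INVERSE_OF:
            candidate = (o, INVERSE_OF[p], s)
            if candidate not in base:
                derived.add(candidate)
    return derived


# Illustrative base data: one hyponym link.
base = [("wn:cat-1", "wn:hyponymOf", "wn:feline-1")]
print(inferred_inverses(base))
```

Under this split the distributed dump stays small, at the cost of every
consumer running the derivation step.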


DECISION: we will strive for human-friendly URIs (where we have them)
ACTION: danbri propose some URI formatting rules for word senses and
synsets


guus: we can define some test cases here, can ask our implementor, to
check we have genuinely unique URIs.

ACTION: guus make tests on URI labels once suggested
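
One way such a uniqueness test could look, as a Python sketch: generate a
URI for every (lexical form, sense number) pair and report any URI produced
by more than one distinct pair. The URI rule and the sample data are
assumptions; the case-folding rule is deliberately naive to show the kind
of collision the test is meant to catch.

```python
from collections import defaultdict


def find_uri_collisions(senses, make_uri):
    """Return {uri: [records]} for URIs generated by more than one record."""
    seen = defaultdict(set)
    for record in senses:
        seen[make_uri(*record)].add(record)
    return {uri: sorted(recs) for uri, recs in seen.items() if len(recs) > 1}


# Hypothetical rule: case-folded lexical form + '-' + sense number.
def make_uri(lexical_form, sense_number):
    return f"{lexical_form.lower()}-{sense_number}"


# Illustrative data; 'cat' and 'CAT' collide under the rule above.
senses = [("cat", 1), ("CAT", 1), ("film", 5)]
print(find_uri_collisions(senses, make_uri))
```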



[discussion ... of multilingual labels]

princeton wordnet is english; doesn't make language distinctions

guus: I'd assumed that we'd default to rdfs:label for cat, dog, film
etc.

danbri: I like naming relationships and making them subproperty of
rdfs:label

guus: ah, didn't realise you had a separate resource for Word

brian: I was asking earlier if we really need it

danbri: is wn:lexicalForm OWL Functional?

brian: yup

brian: when we talk about a word, is the word 'chat'(en) the same as
'chat'(fr)?

...I put it in as I wasn't sure if we need the indirection or not

...also somewhat historical, i had it in to avoid confusion between Word 
and Word sense

danbri: I like having it in there


guus: then we need to think about URIs for Words

danbri: Is wn:lexicalForm inverse-functional as well?

brian: depends on what we do wrt language.

guus: dropping Word would simplify the model

brian: doubles the triples, bloat factor, but you might use it for 
talking about Words

danbri: is this one of those cases where we could have it in our model,
but ship only the more concise shortcut representation?

guus/brian: yup. could keep it in the diagram but shaded out


guus: brian, your pref is to have a custom property for wn literal, and 
not use rdfs:label

guus: it's precisely-once functionality; every word sense has exactly 1
label in any given language.

danbri: could use OWL's class specific constraints?

guus: also visualization tools make use of rdfs:label

RESOLVED: we'll define a custom property wn:lexicalForm that
subproperties rdfs:label; it has a cardinality of precisely-1.
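
As a rough illustration of what "cardinality of precisely-1" would mean
when checking data, here is a Python sketch that flags subjects with zero
or several wn:lexicalForm values. It is a closed-world check over a plain
list of tuples, which is an assumption of this sketch and not how an OWL
reasoner treats cardinality; the identifiers and sample triples are made up.

```python
from collections import Counter


def lexical_form_violations(triples, subjects):
    """Return {subject: count} where count of wn:lexicalForm is not exactly 1."""
    # Deduplicate first so a repeated identical triple is counted once.
    counts = Counter(s for s, p, _ in set(triples) if p == "wn:lexicalForm")
    return {s: counts[s] for s in subjects if counts[s] != 1}


# Illustrative data: one good subject, one with two forms, one with none.
triples = [
    ("wn:cat-1", "wn:lexicalForm", "cat"),
    ("wn:film-5", "wn:lexicalForm", "film"),
    ("wn:film-5", "wn:lexicalForm", "movie"),
]
subjects = ["wn:cat-1", "wn:film-5", "wn:bank-2"]
print(lexical_form_violations(triples, subjects))
```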

Issues discussion:

use of xml literal, eg. for glossary entry, lexical form.

danbri: let's do whatever skos does (or vice versa) 

guus: no indication now that we need anything more than plain literals

RESOLVED: to consult w/ Princeton team (thru brian) re requirements on
glossary entry, does it need xml literals?

Guus: maintenance... versioning?

danbri: this may be a difference between lexical and class-centric
representations; former doesn't involve changing the namespace (much)

guus: they have a mapping table of identifiers between versions

brian: I think there are now synset IDs (@@check)

ACTION: andreas investigate Wordnet maintenance policy re synset IDs


Q: Do we want a meronym superproperty?

brian: it's complicated, I don't understand wordnet 2.0 meronyms

...there are various subproperties of meronym

...member, substance, part
...but also meronymOf

guus: no

brian: yes!

guus: mm specifies that the second synset is a member meronym

...assuming this is the case, I reckon not a superproperty

[...]

brian: guus is right... 

guus: if other people want the superproperty, ... hmm let's be minimal

RESOLVED: remove meronym superproperty [pending review from aldo]

guus: would be good if Aldo and Nicola could look at this topic

ACTION: brian ask Aldo to review meronym superproperty decision


Q: There is a concept of a group of verbs, but no way to name or refer
to a group of verbs. Invent verb group class?

brian: currently there is no such class, it's minimal

danbri: what's an example group of verbs? any use cases?

brian: see Prolog description. Specifies verb synsets similar in
meaning

danbri: next layer up of clustering above synsets?

brian: cluster of synsets, presumably w/ closure

guus: 'vgp' in the Prolog

brian: there is a concept specified there, a notion of Group, as you 
try to infer the abstract model

RESOLVED: not to add a verb group for now

danbri: what I think we decided earlier: "We expect future revisions of
this work to explore the elaboration of the lexical representation of
Wordnet by making explicit some relationship between nouns and RDF
classes. We have not decided that this is a design constraint on the
lexical representation, beyond favouring human-friendly URIs (which
might be usable in RDF/XML typedNode XML elements); we have not
committed to a metamodelling approach that uses the same URIs for nouns
and their associated classes".

[nobody objects to this, but not much enthusiasm for making it a formal
resolution.]


Q: How to represent the Fr relation?

brian: I didn't understand what it meant!
guus: me neither

ACTION: andreas investigate Sentence Frame / Fr relationship in the
Prolog, find examples etc.

Q: re new relations in Wordnet 2.0, ...

brian: someone needs to check the database files, not those in the
prolog

...go thru database schema files, see if anything missing in the prolog

guus: i've always assumed the Prolog representation is complete

...and that therefore we can base tools from the Prolog

...if that assumption is not true, we're in trouble

guus: should we explicitly record that assumption?


brian: is there any mention in the Prolog of stable synset IDs?

danbri: re questions for Princeton, ask on Wordnet users list rather
than direct to Princeton, as many others can answer those Qs

brian: ok

re tag-count... there is a corpus that's counted. It changes release to
release. Student project work found the number in the db not so useful.
Wanted more of a relative frequency count.

ACTION: guus ask Jan W to write Prolog transformation into RDF/XML
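
A rough idea of what such a transformation involves, sketched in Python
and not meant as the eventual tool. It assumes the s/6 fact layout of the
WordNet Prolog files, s(synset_id, w_num, 'word', ss_type, sense_number,
tag_count), with apostrophes inside words doubled. The sample fact, the
wn:inSynset property, the URI shapes and the function name are all made up
for illustration, and the output is plain tuples, not RDF/XML.

```python
import re

# Assumed layout: s(synset_id,w_num,'word',ss_type,sense_number,tag_count).
S_FACT = re.compile(
    r"^s\((\d+),(\d+),'((?:[^']|'')*)',([a-z]),(\d+),(\d+)\)\.$"
)


def s_fact_to_triples(line):
    """Turn one s/6 fact into (subject, predicate, object) tuples."""
    m = S_FACT.match(line.strip())
    if m is None:
        raise ValueError(f"not an s/6 fact: {line!r}")
    synset_id, _w_num, word, _ss_type, sense_number, _tag_count = m.groups()
    word = word.replace("''", "'")  # undo Prolog quote doubling
    # Hypothetical URI shapes: word-sense and synset identifiers.
    sense = f"wn:{word.replace(' ', '_')}-{sense_number}"
    synset = f"wn:synset-{synset_id}"
    return [
        (sense, "wn:lexicalForm", word),
        (sense, "wn:inSynset", synset),  # hypothetical property name
    ]


# Made-up sample fact, not copied from the Princeton files.
print(s_fact_to_triples("s(100000001,1,'cat',n,1,0)."))
```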

guus: would be good to have test cases

danbri: could use OWL for data integrity checks

brian: no OWL in the schema yet

guus: I only did stuff that was clear from the schema
(symmetric/reflexive etc)

danbri: agree symmetric useful, but can we use it for data integrity
checking? (as can w/ Functional etc)

...

general agreement to get the OWL in there, but when?

brian: is it essential for first WD?

guus: it is essential for symmetric properties

danbri: worth doing, yup

brian: ok

guus: i've listed most of these 

RESOLVED: to include some basic OWL assertions in schema before 1st WD

guus: we can also maybe make transitive closure triple dump
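
A sketch of how a transitive closure dump might be computed, in Python,
for a relation treated as transitive. The edge data is invented and stands
in for whatever hyponym-like relation would actually be closed; the
function name is likewise only illustrative.

```python
def transitive_closure(edges):
    """Return all (a, c) pairs reachable by following one or more edges."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)

    closure = set()
    for start in successors:
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # also guards against cycles in the data
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return closure


# Illustrative chain: cat -> feline -> carnivore.
edges = [("cat", "feline"), ("feline", "carnivore")]
print(sorted(transitive_closure(edges)))
```

The closure grows quickly on deep hierarchies, which bears on the earlier
question of which triples to ship as base and which to leave inferred.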


ADJOURNED for what remains of lunch.
Received on Tuesday, 2 November 2004 13:34:34 GMT
