Word sense discrimination

I hope you have had a pleasant summer. I’ve been doing some background reading, and am getting closer to starting some further implementation experiments. The aim is to develop a good enough solution for transforming natural language into graph representations of meaning. By good enough, I mean sufficient to support experiments on human-like reasoning and learning with natural language.

Natural language understanding can be broken down into sub-tasks such as part-of-speech tagging, phrase structure analysis, and semantic and pragmatic analysis. The difficulties mainly arise in semantic processing. Many words have multiple meanings, yet humans effortlessly understand which meaning is intended in any given case. Semantic processing is also needed to resolve prepositional attachments and to determine what pronouns, and other kinds of noun phrases, are referring to.

Two decades ago, work on word sense disambiguation focused on n-gram statistics for word collocations. More recently, artificial neural networks have proved very effective at unsupervised learning of statistical language models for predicting what text is likely to follow a given passage. Unfortunately, marvellous as this is, it isn’t directly transferable to tasks such as word sense disambiguation or measuring semantic consistency for deciding on prepositional attachments, etc.

I am therefore still looking for practical ways to exploit natural language corpora to determine word senses in context. The intended sense of a word is correlated with the words alongside which it appears in any given utterance. These accompanying words vary in how strongly they discriminate between particular word senses, and strongly discriminating words may occur several words away from the word in question. A simple n-gram model would require an impractical amount of memory to capture such dependencies. We therefore need a way to learn which words/features to pay attention to, and which can safely be forgotten, as a means of limiting the demand on memory; the sketch below illustrates the kind of thing I have in mind.
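As a rough illustration only (this is a minimal sketch in Python, with arbitrary choices for the window size, pruning threshold and table size, not a worked-out design), collocation counts for a target word can be gathered within a fixed window, with weakly attested collocates periodically forgotten to bound memory:

from collections import Counter

def collect_collocations(tokens, target, window=5, prune_below=3, max_entries=50000):
    """Count words co-occurring with `target` within +/- `window` tokens,
    periodically forgetting rare collocates to bound memory use."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
        # prune weakly attested collocates when the table grows too large
        if len(counts) > max_entries:
            counts = Counter({w: c for w, c in counts.items() if c >= prune_below})
    return counts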

I rather like the 1995 paper by David Yarowsky, “Unsupervised Word Sense Disambiguation Rivaling Supervised Methods”. This assumes that words have one sense per discourse and one sense per collocation, and exploits this in an iterative bootstrapping procedure. Other papers exploit linguistic resources like WordNet. I am now hoping to experiment with Yarowsky’s ideas using loose parsing for longer range dependencies, together with heuristics for discarding collocation data with weak discrimination.
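To make that concrete, here is a heavily simplified sketch of the bootstrapping loop in Python. It is my own reading of the idea rather than a faithful reimplementation: it ignores the one-sense-per-discourse constraint and the ordering of the decision list, and the smoothing constant and threshold are arbitrary.

import math
from collections import defaultdict

def yarowsky_bootstrap(contexts, seeds, iterations=10, threshold=math.log(2)):
    """contexts: list of sets of context words, one per occurrence of the ambiguous word
       seeds:    dict mapping a seed collocation word -> sense label
       returns:  a list of sense labels (or None) aligned with contexts"""
    rules = dict(seeds)  # collocation word -> sense
    labels = [next((rules[w] for w in ctx if w in rules), None) for ctx in contexts]

    for _ in range(iterations):
        # 1. count collocation/sense pairs over the currently labelled contexts
        counts = defaultdict(lambda: defaultdict(int))
        for ctx, lab in zip(contexts, labels):
            if lab is None:
                continue
            for w in ctx:
                counts[w][lab] += 1
        # 2. keep collocations whose (smoothed) log-likelihood ratio between
        #    the best and second-best sense exceeds the threshold
        new_rules = {}
        for w, per_sense in counts.items():
            ranked = sorted(per_sense.items(), key=lambda kv: kv[1], reverse=True)
            best_sense, best = ranked[0]
            second = ranked[1][1] if len(ranked) > 1 else 0
            if math.log((best + 0.1) / (second + 0.1)) > threshold:
                new_rules[w] = best_sense
        rules = new_rules or rules
        # 3. relabel every context that matches any rule (ignoring rule ranking)
        labels = [next((rules[w] for w in ctx if w in rules), None) for ctx in contexts]
    return labels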

I’ve downloaded free samples of large corpora from www.corpusdata.org as a basis for experimentation. Each word is given with its lemma and part of speech, e.g. "announced", "announce", "vvn". This will enable me to apply shift-reduce parsing to build phrase structures as input to computing collocations. Further work would address the potential for utilising prior knowledge, e.g. from WordNet, and how to compute measures of semantic consistency for resolving noun phrases and the attachment of prepositions as verb arguments. An open question is whether this can be done effectively without resorting to artificial neural networks.
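For anyone curious what the first step might look like, here is a small Python sketch of reading such a corpus sample and keeping just the lemmas of content words as input to collocation counting. The tab-separated word/lemma/PoS layout, the CLAWS-style tag prefixes and the filename are my assumptions about the samples, so treat the details as placeholders.

import csv

def read_tagged_corpus(path):
    """Yield (word, lemma, pos) triples from a sample assumed to consist of
    tab-separated lines of: word <TAB> lemma <TAB> part-of-speech."""
    with open(path, encoding="utf-8") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 3:
                yield row[0], row[1], row[2]

# keep lemmas of content words only (nouns, verbs, adjectives), assuming
# CLAWS-style tags such as "vvn" whose first letter indicates the word class
CONTENT_PREFIXES = ("n", "v", "j")
lemmas = [lemma for _, lemma, pos in read_tagged_corpus("corpus_sample.txt")
          if pos.lower().startswith(CONTENT_PREFIXES)]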

Anyone interested in helping?

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of things 

Received on Monday, 23 August 2021 14:52:32 UTC