Re: Bootstrapping cognitive NLP

Hi Christian,

Thanks for your input, which is much appreciated. I am inspired by evidence from gaze-tracking data of people reading, and by studies on thematic and taxonomic learning. These show that people are, for the most part, able to make sense of what they read incrementally, word by word, and that we exploit co-occurrence statistics together with other kinds of knowledge.

The number of senses for a word can perhaps be considered in terms of how many distinct clusters can be distinguished in the co-occurrence space. An algorithm for word sense induction could in principle be evolved to mirror the word senses found in dictionaries. This could exploit statistical matching against the text descriptions of word senses in dictionaries, together with the relations between word senses given by WordNet, which can help to overcome problems with sparse data.
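
As a minimal sketch of what I have in mind (the toy contexts for "bank", the cluster count, and the overlap scoring are all illustrative assumptions, and it presumes NLTK's WordNet data is installed):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans
    from nltk.corpus import wordnet as wn

    # Toy contexts for the ambiguous word "bank"
    contexts = [
        "the bank raised interest rates on loans",
        "we walked along the bank of the river",
        "deposit the cheque at the bank tomorrow",
        "the river bank was muddy after the rain",
    ]

    # Induce senses by clustering the contexts in co-occurrence space
    X = TfidfVectorizer(stop_words="english").fit_transform(contexts)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    # Match each induced cluster to the dictionary sense whose gloss
    # shares the most words with the cluster's contexts
    for label in range(km.n_clusters):
        words = set()
        for text, l in zip(contexts, km.labels_):
            if l == label:
                words.update(text.split())
        best = max(wn.synsets("bank", pos=wn.NOUN),
                   key=lambda s: len(words & set(s.definition().split())))
        print(label, "->", best.name(), "|", best.definition())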

Earlier work using hidden Markov models and n-grams gave disappointing results for word sense disambiguation. Why is that? Perhaps the window size was too small? Perhaps the algorithms weren't very effective at focusing on which cues are important in any given case?

By contrast, BERT has a richer, longer-range model of attention; see, e.g., "What Does BERT Look At? An Analysis of BERT's Attention", Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning: https://arxiv.org/abs/1906.04341

They note, for example:

> BERT's attention heads exhibit patterns such as attending to delimiter tokens, specific positional offsets, or broadly attending over the whole sentence, with heads in the same layer often exhibiting similar behaviors. We further show that certain attention heads correspond well to linguistic notions of syntax and coreference. For example, we find heads that attend to the direct objects of verbs, determiners of nouns, objects of prepositions, and coreferent mentions with remarkably high accuracy.
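
Those attention patterns are easy to inspect directly. A minimal sketch using the Hugging Face transformers library (the sentence and the choice of layer and head are arbitrary illustrations):

    import torch
    from transformers import BertModel, BertTokenizer

    tok = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased",
                                      output_attentions=True)

    inputs = tok("The dog chased the ball", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)

    # out.attentions: tuple of 12 layers, each of shape
    # (batch, heads, seq_len, seq_len)
    layer, head = 7, 9  # arbitrary head to inspect
    attn = out.attentions[layer][0, head]
    tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
    for token, row in zip(tokens, attn):
        print(token, "->", tokens[int(row.argmax())])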


These observations point to the importance of phrase structure with respect to co-occurrence statistics, and to opportunities for exploring different ways to capture it. I am also intrigued by the possibility of combining statistical approaches with inference over meanings, with a view to “explaining” the intent of each word in a given utterance. This assumes eschewing formal semantics in favour of informal approaches to meaning in terms of operations over graphs.

Do you know how WordNet was created? Was it mainly a manual effort by skilled lexicographers, or did they make use of machine learning?

Best regards,

Dave

> On 14 Jul 2021, at 12:06, Christian Chiarcos <christian.chiarcos@gmail.com> wrote:
> 
> Hi Dave, dear all,
> 
> apologies for not following up too closely. I've been having some administrative trouble for a few months, and until it is resolved I will mostly switch to lurking mode. (Well, I have done so already.) 
> 
> Mainstream NLP researchers consider Word Sense Disambiguation a hard but largely artificial problem, because there is too little agreement on sense definitions across resources and too few sense-annotated resources available to apply machine learning in a meaningful way. The classical Lesk algorithm seems reminiscent of your ideas, and it works nicely -- as long as the examples and definitions provided in the sense inventory are sufficiently representative (which they are not). Anyway, you might want to replicate Lesk as a proof of principle. It's still considered a seminal work: https://dl.acm.org/doi/10.1145/318723.318728. It uses word overlap and suffers from data sparsity.
> 
> A more modern approach following Lesk's spirit would probably be to induce embeddings for word senses (cf. https://aclanthology.org/P15-1173/, they call word senses "lexemes"), and then to compare them with the (aggregate) context embeddings. This operates on word embeddings; I'm not sure how to scale it to contextualized embeddings such as those produced by BERT etc. BERT would be great for deriving "real" sense embeddings if we had a significant corpus annotated for word senses. Well, we don't really have that. (OntoNotes [https://catalog.ldc.upenn.edu/LDC2013T19] is the closest thing, but they had to simplify WordNet sense distinctions in order to annotate them reliably.)
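> 
> As a quick sketch of the Lesk idea (the sentence is an invented example, NLTK's simplified Lesk stands in for the original algorithm, and the wordnet and punkt data are assumed to be downloaded):
> 
>     from nltk import word_tokenize
>     from nltk.wsd import lesk
> 
>     # Simplified Lesk: pick the synset whose gloss best overlaps the context
>     sent = word_tokenize("I sat on the bank of the river and watched the water")
>     sense = lesk(sent, "bank", pos="n")
>     print(sense, "-", sense.definition() if sense else "no sense found")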
> 
> As for cognitive plausibility, Lesk isn't incremental, so its way of processing differs from what humans do. But the underlying mechanism follows a similar intuition to yours, and it could be made incremental by looking only at the preceding context. However, it has no backtracking mechanism, and that would be needed unless we're happy for all text-initial (context-free!) words to be misclassified.
> 
> As for the machine-readable dictionaries, there is very limited data available with proper sense definitions. WordNets are one option (http://compling.hss.ntu.edu.sg/omw/). Maybe the Apertium data would work for you (https://github.com/acoli-repo/acoli-dicts/tree/master/stable/apertium/apertium-rdf-2020-03-18). It doesn't have sense definitions, but simply assumes one sense per translation pair.
> 
> Best,
> Christian
> 
> On Mon., 12 July 2021 at 15:50, Dave Raggett <dsr@w3.org> wrote:
> If anyone has time today, I would like to chat about ideas for working on cognitive natural language understanding (NLU).
> 
> There has been a lot of coverage of BERT and GPT-3 for NLP, with their impressive ability to generate text as a continuation of a passage provided by the user. Unfortunately the hype is overblown, as the lack of real semantics soon becomes apparent when you ask for the sum of two large numbers, or who the US President was in 1650 (before the United States was founded). GPT-3 doesn't know the limitations of its knowledge and fails to say when it doesn't know the answer to a question.
> 
> I am interested in ways to bootstrap NLU using statistical analysis of text corpora in conjunction with machine-readable natural language dictionaries, WordNet’s thesaurus, and manually provided taxonomic knowledge.
> 
> The starting point is to be able to tag words with their part of speech, e.g. adjective, noun, verb. This enables loose parsing to identify phrase structures, which in turn can be used for co-occurrence statistics. By matching the statistics for a given text passage against dictionary definitions, we can predict word senses in context.
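> 
> A rough sketch of that pipeline using NLTK (a minimal illustration; the chunk grammar is a toy stand-in for real loose parsing, and NLTK's tokenizer and tagger data are assumed to be installed):
> 
>     import nltk
> 
>     sent = nltk.word_tokenize("The old dog wore a leather collar")
>     tagged = nltk.pos_tag(sent)  # e.g. [('The', 'DT'), ('old', 'JJ'), ...]
> 
>     # Loose chunk grammar: optional determiner, adjectives, then nouns
>     grammar = "NP: {<DT>?<JJ>*<NN.*>+}"
>     chunks = nltk.RegexpParser(grammar).parse(tagged)
>     for np in chunks.subtrees(filter=lambda t: t.label() == "NP"):
>         print(np.leaves())  # phrases to feed into co-occurrence counts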
> 
> This can be considerably improved by introducing knowledge about the relationships between words with related meanings from thesauri and taxonomies. For example, knowing that dogs are animals helps with a dictionary definition for “collar” expressed in terms of animals, as it explains the use of “dog collar”, etc.
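> 
> For instance, the dog-animal relationship can be checked against WordNet (a toy check via NLTK; the synset names are illustrative):
> 
>     from nltk.corpus import wordnet as wn
> 
>     dog = wn.synset("dog.n.01")
>     animal = wn.synset("animal.n.01")
>     # True if "animal" appears on some hypernym path above "dog"
>     print(animal in {h for path in dog.hypernym_paths() for h in path})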
> 
> My hunch is that combining multiple kinds of information in this way can support semantic understanding, provided that it is expressed in terms of word senses and human-like reasoning. It may leave ambiguities where the agent is unsure, e.g. how do you know that dog is a subclass of animal rather than a related peer concept? However, prior knowledge still speeds learning.
> 
> Researchers have found that we learn associations between concepts whose labels directly co-occur, and subsequently between taxonomically related concepts whose labels share patterns of co-occurrence. Children are good at the former, but poor at the latter, whilst adults are good at both.
> 
> The challenge is to turn these high level ideas into concrete experiments with running code. A related challenge is to obtain machine interpretable natural language dictionaries.
> 
> Updated call details are given at:
> 
>  https://lists.w3.org/Archives/Member/internal-cogai/2021Jun/0000.html 
> 
> Looking forward to talking with you!
> 
> Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
> W3C Data Activity Lead & W3C champion for the Web of things 
> 

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of things 

Received on Wednesday, 14 July 2021 14:03:06 UTC