Re: Linguistics for the Age of AI

Hi Dave,

On Saturday, October 23rd, 2021 at 4:31 PM, Dave Raggett <dsr@w3.org> wrote:

> Hi Amirouche,
> 

> > On 23 Oct 2021, at 07:04, Amirouche BOUBEKKI <amirouche@hyper.dev> wrote:
> > 
> > The whole book is available as an open-access PDF at:
> > 

> >  https://direct.mit.edu/books/book/5042/Linguistics-for-the-Age-of-AI
> 

> Thanks for the pointer, which is very timely given my work on human-like natural language processing. I am taking the unpopular knowledge-rich approach, given the need for cognitive agents to understand what people mean as part of human-machine collaboration, and likewise for agents to make themselves understood.

Good luck!

> The section (1.6.3) on unmotivated beliefs is good, as I hear some of these criticisms frequently from people who act as naysayers for work on human-like AI.

That is a good starting point for advocating NLU / semantic / ontologic / knowledge-based systems.

There are more arguments in the chapter called 'Measuring Progress'.

About that chapter: I have only skimmed it, but there is a simpler way to evaluate an NLU system, although it requires much more funding and work. For example, build or adapt a deep NLU system for tutoring in a particular field, e.g. the history of Europe; pick students before their Master's; offer the NLU system to half of the students; and, once the Master's is complete, compare the results with and without the NLU tutor.
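
In the end the comparison boils down to a two-sample test on the final outcomes. A minimal sketch in Python, with made-up scores purely for illustration:

    from scipy import stats

    # Hypothetical final-exam scores after the Master's programme:
    # one group studied with the NLU tutor, the control group without.
    tutored = [78, 85, 81, 90, 74, 88]
    control = [70, 76, 79, 72, 81, 68]

    # Welch's two-sample t-test: is the difference in means significant?
    t, p = stats.ttest_ind(tutored, control, equal_var=False)
    print(f"t={t:.2f}, p={p:.3f}")

Of course, the hard part is everything before that last step: funding, recruiting the students, and building the tutor.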

I am unsure why they chose evaluation metrics that are, as far as my skimming goes, narrow, given what the system can apparently achieve.

> I appreciated the comment in chapter 1: 
> 

> > Domain-specific NLU successes are often criticized for not being immediately applicable to all domains (under the pressure of evaluation frameworks entrenched in statistical NLP)
> 

> We need different ways to evaluate progress on cognitive agents, ways that focus on understanding and reasoning in respect to collaborative AI.
> 


Yes

> I want to use selected examples to explore the models and algorithms needed for natural language understanding in terms of mapping a sequence of words into chunk graphs that represent the informal natural language semantics, including context dependent fuzzy concepts as blends of discrete concepts.

That is addressed in chapters 4, 5, 6, and 7. But they do not provide an extensive set of chunk frames such as Hermann Helbig does in the MultiNet book. Context dependence in particular is addressed under situational reasoning, which is where the application-specific ontology comes into play (AFAIU).
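
To make the mapping concrete, here is how I picture a sentence becoming a chunk graph. A minimal sketch in Python; the types and property names are my own invention, not the book's representation nor your chunks format:

    from dataclasses import dataclass, field

    @dataclass
    class Chunk:
        # A chunk: a typed node whose property values are literals
        # or the ids of other chunks, so that chunks link into a graph.
        type: str
        id: str
        properties: dict = field(default_factory=dict)

    # Hypothetical chunk graph for "John gives Mary a book":
    graph = [
        Chunk("person", "john", {"name": "John"}),
        Chunk("person", "mary", {"name": "Mary"}),
        Chunk("book", "book1"),
        Chunk("give", "g1", {"agent": "john",
                             "recipient": "mary",
                             "theme": "book1"}),
    ]

Blends of fuzzy, context-dependent concepts would presumably need weights on those links, which plain frames like this do not capture.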

> 

> The knowledge bottleneck can be addressed by focusing on a) what knowledge is needed for limited applications,

Define 'limited applications' :)

> and b) mimicking how humans learn new words and ideas as a basis for incremental learning,

Incremental learning is also mentioned in chapters 1, 2, and 3.

> and paving the way for crowd sourcing as a basis for teaching cognitive agents.

This makes me think we should bring the Abstract Wikipedia / Wikifunctions people into the loop.

> I am not optimistic about learning knowledge at scale automatically from textual resources, as that is likely to lead to a patchwork of knowledge rather than real understanding which coherently combines declarative and procedural knowledge.

Yes and no. For mission-critical work: no. But for basic pre-semantic analysis, and for the knowledge engineer, it can and will be useful. They mention that their NLU system, or an NLU system in general, could rely on a regular search engine, which does basic information retrieval.


> 

> An open question is the applicability of existing resources for common sense knowledge graphs, see:
> 


Clearly, existing resources will be useful, at least for the knowledge engineer training the system, and possibly for the domain experts who teach it true things, but whether they can be used as-is, without forking, is another question. That is what they underscore in the book about CoreNLP: maybe building their own basic pre-semantic tooling would have cost less. Similarly, they have a lexicon that resembles WordNet, but is different, so it is probably a complete rework or a hard fork.

> https://arxiv.org/abs/2012.11490
> https://usc-isi-i2.github.io/ISWC20/
> 

> Ilievski et al’s CSKG is publicly available as a tab separated value file which is about one gigabyte in size when uncompressed. That highlights the challenges for human understanding of large knowledge graphs, and points to opportunities for work on web-based tools and techniques for browsing and querying such graphs.
> 
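
On the practical side, one gigabyte of TSV is at least easy to stream. A minimal sketch in Python that scans the file for edges touching a single concept, assuming KGTK-style node1 / relation / node2 columns (the exact header should be checked against the release) and a made-up file name and concept id:

    import csv

    # Stream the CSKG TSV without loading it into memory, yielding
    # every edge that touches the given concept id.
    def edges_about(path, concept):
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.DictReader(f, delimiter="\t")
            for row in reader:
                if concept in (row.get("node1"), row.get("node2")):
                    yield row["node1"], row["relation"], row["node2"]

    # Hypothetical usage; "/c/en/dog" is a ConceptNet-style id.
    for edge in edges_about("cskg.tsv", "/c/en/dog"):
        print(*edge, sep="\t")

Browsing and visualising it is another story, which is where the web-based tools you mention would help.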

> This is a long and complicated road to work on, but no pain, no gain as it were!

Yes

> 

> Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
> 

> W3C Data Activity Lead & W3C champion for the Web of things

Received on Saturday, 23 October 2021 15:45:55 UTC