Re: Something on grammar

Hi Christian,

My current parser is in JavaScript and indeed has a limited scope. The idea is to evolve it in step with the expanding scope of the series of NLP demos. I started with a single phrase for a Towers of Hanoi demo, and then extended the parser to support the dinner ordering demo. That was at the beginning of last summer. After that I had to focus on EU project deliverables, and am only now picking it up again.

It seems that a few rules go a long way, as can be seen in the following parsing demo:

 https://www.w3.org/Data/demos/chunks/nlp/parsing/

This declares the lexicon in the facts graph. The rules graph is empty as it isn’t used as yet. The parsing rules are natively encoded in JavaScript, see:

 https://www.w3.org/Data/demos/chunks/nlp/parsing/parsing.js

The idea is to get experience with parsing before investing effort in designing and implementing a shift-reduce rule language, as will be needed to learn grammar from examples.

In the above demo, you select the utterance to parse using the left/right buttons and then click “parse utterance”. The log field shows the sequence of steps followed by the parser and the corresponding state of the shift-reduce queue. Each entry is a chunk, e.g. “np _:4 {det the; noun sea, bass}”, which denotes a noun phrase (np) with a determiner and a compound noun. “_:4” is an internally generated identifier for the chunk.
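To make the notation concrete, here is a minimal sketch of how such chunks could be represented and serialised. The class and property names are illustrative only, not the demo’s actual internals:

```javascript
// Hypothetical sketch of the chunk notation shown in the demo log,
// e.g. "np _:4 {det the; noun sea, bass}". Names are illustrative.
class Chunk {
  constructor(type, id, props) {
    this.type = type;   // chunk type, e.g. "np"
    this.id = id;       // internally generated identifier, e.g. "_:4"
    this.props = props; // named properties, e.g. {det: ["the"], noun: ["sea", "bass"]}
  }
  toString() {
    // each property becomes "name value1, value2"; properties are
    // separated by semicolons, mirroring the log format
    const body = Object.entries(this.props)
      .map(([name, values]) => `${name} ${values.join(", ")}`)
      .join("; ");
    return `${this.type} ${this.id} {${body}}`;
  }
}

const np = new Chunk("np", "_:4", {det: ["the"], noun: ["sea", "bass"]});
console.log(np.toString()); // → np _:4 {det the; noun sea, bass}
```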

In my local version of this demo, I just added “looking” to the lexicon as a verb, and tried “what are you looking for”. This yielded:

0) vp _:2 {verb are, looking; subject _:1; object _:3}
   np _:1 {pron what}
   np _:3 {pron you}
1) pp {prep for}

If parsing succeeds, the queue should contain only a single item. In this case it has two items, showing that the parser didn’t know how to reduce the trailing prepositional phrase. This could easily be fixed with a new reduce rule that applies when the prepositional phrase (pp) isn’t followed by a noun phrase (np).
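As a rough sketch, such a reduce rule might look like the following. The queue items and property names are simplified stand-ins for the demo’s chunks, not its actual code:

```javascript
// Hypothetical sketch of the reduce rule described above: attach a
// trailing prepositional phrase (pp) to the preceding verb phrase (vp)
// when no noun phrase follows it. Names are illustrative only.
function reduceTrailingPP(queue) {
  const last = queue[queue.length - 1];
  const prev = queue[queue.length - 2];
  // rule condition: a vp followed by a pp at the end of the queue,
  // i.e. the pp has no np to combine with
  if (prev && prev.type === "vp" && last.type === "pp") {
    prev.props.prep = last.props.prep; // attach the preposition to the vp
    queue.pop();                       // remove the reduced pp
    return true;                       // a reduction was applied
  }
  return false;
}

// the queue state from the "what are you looking for" example
const queue = [
  {type: "vp", props: {verb: ["are", "looking"], subject: "_:1", object: "_:3"}},
  {type: "pp", props: {prep: ["for"]}}
];
reduceTrailingPP(queue);
console.log(queue.length); // → 1
```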

The JavaScript parser isn’t smart about how it checks and applies rules. With a rule language, I would envisage compiling rule conditions into a discrimination network that updates the list of applicable rules whenever the queue state changes. You can think of this as a lattice where changes to the input ripple through the lattice to update its output. If more than one rule is applicable, one has to be selected in some manner. This situation could be used to trigger learning, and is something to be explored later.
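A much-simplified analogue of that idea is to index rules by the chunk type at the top of the queue, so that only a few candidate rules are re-checked when the queue changes. This sketch is illustrative only, with made-up names, and omits the incremental propagation a real discrimination network would do:

```javascript
// Minimal sketch of indexing rules so that only rules whose conditions
// mention the changed part of the queue are re-evaluated — a simplified
// analogue of a rete-style discrimination network. Names are illustrative.
class RuleIndex {
  constructor() {
    this.byTopType = new Map(); // chunk type at top of queue → rules
  }
  add(topType, rule) {
    if (!this.byTopType.has(topType)) this.byTopType.set(topType, []);
    this.byTopType.get(topType).push(rule);
  }
  // called whenever the queue changes; returns the applicable rules
  applicable(queue) {
    const top = queue[queue.length - 1];
    if (!top) return [];
    const candidates = this.byTopType.get(top.type) || [];
    return candidates.filter(rule => rule.condition(queue));
  }
}

const index = new RuleIndex();
index.add("pp", {
  name: "attach-trailing-pp",
  condition: q => q.length >= 2 && q[q.length - 2].type === "vp"
});

const queue = [{type: "vp"}, {type: "pp"}];
console.log(index.applicable(queue).map(r => r.name)); // → ["attach-trailing-pp"]
```

When several rules survive the filter, a conflict-resolution step would pick one, which is where the learning trigger mentioned above would hook in.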

Of course, prepositions don’t always attach to the thing before them. This is where I plan to use semantic search, launching asynchronous threads to evaluate the potential choices and select the winner. The parser needs to identify such choices. This is something I want to explore in this year’s work.
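One way this could look in JavaScript is launching the evaluations concurrently and picking the best-scoring attachment site. Here scoreAttachment is a hypothetical placeholder for semantic search over background knowledge; all the names are illustrative, not an actual design:

```javascript
// Hedged sketch of the idea above: evaluate the possible attachment
// sites for a prepositional phrase asynchronously and pick the winner.
// scoreAttachment is a hypothetical stand-in for semantic search.
async function choosePPAttachment(pp, candidates, scoreAttachment) {
  // evaluate all candidate attachment sites concurrently
  const scored = await Promise.all(
    candidates.map(async host => ({host, score: await scoreAttachment(host, pp)}))
  );
  // select the most semantically plausible attachment site
  scored.sort((a, b) => b.score - a.score);
  return scored[0].host;
}

// toy usage: "saw the man with the telescope" — does "with" attach to
// the verb (instrument) or to the noun (property)?
const toyScore = async (host, pp) => (host.type === "verb" ? 0.8 : 0.4);
choosePPAttachment({prep: "with"}, [{type: "verb"}, {type: "noun"}], toyScore)
  .then(winner => console.log(winner.type)); // → verb
```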

Combining syntactic and semantic processing is akin to an informed best-first strategy, where some choices are left to be lazily resolved at a later stage. There is no backtracking as such, but if things go wrong (e.g. the queue can’t be reduced), this can trigger reflective reasoning about the syntactic and semantic rules.

Some of the processing can be done in parallel and could, in principle, benefit from hardware acceleration. It would be great to better understand how the brain does this from a neuroscience perspective, but hopefully a computational model will provide insights that can feed into such research, just as ACT-R has inspired work on computational models of pulsed neural networks (see Chris Eliasmith’s work on semantic pointers).

My guess is that the brain is very effective at a best-first strategy for NLU, avoiding the need to fully explore the space of syntactic possibilities. One way to test this is through measurements of the time taken to parse utterances of different lengths. I could mimic that for my web-based demo to show how well it performs for different kinds and lengths of utterances. I am sure that there will be some corner cases where parsing either fails or takes too many steps, e.g. so-called garden path sentences.

In the meantime, my priority is to focus on syntax-semantics mapping rules and how these can be applied to both NLG and NLU. This is a multi-year effort, and I realise that the onus is on me to show that cognitive NLP is an effective approach for practical applications and, moreover, can mimic how children learn language.

Best regards,

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of Things

Received on Thursday, 21 January 2021 11:41:46 UTC