No call Feb 8, brief status report on NLP

I am not planning to organise a telecon this Monday (Feb 8), but if anyone else would like to chair, that would be fine.

In the meantime, I would like to update you on the progress I am making on cognitive natural language processing.

I am working on a demo that combines natural language understanding and generation. The idea is to show what processing takes place at the level of words, syntax and semantics, by analogy to a factory production line, in which information is elaborated and transformed as it moves down the line. The incremental processing will be shown as a dynamically generated HTML table with columns for words, syntax and semantics, with a new row added for each word as it is processed.
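
To give a feel for the table mechanics, here is a rough TypeScript sketch using the plain DOM API; the names (TraceEntry, addTraceRow) are purely illustrative and not the demo’s actual code:

  // Append one row to the trace table as each word is processed.
  // All identifiers here are illustrative, not the demo's real API.
  interface TraceEntry {
    word: string;       // the token just consumed
    syntax: string;     // current partial syntactic analysis
    semantics: string;  // current partial semantic graph, serialised
  }

  function addTraceRow(table: HTMLTableElement, entry: TraceEntry): void {
    const row = table.insertRow();  // a new row for each word
    row.insertCell().textContent = entry.word;
    row.insertCell().textContent = entry.syntax;
    row.insertCell().textContent = entry.semantics;
  }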

The hypothesis is that natural language can be understood incrementally, word by word, without backtracking, using concurrent processing of words, syntax and semantics. The syntactic interpretation will be refined as processing proceeds, e.g. does a given noun phrase form part of a list of referents for the object of a verb, or is it actually an indirect object? For instance, consider “Mary” below:

 John invited Janet Mary

This doesn’t make sense unless you assume that “Janet Mary” is a compound name like “Mary-Sue”. However, that’s unlikely if you haven’t heard “Janet Mary” used that way before. We therefore need to wait until we’ve looked at the next word.

 John invited Janet Mary and Sue to a party

The word “and” in this case makes it clear that this is a list of invitees.
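
To make the idea of deferred commitment concrete, here is a loose TypeScript sketch, assuming for simplicity that each incoming word has already been recognised as a noun phrase or a conjunction; the types and names are mine, for illustration only:

  // The parser appends noun phrases to a provisional object slot and
  // only commits to a reading once a disambiguating word arrives.
  interface PartialClause {
    verb: string;
    objects: string[];            // provisional object noun phrases
    reading: "open" | "np-list";  // how the objects are interpreted
  }

  function consume(clause: PartialClause, word: string): void {
    if (word === "and") {
      clause.reading = "np-list"; // "and" confirms a list of referents
    } else {
      clause.objects.push(word);  // tentatively append, stay undecided
    }
  }

Now consider: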

 John gave the dog a bone

This is similar to the invitation example in that the verb is followed by two noun phrases. It has the natural interpretation that a bone was given by John to the dog, i.e. “the dog” is the indirect object and “a bone” the direct object. These examples show that the intended syntactic structure depends on semantic knowledge shared by both speaker and listener. That knowledge can be modelled in terms of chunk graphs, which you can think of as an ontology. I am therefore exploring what knowledge is needed to process a suite of utterances as used in the demo.
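
One way to picture a chunk graph in code, as a loose sketch rather than the actual chunks notation, with role names (giver, recipient, theme) that are my own illustrative labels rather than a committed ontology:

  // A chunk as a typed bundle of properties whose values name other
  // chunks, so that chunks link together into a graph.
  interface Chunk {
    type: string;                        // e.g. "give"
    id: string;                          // e.g. "give1"
    properties: Record<string, string>;  // links to other chunk ids
  }

  // "John gave the dog a bone" as a single event chunk
  const give1: Chunk = {
    type: "give",
    id: "give1",
    properties: { giver: "john1", recipient: "dog1", theme: "bone1" }
  };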

Basic English grammar has many such details, but in each case the processing at each stage of the production line can be broken down into simple rules, along with the means to invoke corresponding graph algorithms. The demo will help to identify the different kinds of rules and graph algorithms needed for a selection of English utterances. This will then pave the way for an exploration of how language is learned.
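
As a sketch of how such rules might be represented, as a condition plus an action that is free to invoke a graph algorithm, here are some hypothetical types (nothing below is settled):

  // A stage's behaviour as a set of condition/action rules; an action
  // may invoke a graph algorithm over the chunk graph built so far.
  interface StageState {
    stack: string[];  // partial syntactic analysis
    graph: object[];  // semantic chunks built so far
  }

  interface Rule {
    matches(word: string, state: StageState): boolean;  // condition
    apply(word: string, state: StageState): void;       // action
  }

  function step(rules: Rule[], word: string, state: StageState): void {
    for (const rule of rules) {
      if (rule.matches(word, state)) {
        rule.apply(word, state);  // e.g. run a graph search or merge
        break;                    // first matching rule wins (a choice)
      }
    }
  }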

Note that I have discarded punctuation in the examples, given that commas are silent in speech. The processing of words identifies their properties, e.g. part of speech, number, gender, and meaning. This includes semantic search through the context and dialogue history. The syntactic processing can be modelled in terms of shift-reduce parsing. The semantic processing includes declaring entities, e.g. a named person, identifying references to previously introduced entities, mapping syntactic arguments to semantic arguments, and searching for the meaning of prepositional phrases. This builds up complex graph representations of the meaning.
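
To show what I mean by shift-reduce at the syntactic stage, here is a minimal TypeScript sketch with a toy lexicon and three hand-written reductions, just enough for one sentence; real coverage needs far more, and the category labels are my own:

  // Toy shift-reduce parser: shift each word's category onto a stack,
  // then reduce while the top of the stack matches a pattern.
  type Cat = "det" | "noun" | "verb" | "NP" | "VP" | "S";

  const lexicon: Record<string, Cat> = {
    john: "NP", gave: "verb", the: "det", dog: "noun",
    a: "det", bone: "noun"
  };

  function parse(words: string[]): Cat[] {
    const stack: Cat[] = [];
    for (const w of words) {
      stack.push(lexicon[w.toLowerCase()]);  // shift
      let reduced = true;
      while (reduced) {                      // reduce as far as possible
        reduced = false;
        const n = stack.length;
        if (n >= 2 && stack[n - 2] === "det" && stack[n - 1] === "noun") {
          stack.splice(n - 2, 2, "NP");      // det noun -> NP
          reduced = true;
        } else if (n >= 3 && stack[n - 3] === "verb" &&
                   stack[n - 2] === "NP" && stack[n - 1] === "NP") {
          stack.splice(n - 3, 3, "VP");      // verb NP NP -> VP
          reduced = true;
        } else if (n >= 2 && stack[n - 2] === "NP" && stack[n - 1] === "VP") {
          stack.splice(n - 2, 2, "S");       // NP VP -> S
          reduced = true;
        }
      }
    }
    return stack;  // ["S"] for "John gave the dog a bone"
  }

Note that eagerly reducing “verb NP NP” is exactly the kind of premature commitment that the “Janet Mary” example warns against, which is why the syntactic and semantic stages need to run concurrently rather than in sequence.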

If anyone is interested in helping with the analysis and the demo, please let me know.

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of Things
