Re: Something on grammar from Dave Raggett on 2021-01-11 (public-cogai@w3.org from January 2021)

From: Dave Raggett <dsr@w3.org>
Date: Mon, 11 Jan 2021 15:50:25 +0000
To: public-cogai <public-cogai@w3.org>
Message-Id: <1663CAF1-AADC-41E0-A1E4-B4DB8A788EAA@w3.org>
Hi Christian,

Many thanks for your feedback. I have moved it here to archive the thread.

Previous work on semantics has largely focused on logic-based approaches. However, logic is a poor fit to how people reason, as noted by Philip Johnson-Laird. The chunk representation derives from work by John Anderson on ACT-R, and takes an informal approach to semantics based upon applying production rules to graphs.

Chunks makes it relatively easy to create knowledge graphs for specific domain scenarios such as the restaurant demo. The lexicon then has to be developed to match the vocabulary. I am planning on re-using the restaurant dialogue and its variants to explore the syntax-semantic mapping challenge, along with some additional examples that focus on specific issues for disambiguation.

My investigation last year showed that it is much better to work on end-to-end communication of meaning than on NLU or NLG alone. Shift-reduce parsing proved to be easier to work with than I expected. In the short term, I have hand-coded the corresponding rules, but this suggests the idea of a future demo that learns the rules from dialogue examples.
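
To make the shift-reduce idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the parser used in the demo: the rule table, the lexicon, and the greedy "reduce whenever a rule matches" strategy are all hypothetical, and it does no disambiguation.

```python
# Hypothetical hand-coded rules: a sequence of categories on top of the
# stack (right-hand side) is reduced to a single category (left-hand side).
RULES = [
    (("Det", "N"), "NP"),
    (("V", "NP"), "VP"),
    (("NP", "VP"), "S"),
]

# Hypothetical lexicon mapping words to categories.
LEXICON = {"the": "Det", "a": "Det", "waiter": "N", "soup": "N", "brings": "V"}


def parse(words):
    """Greedy shift-reduce parse; returns the final stack of (category, content) nodes."""
    stack = []
    queue = list(words)
    while True:
        # Reduce: look for a rule whose right-hand side matches the top of the stack.
        for rhs, lhs in RULES:
            n = len(rhs)
            if tuple(node[0] for node in stack[-n:]) == rhs:
                children = stack[-n:]
                del stack[-n:]
                stack.append((lhs, children))
                break
        else:
            # No rule applied: shift the next word, or stop when the input is used up.
            if not queue:
                break
            word = queue.pop(0)
            stack.append((LEXICON[word], word))
    return stack


result = parse("the waiter brings the soup".split())
```

For this sentence the stack ends up holding a single node of category S. A real grammar would need a way to choose between competing shift and reduce actions, which this sketch sidesteps.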

Syntactic and semantic processing need to proceed incrementally and in parallel. The parser devolves disambiguation as a means to avoid backtracking. Demonstrating how that works in practice is a key goal for the work in 2021. The basic idea is to launch a number of threads that are executed concurrently to evaluate alternatives and apply the winner to the phrase structure graph.
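
As a rough sketch of that idea in Python, assuming a toy scoring function and a dictionary standing in for the phrase structure graph (the names `score`, `pick_winner`, `apply_winner` and the cue sets are all hypothetical, made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor


def score(alternative, context):
    """Toy scorer (an assumption): count the alternative's cues found in the context."""
    return len(alternative["cues"] & context)


def pick_winner(alternatives, context):
    """Evaluate all alternatives concurrently and return the highest-scoring one."""
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda alt: score(alt, context), alternatives))
    best = max(range(len(alternatives)), key=scores.__getitem__)
    return alternatives[best]


def apply_winner(graph, winner):
    """Attach the winning dependent to its head in the (toy) phrase structure graph."""
    graph.setdefault(winner["head"], []).append(winner["dependent"])
    return graph


# "saw the man with the telescope": does the phrase attach to "saw" or to "man"?
alternatives = [
    {"head": "saw", "dependent": "with the telescope", "cues": {"instrument"}},
    {"head": "man", "dependent": "with the telescope", "cues": {"possession"}},
]
context = {"instrument", "optics"}  # hypothetical semantic context

winner = pick_winner(alternatives, context)
graph = apply_winner({"saw": ["man"]}, winner)
```

Here the attachment to "saw" wins because its cue matches the context. The sketch leaves out per-thread working memory and any stochastic selection; it only shows the launch, score, and apply-the-winner structure.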

The functional constraints on syntax-semantic mappings to support stochastic selection, reversibility and compositionality are good enough to allow me to develop proposals for a rule language given a set of examples to work from. I hope to find a clean syntax, and to then implement a parser and execution engine as I did for chunks rules.

I will also need to work on the means to identify and score alternatives for both NLG and NLU along with keeping track of working memory for each concurrent thread of execution.

Thanks for the note about the missing “dass”. It was indeed present in the original, along with “dat” for the Dutch version. I will update the slides on the web to fix that.

I will also follow up the links you provided and see what ideas emerge.

Kind regards,
Dave

> On 11 Jan 2021, at 15:19, Dave Raggett <dsr@w3.org> wrote:
> 
>> Begin forwarded message:
>> 
>> From: Christian Chiarcos <christian.chiarcos@web.de>
>> Subject: Something on grammar
>> Date: 11 January 2021 at 14:43:47 GMT
>> To: "Dave Raggett" <dsr@w3.org>
>> Cc: "Christian Chiarcos" <christian.chiarcos@googlemail.com>
>> 
>> Hi Dave,
>> 
>> thanks a lot for your talk. A lot of things I've been working on since my studies ;) My knowledge here isn't fully up to date, but let me share a few pointers.
>> 
>> As for my own work, I worked a little bit on RRG parsing (which is a semantics-driven construction grammar), but just as a representation formalism, not as an actual parsing framework. The reason is mostly that we don't have the lexical resources (rules and dictionaries). And this is the problem for all these approaches. There are rumors about rule-based RRG systems, and even applications of such, but they are proprietary (if operational, at all), and everything published was based on either dumbing down the grammar or just mapping from other structures to RRG parses (as we did, too). As far as I can tell, none of these systems is context-aware or provides any other means of disambiguation than by (context-free) frequency.
>> 
>> However, having an annotated corpus (there was none for, say, RRG, but our transformation-based approach was an effort to create one) can be a basis for mining lexicalization patterns or rules, and then, for "real" RRG parsing. Similarly for other formalisms of grammar and semantics. If some of that can be reused for your experiments, please get in touch.
>> 
>> As for classical linguistic frameworks like that, you might want to look into the following:
>> 
>> - Montague/dynamic semantics: direct mapping from language to semantics, basically without independent syntax (B. H. Partee, Herman Hendriks 1997: Montague Grammar. In: Handbook of Logic and Language, eds. J.F.A.K. van Benthem and A. G. B. ter Meulen, Elsevier/MIT Press, pp. 5–92.). I am not aware of any direct technical operationalization. The closest thing is probably CCG (http://groups.inf.ed.ac.uk/ccg/).
>> 
>> - DRT / SDRT (https://plato.stanford.edu/entries/discourse-representation-theory/). Features full-fledged semantic-syntax mapping, but has *never* been directly operationalized. The bottleneck is knowledge acquisition. However, there is an older (S)DRT parser by Johan Bos, Groningen, Boxer: https://gmb.let.rug.nl/software.php. It does produce DRT annotations; however, it does not perform DRT parsing but just maps a number of other components to DRT. Also, it's context-free. That won't solve the backtracking issue. It's still interesting, also because of the existence of RDF wrappers around it: https://www.istc.cnr.it/en/news/fred-and-tipalo-natural-language-rdfowl
>> 
>> General challenge:
>> - Complexity of symbolic parsing. Notoriously slow when it comes to larger dictionaries (This is sometimes [intentionally?] overlooked, but I got it confirmed by Stefan Müller, major figure in the HPSG community, p.c.; author of the TRALE system: https://hpsg.hu-berlin.de/Software/)
>> - Coverage of symbolic parsing. The best HPSG grammars for English cover maybe 85% of the input tokens (with about 35 years development time; some pointers under https://matrix.ling.washington.edu/index.html). There are ways to circumvent this by using less sophisticated components as fall-back solutions, but then, you lose the power of your grammar. 85% coverage doesn't sound drastic, but if every sentence contains about 15 words, you can expect every second sentence to contain an out-of-vocabulary word. (And it comes without contextual disambiguation, but returns all possible analyses.)
>> 
>> Somewhat closer to a syntax-free approach are UCCA (https://universalconceptualcognitiveannotation.github.io/) and AMR (https://amr.isi.edu/). These are semantic formalisms. But their parsing is not rule-based but neural and they are not reversible. However, AMRs are used for NLG and MT, but using neural language models that recover pieces of information not preserved in the AMR parse from the context. I'm currently looking into this direction as a means for abstractive text summarization (as a follow-up to https://www.advancedsciencenews.com/betas-draft-a-human-review-of-a-machine-generated-book/, which was a rather simple baseline system).
>> 
>> And of course, there is some work on learning lexicalization patterns that directly link knowledge graphs (and even RDF) and language, e.g., http://sargraph.dfki.de/. I am not sure to what extent these are reversible.
>> 
>> Hope that helps ;)
>> 
>> Best,
>> Christian
>> 
>> PS: Minor remark on your examples
>> 
>> The German one is not quite right ;)
>> 
>> It is either
>> "Ingrid sah Peter Hans schwimmen lassen" (main clause)
>> 
>> or
>> 
>> "dass Ingrid Peter Hans schwimmen lassen sah" (relative clause, not without complementizer!)
>> 
> 
> Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
> W3C Data Activity Lead & W3C champion for the Web of things 
> 

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of things
Received on Monday, 11 January 2021 15:50:30 UTC