Fwd: Something on grammar

For archiving ...

> Begin forwarded message:
> 
> From: Christian Chiarcos <christian.chiarcos@web.de>
> Subject: Something on grammar
> Date: 11 January 2021 at 14:43:47 GMT
> To: "Dave Raggett" <dsr@w3.org>
> Cc: "Christian Chiarcos" <christian.chiarcos@googlemail.com>
> 
> Hi Dave,
> 
> thanks a lot for your talk. It touches on a lot of things I've been working on since my studies ;) My knowledge here isn't fully up to date, but let me share a few pointers.
> 
> As for my own work, I have worked a little on RRG (Role and Reference Grammar, a semantics-driven construction grammar) parsing, but only as a representation formalism, not as an actual parsing framework. The reason is mostly that we don't have the lexical resources (rules and dictionaries), and this is the problem for all of these approaches. There are rumors about rule-based RRG systems, and even applications of such, but they are proprietary (if operational at all), and everything published was based on either dumbing down the grammar or just mapping from other structures to RRG parses (as we did, too). As far as I can tell, none of these systems is context-aware or provides any means of disambiguation other than (context-free) frequency.
> 
> However, having an annotated corpus (there was none for, say, RRG, but our transformation-based approach was an effort to create one) can be a basis for mining lexicalization patterns or rules, and then for "real" RRG parsing. The same goes for other formalisms of grammar and semantics. If some of that can be reused for your experiments, please get in touch.
> 
> As for classical linguistic frameworks along these lines, you might want to look into the following:
> 
> - Montague/dynamic semantics: a direct mapping from language to semantics, basically without independent syntax (B. H. Partee and H. Hendriks, 1997: Montague Grammar. In: J. F. A. K. van Benthem and A. G. B. ter Meulen (eds.), Handbook of Logic and Language, Elsevier/MIT Press, pp. 5–92). I am not aware of any direct technical operationalization. The closest thing is probably CCG (http://groups.inf.ed.ac.uk/ccg/); a toy sketch follows below.
> 
> - DRT / SDRT (https://plato.stanford.edu/entries/discourse-representation-theory/). Features a full-fledged syntax-semantics mapping, but has *never* been directly operationalized; the bottleneck is knowledge acquisition. However, there is an older (S)DRT parser by Johan Bos, Groningen, Boxer: https://gmb.let.rug.nl/software.php. It does produce DRT annotations, but it does not actually perform DRT parsing; it just maps the output of a number of other components to DRT. Also, it's context-free, so it won't solve the backtracking issue. It's still interesting, though, also because of the existing RDF wrappers around it: https://www.istc.cnr.it/en/news/fred-and-tipalo-natural-language-rdfowl. A small DRS example is sketched below as well.
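> 
> Re the CCG pointer above: just to make the "direct mapping" idea concrete, here is a toy derivation using NLTK's CCG module (the categories and the mini-lexicon are made up for illustration, of course; with a semantics attached to each lexical entry, the meaning is assembled by the same combinatory rules):
> 
>     from nltk.ccg import chart, lexicon
> 
>     # Toy lexicon: every word maps directly to a category; no separate
>     # phrase-structure grammar is involved.
>     lex = lexicon.fromstring('''
>         :- S, NP
>         Ingrid => NP
>         Peter => NP
>         swims => S\\NP
>         saw => (S\\NP)/NP
>     ''')
> 
>     parser = chart.CCGChartParser(lex, chart.DefaultRuleSet)
>     for derivation in parser.parse('Ingrid saw Peter'.split()):
>         chart.printCCGDerivation(derivation)
>         break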
> 
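> And re DRT: the kind of structure Boxer produces, a discourse representation structure (DRS), looks roughly like this (hand-written here, again with NLTK, just to show the shape of the output rather than any actual parsing):
> 
>     from nltk.sem.drt import DrtExpression
> 
>     dexpr = DrtExpression.fromstring
>     # Two hand-written DRSs; '+' merges them and simplify() resolves the merge.
>     drs1 = dexpr('([x], [Ingrid(x), swim(x)])')
>     drs2 = dexpr('([y], [Peter(y), see(y, x)])')
>     merged = (drs1 + drs2).simplify()
>     print(merged)        # ([x,y],[Ingrid(x), swim(x), Peter(y), see(y,x)])
>     print(merged.fol())  # the same content as a first-order formula
> 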
> General challenges:
> - Complexity of symbolic parsing. It is notoriously slow when it comes to larger dictionaries. (This is sometimes [intentionally?] overlooked, but I got it confirmed (p.c.) by Stefan Müller, a major figure in the HPSG community and author of the TRALE system: https://hpsg.hu-berlin.de/Software/)
> - Coverage of symbolic parsing. The best HPSG grammars for English cover maybe 85% of the input tokens (after about 35 years of development time; some pointers at https://matrix.ling.washington.edu/index.html). There are ways to work around this by using less sophisticated components as fall-back solutions, but then you lose the power of your grammar. 85% coverage doesn't sound drastic, but if every sentence contains about 15 words, you can expect every second sentence to contain an out-of-vocabulary word (a rough check follows below). (And such a parser comes without contextual disambiguation: it returns all possible analyses.)
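> 
> A back-of-the-envelope check (my own rough numbers, treating tokens as independent):
> 
>     p_all_known = 0.85 ** 15       # ~0.09: chance that all 15 tokens of a sentence are covered
>     p_some_oov  = 1 - p_all_known  # ~0.91: chance of at least one out-of-vocabulary token
> 
> So "every second sentence" is, if anything, a conservative estimate.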
> 
> Somewhat closer to a syntax-free approach are UCCA (https://universalconceptualcognitiveannotation.github.io/) and AMR (https://amr.isi.edu/). These are semantic formalisms, but their parsing is not rule-based but neural, and they are not reversible. AMRs are nevertheless used for NLG and MT, with neural language models recovering from context the pieces of information that are not preserved in the AMR graph. I'm currently looking into this direction as a means for abstractive text summarization (as a follow-up to https://www.advancedsciencenews.com/betas-draft-a-human-review-of-a-machine-generated-book/, which was a rather simple baseline system).
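> 
> To give an idea of what such a graph looks like: here is a hand-written AMR (not parser output) for a simplified version of your example sentence from the talk, in PENMAN notation, readable e.g. with the Python 'penman' package:
> 
>     import penman
> 
>     # Hand-written AMR for "Ingrid saw Peter swim"; the graph abstracts away
>     # from word order, tense and (most) morphology.
>     graph = penman.decode('''
>     (s / see-01
>        :ARG0 (p / person :name (n / name :op1 "Ingrid"))
>        :ARG1 (s2 / swim-01
>                  :ARG0 (p2 / person :name (n2 / name :op1 "Peter"))))
>     ''')
>     print(graph.triples)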
> 
> And of course, there is some work on learning lexicalization patterns that directly link knowledge graphs (and even RDF) and language, e.g., http://sargraph.dfki.de/. I am not sure to what extent these are reversible.
> 
> Hope that helps ;)
> 
> Best,
> Christian
> 
> PS: Minor remark on your examples
> 
> The German one is not quite right ;)
> 
> It is either
> "Ingrid sah Peter Hans schwimmen lassen" (main clause)
> 
> or
> 
> "dass Ingrid Peter Hans schwimmen lassen sah" (relative clause, not without complementizer!)
> 

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of things 

Received on Monday, 11 January 2021 15:19:07 UTC