Re: Something on grammar from Dave Raggett on 2021-01-22 (public-cogai@w3.org from January 2021)

From: Dave Raggett <dsr@w3.org>
Date: Fri, 22 Jan 2021 10:50:31 +0000
To: Christian Chiarcos <christian.chiarcos@web.de>
Cc: public-cogai <public-cogai@w3.org>
Message-Id: <C82B1D10-C246-469C-8805-1F0DF83AD4DD@w3.org>

Further to the question on how humans process language when reading, I dug up a few references on eye movements for psycholinguistics:

“What eye movements can tell us about sentence comprehension”
Shravan Vasishth, Titus von der Malsburg and Felix Engelmann

> Eye movement data have proven to be very useful for investigating human sentence processing. Eyetracking research has addressed a wide range of questions, such as recovery mechanisms following garden-pathing, the timing of processes driving comprehension, the role of anticipation and expectation in parsing, the role of semantic, pragmatic, and prosodic information, and so on.

https://tmalsburg.github.io/VasishthEtAl2013.pdf

“Eye Movements During Reading”
Jukka Hyönä, Johanna K. Kaakinen

… a brief introduction to research on eye movements during reading

https://link.springer.com/chapter/10.1007/978-3-030-20085-5_7

“Eye-movement recording as a tool for studying syntactic processing in a second language: a review of methodologies and experimental findings”
Cheryl Frenck-Mestre

> The complex trace of saccades, fixations and regressions that the eyes make while taking in a line of text is unquestionably one of the richest accounts available as concerns the process of reading. Recording these jumps, stops and re-takes provides a to-the-letter, millisecond-precise report of the readers’ immediate syntactic processing as well as revisions thereof. In addition, the influence of innumerable factors – from low-level visual conditions to high-level pragmatic cues – on the reading process can be measured via this method, thereby rendering possible the testing of various psycholinguistic models of parsing and comprehension. For all of these reasons, eye-movement recording has become an invaluable tool in the study of how readers process text.

https://www.researchgate.net/publication/32222855_Eye-movement_recording_as_a_tool_for_studying_syntactic_processing_in_a_second_language_A_review_of_methodologies_and_experimental_findings

Some highlights:
- We generate a phonological representation when we are reading. This is the voice we hear in our heads as we read.
- We peek ahead at the next word using non-foveal text information, and skip quickly over common short words such as prepositions.
- We pause on words that require further processing, e.g. novel words or words with many meanings.
- We pause at the end of a sentence, presumably to reflect on the meaning of the sentence.
- We go back to problem words when we detect a parsing problem.
- In addition, the morphology of words helps us to guess the meanings of words in terms of their components, and to relate words to their stems.

One example of a problematic sentence:

Since Jay always jogs a mile seems like a very short distance to him.

Here “a mile” seems to attach to “jogs” as its object until we read the next word, “seems”, which forces us to look back to “jogs” and repair the syntactic structure and meaning. A comma or full stop after “jogs” avoids this ambiguity. This can be thought of as a case of belief revision rather than backtracking.

Another example by Sturt deals with reflexives:

The surgeon who treated Jonathan had pricked himself with a used syringe needle.

In this sentence, the antecedent of himself/herself is always “the surgeon”. Surgeons can certainly be women, but English usage (sadly) carries a stereotypical bias toward male surgeons. “Jonathan” acts as a distractor; this is no longer the case if you replace “Jonathan” with “Jenny”, since “himself” then clearly cannot refer to Jenny.

Sturt concluded that the parser carries out antecedent search using a syntactic constraint, initially ignoring the nonsyntactic cue of gender match between the distractor noun phrase and the reflexive, and only later on does gender match cause disruptions.

Another possible interpretation is that “had” could conceivably attach to either “surgeon” or “Jonathan”, but the correct choice is the former, because Jonathan is the object of the first verb. This suggests that “himself” should be interpreted in terms of the phrase structure rather than just lexical distance.
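
As a rough illustration of the two-stage account above (a syntactic constraint first, gender match only later), here is a minimal Python sketch. The NounPhrase class, its fields and resolve_reflexive are hypothetical names invented for this example, and the "subject of the same clause" filter is a crude stand-in for a real binding constraint. It is not Sturt's model, just one way the idea might be expressed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class NounPhrase:
    text: str
    clause: str  # "main" or "relative" (hand-annotated, an assumption)
    role: str    # "subject" or "object" within that clause
    gender: str  # "m", "f" or "?" (may be only a stereotype)


def resolve_reflexive(
    reflexive_gender: str,
    reflexive_clause: str,
    noun_phrases: List[NounPhrase],
) -> Tuple[Optional[NounPhrase], bool]:
    """Return (antecedent, gender_mismatch).

    Stage 1 picks the antecedent by structure alone: the subject of the
    clause that contains the reflexive. Gender plays no part here, so a
    gender-matching distractor in another clause is never considered.
    Stage 2 checks gender afterwards and only reports a mismatch.
    """
    for phrase in noun_phrases:
        if phrase.clause == reflexive_clause and phrase.role == "subject":
            mismatch = phrase.gender not in ("?", reflexive_gender)
            return phrase, mismatch
    return None, False


# "The surgeon who treated Jonathan had pricked himself ..."
sentence_nps = [
    NounPhrase("the surgeon", "main", "subject", "m"),  # stereotypical gender
    NounPhrase("Jonathan", "relative", "object", "m"),  # distractor
]

antecedent, mismatch = resolve_reflexive("m", "main", sentence_nps)
print(antecedent.text, mismatch)
```

With “herself” in place of “himself” the same antecedent is chosen, but the sketch reports a mismatch, which corresponds loosely to the later disruption caused by the gender stereotype.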

All of this is consistent with a pipelined approach to natural language understanding, where processing occurs concurrently at different stages along the pipeline. This avoids backtracking, but includes a mechanism to reprocess text from a problem word when a problem is detected.
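
To make the pipeline idea a little more concrete, here is a toy Python sketch using the “Jay always jogs a mile” example from earlier. Everything in it is an assumption made for illustration: the tiny LEXICON, the three stage functions (tag, chunk, attach) and the greedy attachment rule. Generators are used so that each word flows through all stages before the next word is read, which only loosely mimics concurrent stages. When a verb arrives with no subject available, the last stage revises its earlier attachment and records which position it had to look back to, rather than backtracking through alternatives.

```python
# Hypothetical toy lexicon covering only the example sentence.
LEXICON = {
    "Since": "CONJ", "Jay": "NP", "always": "ADV", "jogs": "VERB",
    "a": "DET", "mile": "NOUN", "seems": "VERB", ",": "PUNCT",
}


def tag(words):
    """Stage 1: look up a category for each word as it arrives."""
    for position, word in enumerate(words):
        yield position, word, LEXICON.get(word, "OTHER")


def chunk(tagged):
    """Stage 2: merge a determiner and the noun after it into one NP."""
    pending = None  # a determiner waiting for its noun
    for position, word, category in tagged:
        if category == "DET":
            if pending is not None:
                yield pending[0], pending[1], "DET"
            pending = (position, word)
            continue
        if category == "NOUN" and pending is not None:
            yield pending[0], pending[1] + " " + word, "NP"
            pending = None
            continue
        if pending is not None:
            yield pending[0], pending[1], "DET"
            pending = None
        yield position, word, category
    if pending is not None:
        yield pending[0], pending[1], "DET"


def attach(chunks):
    """Stage 3: attach NPs to verbs greedily, revising when stuck."""
    links = {}          # NP text -> (role, verb text)
    revisions = []      # (position of problem word, position looked back to)
    waiting_np = None   # NP seen before any verb in the current clause
    last_verb = None    # (position, text) of the most recent verb
    last_object = None  # NP most recently attached as an object
    for position, text, category in chunks:
        if category == "NP":
            if last_verb is None:
                waiting_np = text
            else:
                links[text] = ("object", last_verb[1])
                last_object = text
        elif category == "VERB":
            if waiting_np is not None:
                links[waiting_np] = ("subject", text)
                waiting_np = None
            elif last_object is not None:
                # Problem word: a verb with no subject. Revise the belief
                # that the previous NP was an object of the earlier verb.
                revisions.append((position, last_verb[0]))
                links[last_object] = ("subject", text)
                last_object = None
            last_verb = (position, text)
        elif category == "PUNCT":
            last_verb = None  # a comma closes off the clause
            last_object = None
    return links, revisions


words = "Since Jay always jogs a mile seems like a very short distance to him".split()
links, revisions = attach(chunk(tag(words)))
print(links)
print(revisions)
```

Running the same stages on a token list with a comma after “jogs” yields no revisions in this sketch, in line with the observation that punctuation removes the ambiguity.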

I find this exciting as there are many clues that point to the requirements for a functional model of NLU, NLG and language learning, and the challenge is to experiment with ideas for realising those requirements in a simple way. We will then be able to test how performance scales with the length of a sentence and other attributes.

Dave Raggett <dsr@w3.org> http://www.w3.org/People/Raggett
W3C Data Activity Lead & W3C champion for the Web of things

Received on Friday, 22 January 2021 10:50:35 UTC