Natural language interfaces and conversational agents

My purpose in writing is to summarize central ideas discussed by the Task Force in characterizing the scope of this potential area of work.
Natural language interfaces are the topic of the proposed requirement analysis. A natural language interface is characterized by receiving input and generating output in a natural language. The input and the output may be provided in any of several modalities, including text (e.g., entered via a keyboard or displayed visually) or speech (e.g., using speech recognition for input and text-to-speech for output).
A natural language interface may be combined with other types of interface in a single application. For example, a system may generate graphical output or display a Web page in response to natural language input. However, the scope of the proposed work is the natural language aspect of the system; other aspects of the overall interface are addressed by standards and guidance provided elsewhere. By way of illustration, if a natural language interface were offered in an immersive environment, then accessibility requirements related to natural language interaction and requirements related to XR would both be relevant to the design of the system as a whole.
Examples of natural language interfaces include:

  *   An automated chat application embedded in a Web page, in which the user communicates with a software agent rather than with another person. Such an application could be used, for instance, by an organization to process basic customer service inquiries.
  *   A general-purpose conversational agent that offers a range of services to the user – answering a variety of questions, playing multimedia content, controlling home automation, etc. The agent may be available as part of a desktop or mobile platform, or may be implemented in a stand-alone device such as a “smart speaker” or a home appliance.
  *   An educational application that uses natural language interaction to evaluate or to improve a student’s competence in a particular skill or field of study. For instance, such an application could be used as an aid to second language acquisition.
  *   A classic “text adventure” game in which natural language is used to solve problems and make choices in an interactive story.
  *   A service robot in a building that can answer a limited range of questions and respond to users’ commands in natural language.
  *   Are there other examples that should be added here?
Clearly, a natural language interface that offers only speech input and speech output is fundamentally inaccessible to those with hearing or speech-related disabilities. Thus it is a basic accessibility requirement that these interfaces support multiple modes of input and output. There are, of course, other accessibility requirements that ought to be identified and documented. For example, there are:

  *   Sensory requirements – not only the ability for the user to choose among multiple modes of input and output, but also the ability to adjust settings within each mode, such as speech rate and volume, or the style properties of displayed text.
  *   Cognitive requirements, for example features that help the user discover what the system can do, reminders and other memory aids, the use of AAC symbols for communication, etc.
  *   Physical requirements, such as for entirely touch-free interaction with the system (particularly applicable if the natural language interface is offered in specialized hardware such as a vehicle or a home appliance).
Some unresolved research problems that we have identified include:

  *   Sign language interaction.
  *   Brain-computer interface interaction.
With this as a starting point, comments and refinements are most welcome.


Received on Wednesday, 24 February 2021 16:25:54 UTC