RE: [EXTERNAL] Natural language interfaces and conversational agents from White, Jason J on 2021-03-04 (public-rqtf@w3.org from March 2021)

From: White, Jason J <jjwhite@ets.org>
Date: Thu, 4 Mar 2021 14:18:27 +0000
To: John Paton <John.Paton@rnib.org.uk>, "public-rqtf@w3.org" <public-rqtf@w3.org>
Message-ID: <MN2PR07MB718184FC45078E313BDFC87CAB979@MN2PR07MB7181.namprd07.prod.outlook.com>
Thank you, John, for enumerating the various cases. In the context of communicating with a natural language interface, which of them should be recommended? Should they all ideally be supported? For example, I think there are circumstances in which keyboard input combined with speech output would be desirable for a person with a speech-related disability who can see visually displayed text output – where not having to look at a screen is important. However, they may not be common scenarios.
It seems to me that there are also cases in which the speech input and text output combination would be superior to keyboard input and text output, even if the user can operate a keyboard. (Touch-free interaction can be very convenient, and speaking the input may be faster than using a keyboard or similar device.)
For the case of simultaneous speech and text output, we should also consider synchronized highlighting of the text as it is spoken – frequently used by people with learning disabilities. Of course, if the interaction takes place via a Web page as a text-based exchange, the speech (with synchronized highlighting, if necessary) can be provided by available assistive technologies. For stand-alone devices with visual displays and audio capabilities, this output option would need to be implemented by the system’s developers.

From: John Paton <John.Paton@rnib.org.uk>
Sent: Thursday, 4 March 2021 4:23
To: White, Jason J <jjwhite@ets.org>; public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

From a hearing loss perspective there are several modalities in telephony that are common (and I’ll try to get them right).

HCO – Hearing Carry Over
For someone who can hear well enough but is speech impaired.
The user types text but receives spoken audio.

VCO – Voice Carry Over
For someone who can’t hear well enough to follow a conversation but prefers speaking to typing.
User speaks but receives replies in text.

Captioned Telephony
Multimodal communication where the user receives live captions of the speech. Technical issues mean mistakes in the captions are common and there is always a lag.
User speaks but receives both speech and text in return.

For symmetry I can imagine a captioned telephony where the user types but receives speech and text in return. I haven’t seen it in the wild.

If a system only supported text -> text OR voice -> voice, then HCO and VCO would need the user to switch, but I’m not aware of a use case where the user preferences would change partway through a call.

In a human-machine interaction, though, if a user preferred to use their voice but the speech recognition didn’t understand them, they may switch to text for that part or for the rest of the call. They may also want the ability to edit (in text) the phrase that the computer understood from their speech. So there are use cases for multimodal input too.
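
As a rough illustration only, the cases above can be modelled by treating input and output modes as independent settings, so that either can change during a session. This is a minimal sketch; the names Mode, Session, PROFILES and start are hypothetical and do not come from any existing system or specification.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Mode(Flag):
    """Modalities that can be combined for input or output."""
    TEXT = auto()
    SPEECH = auto()


@dataclass
class Session:
    """Hypothetical session state; input and output modes are independent."""
    input_modes: Mode
    output_modes: Mode

    def switch_input(self, new_modes: Mode) -> None:
        # Changing the input mode mid-session leaves the output untouched.
        self.input_modes = new_modes


# Profiles named after the telephony cases listed above (labels are illustrative).
PROFILES = {
    "HCO": (Mode.TEXT, Mode.SPEECH),                      # types, hears speech
    "VCO": (Mode.SPEECH, Mode.TEXT),                      # speaks, reads text
    "captioned": (Mode.SPEECH, Mode.SPEECH | Mode.TEXT),  # speaks, hears and reads
    "captioned_typed": (Mode.TEXT, Mode.SPEECH | Mode.TEXT),
}


def start(profile: str) -> Session:
    """Create a session from one of the named profiles."""
    input_modes, output_modes = PROFILES[profile]
    return Session(input_modes, output_modes)
```

For instance, a user who starts with start("VCO") and finds that speech recognition is failing could call switch_input(Mode.TEXT) and continue the same session with text output unchanged.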

John

From: White, Jason J <jjwhite@ets.org>
Sent: 03 March 2021 20:37
To: public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

As an addendum to my analysis from earlier today: the following question has arisen in other work to which I’ve contributed in the past, but on which I don’t have a well-informed answer.
Assuming that a natural language interface supports both speech input/output and text input/output, how important is it for the user to be able to switch between these modes during an interaction, rather than deciding on one or the other at the outset of the interaction and not having the option to alter this choice until the interactive session has ended? For example, suppose the user can either interact with the system textually via a Web page in a manner similar to an instant messaging system, or activate a button that starts a WebRTC voice session, but cannot switch from one to the other until a new interactive session is started. To what extent would this be an accessibility limitation?

From: White, Jason J <jjwhite@ets.org>
Sent: Wednesday, 3 March 2021 12:21
To: public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

At the meeting today, it was agreed we should attempt a preliminary classification of the issues that should be addressed within the scope of this topic. Based on the conversations that have taken place so far, and after reflecting on the matter, here is my first approximation.

Sensory issues: the need to support multiple output modalities for the natural language interface (visual, auditory, braille/tactile), either directly or via assistive technologies. Whether a generic text input/output interface in the style of an IRC client or instant messaging application would suffice to satisfy these requirements, given the availability of assistive technologies. Whether AAC symbols or sign language could be used for output – possibly infeasible in the short term due to the unsolved research problems involved, at least for sign languages.

For visual output of the natural language interaction: what the user should be able to control (e.g., font size, text spacing, and other style properties of displayed text).

For spoken output of the natural language interaction: what the user should be able to control (e.g., speech rate, volume, choice of voices, etc.).

For graphical output generated by the system that is not part of the natural language interaction (e.g., maps, interactive Web pages, etc., displayed by the application in response to the user’s request) – we should probably refer to existing guidelines and indicate that only the natural language interaction itself is within scope here. This seems on first analysis to be a reasonable scope boundary. Also, if the natural language interface is part of a telephony application or similar service, perhaps RAUR could be referred to as well.

Input issues: support for multiple input modes (keyboard, switch, eye tracking, speech, etc.), either directly or via assistive technologies. Whether an IRC/instant messaging-style interaction is sufficient to satisfy these requirements, given the availability of assistive technologies. Whether sign language input or AAC symbol input can be supported, given the current state of technology (possibly different answers depending on the circumstances).

For speech input: accurate recognition of speakers who have different speech characteristics (e.g., due to having a disability). How the system should respond when low confidence in the speech recognition is detected (e.g., by prompting for information to be repeated or asking the user for confirmation).
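
To make the low-confidence case concrete, here is a minimal sketch of one possible policy: ask the user to repeat when confidence is very low, ask for confirmation when it is moderate, and accept otherwise. The function name next_action and both threshold values are assumptions chosen for illustration; a real system would need to tune them and would take the confidence score from its own recognizer.

```python
from typing import Tuple


def next_action(transcript: str, confidence: float,
                repeat_below: float = 0.5,
                confirm_below: float = 0.8) -> Tuple[str, str]:
    """Return (action, message) for a recognized utterance.

    The thresholds are hypothetical defaults, not recommended values.
    """
    if confidence < repeat_below:
        # Too uncertain to guess: ask again and offer another input mode.
        return ("repeat",
                "Sorry, I didn't catch that. Could you say it again or type it?")
    if confidence < confirm_below:
        # Plausible but uncertain: read back the transcript for confirmation.
        return ("confirm", f'Did you say: "{transcript}"?')
    # Confident enough to proceed without interrupting the user.
    return ("accept", transcript)
```

Offering a typed alternative in the "repeat" prompt matters for speakers whose speech the recognizer handles poorly, since repeated requests to say the same thing again are unlikely to help them.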

For multimodal systems that support digital pen input or other forms of graphical input (e.g., for working with diagrams or for handwriting recognition), support for recognizing input provided by people with motor-related disabilities would be important, and this doesn’t seem to be addressed elsewhere in W3C guidance. Some systems, for example, offer a combination of speech input and pen input. On the other hand, we could argue that since the pen input isn’t strictly part of the natural language processing, it’s out of scope for purposes of the present project.

For text input: perhaps some error-handling issues (e.g., spelling errors) should be discussed. What else should be addressed here?

Cognitive: issues of discoverability – how the user knows what sentences/utterances the system will accept at any point during the interaction. Availability of help information. Inclusion of hints/prompts/suggestions in the system’s output to assist the user in knowing what can be done next. The use of menus of options to guide the user’s decisions during an interactive session.

Cognitive: reminding the user of the context and of previously provided information. We need more analysis of the requirements here. The ability for the user to request that information be repeated would also assist with memory-related issues, especially if speech output is used and the interaction is not displayed visually. For visual output, scroll-back support so that the user can review the entire conversation/interaction would seem useful. Even if speech input and output are used, a textually displayed log of the conversation could still be beneficial (e.g., presented on screen or via a braille device). The log should clearly distinguish the user’s input from the system’s output.

Cognitive: access to glossary definitions and explanations at the user’s request. The option for the user to request spelling of names or other words if speech output is used would also be helpful.

Cognitive: the option for the user to request reminders of upcoming events relevant to the system’s operation (e.g., calendar appointments). Reminders and alerts would need to be multimodal (e.g., auditory, visual, vibratory/haptic) as well.

Cognitive: support for configuring the system to provide simpler language, perhaps an interface with fewer options/capabilities which is restricted to only the features that the particular user needs.

Cognitive: support for a variety of vocabulary and a variety of ways of issuing the same request or providing the same information – that is, flexibility in handling a wide diversity of natural language sentences/utterances that users may give as input to the system. The ability to handle repeated information.

Cognitive: the ability for the user to correct errors, and how this should be supported – more work is obviously needed here.

Cognitive: keeping track of the context of a conversation as a dialogue with the user progresses and of previously supplied information. This is a research problem in natural language processing, and it isn’t clear what the accessibility requirements should be here. Users are likely to expect to be able to depend on or refer to aspects of the context as an interaction progresses, and this may be especially important for those with learning or cognitive disabilities.

User identification and authentication: how the system can ascertain who is interacting with it. Speaker identification may be feasible if speech input is used. The authentication features of the underlying platform/operating system (e.g., biometrics other than voice) would presumably need to be supported as well, so that there are multiple mechanisms of authentication available. If the system is accessed via a Web page, then presumably the standard Web-based authentication mechanisms can be used; but there are more issues for stand-alone hardware devices or mobile applications in providing accessible authentication methods.

Relationship with hardware capabilities: natural language-based interfaces can occur in a variety of contexts – as stand-alone hardware devices such as “smart speakers” and consumer appliances, as applications running on mobile phones and tablets, in wearable computing devices, on desktop and laptop systems, as components of Web pages/applications, via telephony/RTC-based applications, etc. The same system may be available via multiple means (e.g., in dedicated hardware or via the Web according to the user’s preference). Different modalities and different accessibility features may be available depending on the platform used to interact with the natural language interface. Should we say that the accessibility requirements apply to the software system/natural language interface, but that they will be supported in different ways and to a different extent depending on the platform?

Note that many of the foregoing issues are modality-independent, and that cognitive considerations have a large role. Further, restricting the scope of the work to the natural language interaction itself – citing other sources of guidance concerning the accessibility of other aspects of the overall system – seems reasonable in order to keep the requirements-gathering effort suitably confined.

What issues have I missed?
How reasonable is the scope?
Corrections, refinements, and objections are all welcome.

Regards,

Jason.


________________________________

This e-mail and any files transmitted with it may contain privileged or confidential information. It is solely for use by the individual for whom it is intended, even if addressed incorrectly. If you received this e-mail in error, please notify the sender; do not disclose, copy, distribute, or take any action in reliance on the contents of this information; and delete it from your system. Any other use of this e-mail is prohibited.


Thank you for your compliance.

________________________________



--

DISCLAIMER:

The information contained in this email and any attachments is confidential and may be privileged. If you are not the intended recipient you should not use, disclose, distribute or copy any of the content of it or of any attachment; you are requested to notify the sender immediately of your receipt of the email and then to delete it and any attachments from your system.

RNIB endeavours to ensure that all emails and attachments are virus free. We cannot, however, guarantee nor accept any responsibility for the integrity of unsecure email.

We therefore recommend that you use up to date anti-virus software and scan all communications.

Please note that the statements and views expressed in this email and any attachments are those of the author and do not necessarily represent those of RNIB.

RNIB Registered Charity Number: 226227

Website: https://www.rnib.org.uk

Attachments

image/jpeg attachment: _WRD0003.jpg
Received on Thursday, 4 March 2021 14:18:46 UTC