RE: [EXTERNAL] Natural language interfaces and conversational agents

By way of update prior to the meeting tomorrow:
As noted in the mailing list discussions, I have had time to document requirements and issues. I now have an experimental and very incomplete draft (written in HTML using ReSpec), tentatively entitled Natural Language Interface Accessibility User Requirements; alternative titles are welcome. It covers the following:
  *   Abstract and introduction (largely plundered and adapted from RAUR).
  *   User identification and authentication.
  *   Means of input and output.
  *   Communicating in a language that the user needs.
  *   Speech recognition and speech production.
  *   Visually displayed text.

I haven’t had time to write out the cognitive accessibility requirements that we’ve discussed on the list. The scope question and some other issues are noted, based on mailing list discussions. There are numerous other issues and limitations as well. I’ve noted Janina’s issue concerning what should be required if the natural language interface provides functionality that is also available via a different kind of interface, but without attempting to solve it. The examples that John and Scott have been discussing recently on the list aren’t fully covered yet.
If the Task Force decides to continue exploring the kind of approach discussed on the list during the past week, we can sort out the logistics of uploading what I’ve written so far as a starting point for collaboration.
Thanks are also owed to Josh for verifying that the approach I’ve taken is broadly in keeping with what he thinks could be viable (while we all recognize that the scope issues are still very much open).
None of this is meant to foreclose or discourage alternative approaches, of course.
Let’s discuss it further at the meeting tomorrow.

From: Scott Hollier <scott@hollier.info>
Sent: Saturday, 6 March 2021 0:50
To: White, Jason J <jjwhite@ets.org>; John Paton <John.Paton@rnib.org.uk>; public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

To Jason

Makes sense to me – a logical approach that avoids scope creep and connects to other work.

Scott.


Dr Scott Hollier
Digital Access Specialist
Mobile: +61 (0)430 351 909
Web: www.hollier.info

Technology for everyone

Keep up with digital access news by following @scotthollier on Twitter <https://twitter.com/scotthollier>.

From: White, Jason J <jjwhite@ets.org>
Sent: Friday, 5 March 2021 9:45 PM
To: John Paton <John.Paton@rnib.org.uk>; Scott Hollier <scott@hollier.info>; public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

Thank you, John and Scott, for adding important requirements to this discussion. I spent time yesterday writing out material, based loosely on RAUR, that is meant to illustrate what pursuing Josh’s idea of a Natural Language Interface Accessibility User Requirements draft might look like. It’s still a very rough and highly preliminary attempt to document the scope of the work and some of the requirements.
I still think scope is going to be an issue, but not necessarily an insurmountable one. The points you raise test that boundary. The approach I’ve been exploring (and this is just a working idea at this point) is as follows.

  *   Focus our work on the accessibility of the natural language interaction itself. As far as I know, no one has documented the accessibility requirements for it elsewhere.
  *   Refer to other guidance (WCAG, XAUR, RAUR, etc.) for the accessibility of other aspects of the user interface.
  *   Note that natural language interaction can occur as part of a larger interface and that the whole interface needs to be accessible.
After reading your comments below, I would be interested in reactions to the following further idea.
Giving the user the option of performing the entire task via the natural language interface alone (without relying on some other interface for part of the task) would seem to be an appropriate principle to document. Thus, the natural language interface could process a query and then present the information without requiring the user to turn to some other aspect of the interface to read the information found. This need not be the default, but having it as an option seems reasonable. Thoughts and counter-proposals are of course welcome. Are there other principles in play here?
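To make that option concrete, here is a hypothetical sketch (all of the type and function names below are invented for illustration, not drawn from any existing system) of a response path that keeps the whole task inside the natural language channel when the user has opted in:

```typescript
// Hypothetical sketch: keep the whole task inside the natural language
// channel when the user has opted in. All names here are invented.
interface NLResponse {
  summary: string;        // short spoken/textual answer
  details: string[];      // full result set, e.g. search hits
  screenUrl?: string;     // optional richer visual presentation
}

interface UserPreferences {
  completeTaskInDialogue: boolean; // the proposed opt-in principle
}

function present(response: NLResponse, prefs: UserPreferences): string {
  if (prefs.completeTaskInDialogue || !response.screenUrl) {
    // Deliver the full results within the dialogue itself, so the user
    // never has to turn to another interface to read what was found.
    return [response.summary, ...response.details].join(" ");
  }
  // Default behaviour: brief answer plus a pointer to the visual display.
  return `${response.summary} (details shown on screen)`;
}
```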

From: John Paton <John.Paton@rnib.org.uk>
Sent: Friday, 5 March 2021 5:14
To: Scott Hollier <scott@hollier.info>; White, Jason J <jjwhite@ets.org>; public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

Thanks Scott,

By haptics do you just mean vibrations, or do you have other mechanisms in mind? There has been some work on creating mid-air haptics using acoustic radiation pressure, which produces a sensation in mid-air that you can feel. When I tried it, it was more of a buzzing sensation as you moved your hand through the affected air than a solid object, but I don’t know whether that’s a limitation of the technology or just of the particular implementation I got to play with. I think someone attempted a similar thing using a tablet screen as the medium, but my understanding was that it could really only simulate scales, fur, wet surfaces, and the like. Cool and clever, but with limited communication potential. Morse code via vibration has been mooted as an output, and apparently radio hams often have a Morse reading speed of around 20 wpm (a quick Google stat, so don’t quote me).

There have been some attempts at “pin art” style displays, which raise pins in a grid, similar to a braille display. I think that’s at the cutting-edge stage at the moment, with the main barriers being reliability and cost. The underlying technology uses electromagnetism to push up the pins, and creating sufficiently strong fields to move a metal pin without interfering with the pin next door is the engineering problem that has traditionally kept the prices of braille displays very high.

Your example of the smart clock and the Nest Hub illustrates some of my concerns well. The smart clock has speech in and out when answering a question, which makes it accessible to a blind user. If the Nest Hub communicates information to users only on the screen, though, then that part of the communication is suddenly inaccessible to a blind user. Designers love speech recognition, but they tend to default to a screen when presenting information to the user. That’s why speech-only interaction is a good way to force lazy app developers to make their apps accessible.

For a device capable of multi-channel input and output, a blind or deaf user would benefit from “essential communication modalities”, where input or output has to be carried by voice or by text respectively. For some people a softer “default communication modality” may work better, since they may prefer to use speech or hearing unless they can’t hear or be understood. The two could be combined if only the user can trigger a deviation from the default, but I presume these are details for later discussion.
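For illustration only, a preference model along these lines might look something like the sketch below (the type names and fields are invented, not drawn from any existing specification):

```typescript
// Hypothetical sketch of the "essential" vs. "default" modality idea.
type Modality = "speech" | "text" | "haptic";

interface ModalityPreference {
  modality: Modality;
  // "essential": must always be honoured (e.g. text output for a deaf user).
  // "default": used unless the user explicitly switches away mid-session.
  strength: "essential" | "default";
}

interface CommunicationProfile {
  input: ModalityPreference;
  output: ModalityPreference;
}

// Only the user may deviate from a "default"; nothing overrides "essential".
function canSwitch(pref: ModalityPreference, requestedByUser: boolean): boolean {
  if (pref.strength === "essential") return false;
  return requestedByUser;
}
```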

Cheers,

John

From: Scott Hollier <scott@hollier.info>
Sent: 05 March 2021 07:58
To: White, Jason J <jjwhite@ets.org>; public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

Hi everyone

Firstly, apologies if my input to Wednesday’s meeting wasn’t as coherent as it could have been. The 10pm–11pm meeting time took its toll a bit this week after lots of other late nights and early starts. In two weeks the meetings return to 9pm–10pm for me, which will be great, and hopefully I can contribute a bit more on the calls.

To throw in my two cents on the discussion:


  *   Jason: fantastic summary, and great input from everyone.
  *   I’m not sure if it’s covered by “tactile” generally, but I think broader haptics beyond braille are important. Providing voice commands that can create 3D/VR output, with verbal responses when interacting with the haptic output, would be a potential use case. Aside from VR, it would be a logical step for a digital assistant to be able to project a haptic rendering of a 3D image based on user requests. I’m thinking of a device the size of a Nintendo 3DS, where the digital assistant provides the interaction and the 3D part provides haptic interaction guided by voice commands.
  *   The current Amazon Echo Show has some features that allow a deaf person to interact with the device. From memory, it displays the verbal command on the screen and has a series of vibrations and screen flashes to indicate responses to verbal commands, so that may be a partial implementation to look at.
  *   Jason’s point about switching input is a really good one. For example, I have two Google devices: one is a smart clock, the other is a Nest Hub. When you ask the smart clock ‘where is the nearest McDonald’s’, for example, it says ‘I’ve found a few places’ and then reads them out to you. On the Nest Hub, though, it doesn’t read them out; it shows them to you on the screen instead. The clock is especially tricky: most of the time it gives the answer verbally due to its limited screen functions, but occasionally it’ll just put it on the screen, as it can display some things. To my knowledge there’s no straightforward way to set up a preference, so guidance on this would be really important IMHO.

Thanks everyone

Scott


Dr Scott Hollier
Digital Access Specialist
Mobile: +61 (0)430 351 909
Web: www.hollier.info

Technology for everyone

Keep up with digital access news by following @scotthollier on Twitter <https://twitter.com/scotthollier>.

From: White, Jason J <jjwhite@ets.org>
Sent: Thursday, 4 March 2021 4:37 AM
To: public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

As an addendum to my analysis from earlier today: the following question has arisen in other work to which I’ve contributed in the past, but on which I don’t have a well informed answer.
Assuming that a natural language interface supports both speech input/output and text input/output, how important is it for the user to be able to switch between these modes during an interaction, rather than choosing one or the other at the outset and being unable to alter this choice until the interactive session has ended? For example, suppose the user can either interact with the system textually via a Web page, in a manner similar to an instant messaging system, or activate a button that starts a WebRTC voice session, but cannot switch from one to the other until a new interactive session is started. To what extent would this be an accessibility limitation?
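As a rough sketch of what mid-session switching could involve (the Session shape and the helper functions are invented; the getUserMedia call is the standard WebRTC capture API, and signalling is omitted):

```typescript
// Hypothetical sketch: switching from text to voice without ending the
// session. The Session type is invented; WebRTC signalling is omitted.
interface Session {
  id: string;
  transcript: string[];          // shared dialogue state survives the switch
  voiceStream?: MediaStream;
}

function sendText(session: Session, message: string): void {
  session.transcript.push(`user: ${message}`);
  // ... deliver to the dialogue system, e.g. over a WebSocket ...
}

async function switchToVoice(session: Session): Promise<void> {
  // Standard API call; requires a browser context and user permission.
  session.voiceStream = await navigator.mediaDevices.getUserMedia({ audio: true });
  // ... attach the stream to an RTCPeerConnection; the transcript and
  // dialogue context carry over, so the user does not start from scratch.
}

function switchBackToText(session: Session): void {
  session.voiceStream?.getTracks().forEach((track) => track.stop());
  session.voiceStream = undefined;
}
```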

From: White, Jason J <jjwhite@ets.org>
Sent: Wednesday, 3 March 2021 12:21
To: public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

At the meeting today, it was agreed we should attempt a preliminary classification of the issues that should be addressed within the scope of this topic. Based on the conversations that have taken place so far, and after reflecting on the matter, here is my first approximation.

Sensory issues: the need to support multiple output modalities for the natural language interface (visual, auditory, braille/tactile), either directly or via assistive technologies. Whether a generic text input/output interface in the style of an IRC client or instant messaging application would suffice to satisfy these requirements, given the availability of assistive technologies. Whether AAC symbols or sign language could be used for output – possibly infeasible in the short term due to the unsolved research problems involved, at least for sign languages.

For visual output of the natural language interaction: what the user should be able to control (e.g., font size, text spacing, and other style properties of displayed text).
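For illustration, a minimal sketch of how such controls might be applied to the displayed conversation using CSS custom properties (the property names and the preferences shape are invented):

```typescript
// Minimal sketch: apply user-controlled text styling to the conversation
// display via CSS custom properties. Property names are invented.
interface TextDisplayPrefs {
  fontSizePx: number;
  lineHeight: number;      // e.g. 1.5, in line with WCAG text-spacing guidance
  letterSpacingEm: number;
}

function applyTextPrefs(container: HTMLElement, prefs: TextDisplayPrefs): void {
  container.style.setProperty("--nl-font-size", `${prefs.fontSizePx}px`);
  container.style.setProperty("--nl-line-height", `${prefs.lineHeight}`);
  container.style.setProperty("--nl-letter-spacing", `${prefs.letterSpacingEm}em`);
  // The conversation's stylesheet would reference these variables, e.g.
  // .nl-transcript { font-size: var(--nl-font-size); ... }
}
```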

For spoken output of the natural language interaction: what the user should be able to control (e.g., speech rate, volume, choice of voices, etc.).
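The Web Speech API’s synthesis interface exposes exactly these controls, so a minimal sketch is possible (browser support varies; the SpeechPrefs shape is invented):

```typescript
// Sketch using the Web Speech API (speechSynthesis); support varies by
// browser. The SpeechPrefs shape is invented for illustration.
interface SpeechPrefs {
  rate: number;       // 0.1–10, where 1 is normal speed
  volume: number;     // 0–1
  voiceName?: string; // matched against the platform's installed voices
}

function speak(text: string, prefs: SpeechPrefs): void {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = prefs.rate;
  utterance.volume = prefs.volume;
  const voice = speechSynthesis.getVoices().find((v) => v.name === prefs.voiceName);
  if (voice) utterance.voice = voice;
  speechSynthesis.speak(utterance);
}
```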

For graphical output generated by the system that is not part of the natural language interaction (e.g., maps, interactive Web pages, etc., displayed by the application in response to the user’s request) – we should probably refer to existing guidelines and indicate that only the natural language interaction itself is within scope here. This seems on first analysis to be a reasonable scope boundary. Also, if the natural language interface is part of a telephony application or similar service, perhaps RAUR could be referred to as well.

Input issues: support for multiple input modes (keyboard, switch, eye tracking, speech, etc.), either directly or via assistive technologies. Whether an IRC/instant messaging-style interaction is sufficient to satisfy these requirements, given the availability of assistive technologies. Whether sign language input or AAC symbol input can be supported, given the current state of technology (possibly different answers depending on the circumstances).

For speech input: accurate recognition of speakers who have different speech characteristics (e.g., due to having a disability). How the system should respond when low confidence in the speech recognition is detected (e.g., by prompting for information to be repeated or asking the user for confirmation).
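One possible shape for this behaviour, sketched under the assumption of a recogniser that reports per-result confidence (as the Web Speech API does); the threshold and the helper functions are invented:

```typescript
// Sketch of low-confidence handling. The result shape mirrors the Web
// Speech API; the threshold and the helper functions are invented.
interface RecognitionAlternative {
  transcript: string;
  confidence: number; // 0–1, as reported by the recogniser
}

const CONFIDENCE_THRESHOLD = 0.6; // illustrative only, not a standard value

function handleRecognition(best: RecognitionAlternative): void {
  if (best.confidence < CONFIDENCE_THRESHOLD) {
    // Don't act on a doubtful transcription: confirm it or re-prompt.
    askUserToConfirm(`Did you say: "${best.transcript}"?`);
  } else {
    processUtterance(best.transcript);
  }
}

declare function askUserToConfirm(prompt: string): void; // hypothetical
declare function processUtterance(text: string): void;   // hypothetical
```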

For multimodal systems that support digital pen input or other forms of graphical input (e.g., for working with diagrams or for handwriting recognition), support for recognizing input provided by people with motor-related disabilities would be important, and this doesn’t seem to be addressed elsewhere in W3C guidance. Some systems, for example, offer a combination of speech input and pen input. On the other hand, we could argue that since the pen input isn’t strictly part of the natural language processing, it’s out of scope for purposes of the present project.

For text input: perhaps some error-handling issues (e.g., spelling errors) should be discussed. What else should be addressed here?

Cognitive: issues of discoverability – how the user knows what sentences/utterances the system will accept at any point during the interaction. Availability of help information. Inclusion of hints/prompts/suggestions in the system’s output to assist the user in knowing what can be done next. The use of menus of options to guide the user’s decisions during an interactive session.
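As a toy sketch of discoverability (all names invented), each dialogue state could advertise the utterances it currently accepts:

```typescript
// Hypothetical sketch: each dialogue state advertises what can be said next.
interface DialogueState {
  prompt: string;
  acceptedUtterances: string[]; // examples the system will recognise here
}

function renderTurn(state: DialogueState): string {
  const hints = state.acceptedUtterances
    .map((utterance, i) => `${i + 1}. ${utterance}`)
    .join("\n");
  return `${state.prompt}\nYou can say, for example:\n${hints}\n(Say "help" at any time.)`;
}
```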

Cognitive: reminding the user of the context and of previously provided information. We need more analysis of the requirements here. The ability for the user to request that information be repeated would also assist with memory-related issues, especially if speech output is used and the interaction is not displayed visually. For visual output, scroll-back support so that the user can review the entire conversation/interaction would seem useful. Even if speech input and output are used, a textually displayed log of the conversation could still be beneficial (e.g., presented on screen or via a braille device). The log should clearly distinguish the user’s input from the system’s output.
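A minimal sketch of such a log (role="log" is standard ARIA and implies polite announcement; the class names are invented):

```typescript
// Minimal sketch: an accessible, reviewable conversation log. role="log"
// is standard ARIA; the class names are invented.
function createConversationLog(parent: HTMLElement): HTMLElement {
  const log = document.createElement("div");
  log.setAttribute("role", "log");       // implies aria-live="polite"
  log.setAttribute("aria-label", "Conversation history");
  parent.appendChild(log);
  return log;
}

function appendEntry(log: HTMLElement, speaker: "user" | "system", text: string): void {
  const entry = document.createElement("p");
  entry.className = `nl-entry nl-entry-${speaker}`;
  // Prefix the speaker so the distinction survives in braille and speech,
  // not only in visual styling.
  entry.textContent = `${speaker === "user" ? "You" : "System"}: ${text}`;
  log.appendChild(entry);
}
```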

Cognitive: access to glossary definitions and explanations at the user’s request. The option for the user to request spelling of names or other words if speech output is used would also be helpful.

Cognitive: the option for the user to request reminders of upcoming events relevant to the system’s operation (e.g., calendar appointments). Reminders and alerts would need to be multimodal (e.g., auditory, visual, vibratory/haptic) as well.
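On the Web platform, a single reminder could already be delivered through several channels at once (a sketch; Notification and navigator.vibrate are standard APIs with uneven platform support, and speakText is an assumed helper):

```typescript
// Sketch of a multimodal reminder. Notification and navigator.vibrate are
// standard web APIs with uneven platform support; speakText is assumed.
async function remind(message: string): Promise<void> {
  // Visual channel.
  if ((await Notification.requestPermission()) === "granted") {
    new Notification("Reminder", { body: message });
  }
  // Haptic channel (a no-op where unsupported, e.g. most desktops).
  navigator.vibrate?.([200, 100, 200]);
  // Auditory channel.
  speakText(message);
}

declare function speakText(text: string): void; // hypothetical helper
```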

Cognitive: support for configuring the system to provide simpler language, perhaps an interface with fewer options/capabilities which is restricted to only the features that the particular user needs.

Cognitive: support for a variety of vocabulary and a variety of ways of issuing the same request or providing the same information – that is, flexibility in handling a wide diversity of natural language sentences/utterances that users may give as input to the system. The ability to handle repeated information.
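In the simplest pattern-based systems, this amounts to mapping many surface forms onto one intent, as in the toy sketch below (real systems would use statistical language understanding rather than substring matching):

```typescript
// Toy sketch: many phrasings map to one intent. Real systems would use
// statistical natural language understanding rather than exact matching.
const INTENT_PARAPHRASES: Record<string, string[]> = {
  check_balance: [
    "what's my balance",
    "how much money do i have",
    "show me my account balance",
  ],
  transfer_funds: ["send money", "make a transfer", "pay someone"],
};

function matchIntent(utterance: string): string | undefined {
  const normalized = utterance.toLowerCase().trim();
  for (const [intent, phrases] of Object.entries(INTENT_PARAPHRASES)) {
    if (phrases.some((phrase) => normalized.includes(phrase))) return intent;
  }
  return undefined; // fall back to a clarification prompt
}
```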

Cognitive: the ability for the user to correct errors, and how this should be supported – more work is obviously needed here.

Cognitive: keeping track of the context of a conversation as a dialogue with the user progresses and of previously supplied information. This is a research problem in natural language processing, and it isn’t clear what the accessibility requirements should be here. Users are likely to expect to be able to depend on or refer to aspects of the context as an interaction progresses, and this may be especially important for those with learning or cognitive disabilities.

User identification and authentication: how the system can ascertain who is interacting with it. Speaker identification may be feasible if speech input is used. The authentication features of the underlying platform/operating system (e.g., biometrics other than voice) would presumably need to be supported as well, so that there are multiple mechanisms of authentication available. If the system is accessed via a Web page, then presumably the standard Web-based authentication mechanisms can be used; but there are more issues for stand-alone hardware devices or mobile applications in providing accessible authentication methods.
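Where the interface runs in a browser, the Web Authentication API is one of the "multiple mechanisms" that could be offered (a skeletal sketch; the challenge must come from the server, and the registration and verification steps are omitted):

```typescript
// Skeletal sketch of one authentication mechanism (WebAuthn). The challenge
// must come from the server; registration and verification are omitted.
async function authenticate(serverChallenge: Uint8Array): Promise<Credential | null> {
  return navigator.credentials.get({
    publicKey: {
      challenge: serverChallenge,
      timeout: 60_000,
      userVerification: "preferred", // platform biometrics, PIN, etc.
    },
  });
}
```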

Relationship with hardware capabilities: natural language-based interfaces can occur in a variety of contexts – as stand-alone hardware devices such as “smart speakers” and consumer appliances, as applications running on mobile phones and tablets, in wearable computing devices, on desktop and laptop systems, as components of Web pages/applications, via telephony/RTC-based applications, etc. The same system may be available via multiple means (e.g., in dedicated hardware or via the Web according to the user’s preference). Different modalities and different accessibility features may be available depending on the platform used to interact with the natural language interface. Should we say that the accessibility requirements apply to the software system/natural language interface, but that they will be supported in different ways and to a different extent depending on the platform?

Note that many of the foregoing issues are modality-independent, and that cognitive considerations play a large role. Further, restricting the scope of the work to the natural language interaction itself – citing other sources of guidance for the accessibility of other aspects of the overall system – seems reasonable in order to keep the requirements-gathering effort suitably confined.

What issues have I missed?
How reasonable is the scope?
Corrections, refinements, and objections are all welcome.

Regards,

Jason.



Received on Tuesday, 9 March 2021 17:38:49 UTC