RE: [EXTERNAL] Natural language interfaces and conversational agents

Thanks Jason,

I agree it’s important to have these discussions in requirements gathering, and apologies if I ever aim for persuasive but land on forceful. I’m always interested in other views and happy to try to adjust my tone whenever needed.

For a microwave or robot vacuum cleaner, the natural language interface is generally voice control as an accessible alternative to the visual interface or display. In accessibility terms, adding a text-based natural language interface does not add anything that a GUI on a companion device would not. Even that only adds the advantage that you don’t have to be next to the device you are controlling. The way I see it, the written natural language interface comes into its own in a few specific cases where:

  *   The scope of commands is too wide to offer a selection of options;
  *   Written language is a central part of the product’s use (e.g. the language learning application);
  *   Conversation is a central part of the product’s use (e.g. companionship products).
The first only needs to stay based on natural language for as long as the scope remains wide. Once it is narrowed down, a menu of options can be presented, which is often faster than typing. It is feasible that the scope could remain wide, but not for any of the existing use cases I can think of. Apart from the chatbot, are there examples of textual natural language interfaces where written language or conversation is not central to the product’s use? Do we know of any research on the acceptability of textual natural language interfaces? I echo Janina’s sentiments on customer service chatbots, but I think I have used them in the past as the fastest way to get particular information. Are we just biased by the poor implementations we’ve used in the past?

I would also question whether a single natural language input handler could cover both text and voice. As I understand it, textual natural language interfaces are fairly freeform, but voice agents usually require the user to word their command in a particular way. I think this is to increase the accuracy of the speech recognition, although it may just be to avoid having to do the natural language processing. If you typed your command to Alexa, though, you would need to stick to the formula for issuing that command; and if you spoke your command to a web chatbot, do we know whether the accuracy would hold up? You are layering two AI systems on top of each other, so any errors are compounded. Live subtitling in the UK uses top-end speech recognition trained on the voices of people who repeat all the dialogue in the programme, and yet mistakes are very common even under those ideal conditions. In effect you may have to have two natural language interface managers, which wouldn’t converge until they hand over to the system they are linked into. Either that, or a robust (and likely annoying) error-checking mechanism for the spoken input that asks “did you mean…?”.
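
To make the compounding concrete, here is a minimal Python sketch; the accuracy figures, threshold, and names are hypothetical, purely for illustration, not measurements or APIs from any real product:

    from dataclasses import dataclass

    @dataclass
    class StageResult:
        text: str          # recognised utterance, or the parsed intent
        confidence: float  # 0.0 to 1.0, as reported by that stage

    def combined_confidence(asr: StageResult, nlu: StageResult) -> float:
        # Both stages must be right for the command to succeed, so the
        # confidences multiply: e.g. 0.9 (speech) * 0.9 (intent) = 0.81.
        return asr.confidence * nlu.confidence

    CONFIRM_THRESHOLD = 0.75  # hypothetical: below this, confirm before acting

    def respond(asr: StageResult, nlu: StageResult) -> str:
        if combined_confidence(asr, nlu) < CONFIRM_THRESHOLD:
            # The robust-but-annoying fallback for spoken input.
            return f'Did you mean "{nlu.text}"?'
        return f'OK: {nlu.text}'

    # Two individually decent stages can still land in the fallback:
    print(respond(StageResult("turn on the lights", 0.85),
                  StageResult("lights_on", 0.85)))  # 0.7225 -> "Did you mean...?"

The point is just that a threshold tuned for typed input alone could be far too permissive once a second error-prone stage is stacked in front of it.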

I realised last night that I can fill in some of the gaps that Judy mentioned. I worked for four years in the deafness sector (although that was a decade ago, so it’s still worth getting a contemporary account).

For ‘big D’ Deaf people in the UK (those who have BSL as a first language), learning written English is very hard. They don’t have the stepping stone of spoken English, so it would be a bit like a sighted, hearing person who speaks English having to learn to read and write in the Korean writing system. Some Deaf people read and write well in English, and they are referred to as bilingual. Obviously Deaf people still need to use written English, but it comes with a higher cognitive load. This is why Deaf users can find big blocks of text intimidating, and it is best to use simple English if BSL is not an option.

It would be interesting to see if a text-based interface using BSL word order would be helpful to Deaf users (or if an AI could understand both an English and a BSL word order and handle either gracefully). Again though, if short BSL clips or BSL diagrams are available, they are likely to be more acceptable. The Deaf people I worked with hated textphones and saw them as a necessary evil, and textphones are a sort of natural language interface with a real person on the other end.

I’m not sure whether the cognitive load is lower in countries where the written grammar has a more cohesive and less contradictory set of rules than English does. As far as I know, almost all natural sign languages evolved separately from the spoken languages of their countries, so there is always that requirement to be bilingual. The exception is that I understand at least one Chinese sign language is based on one of the Chinese writing systems (presumably simplified Chinese, since traditional may need more than 10 fingers). I don’t know if there is more than one Chinese sign language, but it’s a very big place.

Hard of hearing and deafened people benefit from captioned speech and/or lip reading. The text and the lip reading augment the information received through the speech, so even bad captioning can help quite a bit. Lip reading alone doesn’t convey all the sounds (‘forty’ and ‘fourteen’ look the same on the lips), but I recently heard about a way of representing the missing sounds using handshapes. It didn’t use to be common and as far as I know it still isn’t, but it’s there. So speech and captions, or speech and video of the face, can help. Speech with video of the face and captions would be best.

So that’s a bit of a rambling brain-dump with a lot to unpick, but the gist is that multichannel feedback (text and voice and maybe video) would benefit hard of hearing and deafened people. You may have trouble winning Deaf users over with a chatbot. I’m dubious about the acceptance (and utility) of textual natural language input except in specific circumstances, and I think it would likely need a totally separate handler from the voice input. There’s also an interesting research question about whether text with a sign language word order would be acceptable to Deaf users, and whether chatbots should be designed to allow a sign language word order (which could be a nightmarish localisation issue, since countries with a common spoken language don’t share sign languages).

I hope that’s of use,

John

From: White, Jason J <jjwhite@ets.org>
Sent: 24 February 2021 18:27
To: John Paton <John.Paton@rnib.org.uk>; public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

Thank you, John, for your thoughtful comments. It seems to me that IVR would qualify – especially if speech or TTY/real-time text is supported as input. VoiceXML was, as I understand it, designed with these applications in mind.

I think there’s an interesting question, along the lines you raise, about the accessibility requirement. For example, would the requirement be for all devices implementing a natural language interface to support text input/output, or only for the software to support it (where the user might need to choose appropriate hardware, such as a mobile phone or tablet, to gain access to this functionality)? What happens if the natural language interface is in your microwave oven or robotic vacuum cleaner? Is it acceptable that you might have to control it remotely via another device in such cases? Is the ability to do this an accessibility requirement, if the oven or vacuum cleaner can’t connect to a keyboard-like input device directly? It probably is, but those seem to me to be some of the issues that ought to be considered during requirements gathering.

Comments are most welcome.

Regards,

Jason.

From: John Paton <John.Paton@rnib.org.uk>
Sent: Wednesday, 24 February 2021 13:17
To: White, Jason J <jjwhite@ets.org>; public-rqtf@w3.org
Subject: RE: [EXTERNAL] Natural language interfaces and conversational agents

Thanks Jason,

Would we count IVR<https://en.wikipedia.org/wiki/Interactive_voice_response> as a relevant example? It’s more multiple choice than conversational in my experience, but that’s not necessarily the case. You could also argue that the voice control in products such as smart TVs is a conversational interface distinct from the general-purpose ‘smart assistants’, since it is a secondary interaction mechanism rather than the primary one (you would likely struggle to control your TV solely via voice commands).

Reading the sentence “Thus it is a basic accessibility requirement that these interfaces support multiple modes of input and output.” does concern me, as it sounds like multimodality is a Must. If we are saying that all devices Must support both speech and text for both input and output, then I think almost all of them will fail. Maybe the computer-based smart assistants such as Siri/Cortana cover all of these, but I’m not sure if they accept text input. I agree the work of looking at natural language UIs needs to cover both voice and text, but if we require every UI instance to support both then we may be limiting the scope to a class of device that rarely occurs in the wild.

A blind user may use a voice agent, whereas a deaf user would use a text- or GUI-based device. Both can be accessible to their respective markets (and both may still have accessibility considerations such as timeouts, cognitive load, and graceful handling of accents and spelling errors). They would likely benefit from multimodal inputs and outputs, but I would argue that not every instance needs to support every modality to be deemed to have some accessibility. I hope that doesn’t undo the progress we’ve made on the topic.

That does lead to the question of how often a text-based natural language UI is seen as preferable to a GUI. Is it only in a small selection of cases (i.e. where the possible range of inputs is too wide to offer a multiple-choice selection)?

Thanks for pulling the text below together. It helps a lot to see it in writing I think.

Best regards,

John

From: White, Jason J <jjwhite@ets.org>
Sent: 24 February 2021 16:26
To: public-rqtf@w3.org
Subject: [EXTERNAL] Natural language interfaces and conversational agents

My purpose in writing is to summarize central ideas discussed by the Task Force in characterizing the scope of this potential area of work.
Natural language interfaces are the topic of the proposed requirement analysis. A natural language interface is characterized by receiving input and generating output in a natural language. The input and the output may be provided in any of several modalities, including text (e.g., entered via a keyboard or displayed visually), or speech (e.g., using speech recognition for input and text to speech for output).
A natural language interface may be combined with other types of interface in a single application. For example, a system may generate graphical output or display a Web page in response to natural language input. However, the scope of the proposed work is the natural language aspect of the system; other aspects of the overall interface are addressed by standards and guidance provided elsewhere. By way of illustration, if a natural language interface were offered in an immersive environment, then accessibility requirements related to natural language interaction and requirements related to XR would both be relevant to the design of the system as a whole.
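As a purely illustrative aside – a minimal Python sketch, not part of the proposal, with all class and method names made up – the separation described above amounts to keeping one natural language core behind interchangeable modality adapters:

    from abc import ABC, abstractmethod

    class NaturalLanguageCore:
        """The natural language aspect: utterance in, reply out."""
        def reply(self, utterance: str) -> str:
            if "hours" in utterance.lower():
                return "We are open 9 to 5, Monday to Friday."
            return "Sorry, I did not understand that."

    class Modality(ABC):
        """An input/output channel layered over the same core."""
        def __init__(self, core: NaturalLanguageCore) -> None:
            self.core = core

        @abstractmethod
        def interact(self) -> None: ...

    class TextModality(Modality):
        def interact(self) -> None:
            print(self.core.reply(input("> ")))  # keyboard in, displayed text out

    class SpeechModality(Modality):
        def interact(self) -> None:
            # A real system would use speech recognition here and
            # text-to-speech on the way out; input()/print() stand in.
            utterance = input("(as if from speech recognition) > ")
            print("(as if spoken aloud):", self.core.reply(utterance))

    core = NaturalLanguageCore()
    TextModality(core).interact()  # the same core could back SpeechModality too

Anything layered alongside the adapters (graphics, XR, and so on) would be covered by the other standards and guidance mentioned above.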
Examples of natural language interfaces include:

  *   An automated chat application embedded in a Web page, in which the user communicates with a software agent rather than with another person. Such an application could be used, for instance, by an organization to process basic customer service inquiries.
  *   A general-purpose conversational agent that offers a range of services to the user – answering a variety of questions, playing multimedia content, home automation, etc. The agent may be available as part of a desktop or mobile platform, or may be implemented in a stand-alone device such as a “smart speaker” or a home appliance.
  *   An educational application that uses natural language interaction to evaluate or to improve a student’s competence in a particular skill or field of study. For instance, such an application could be used as an aid to second language acquisition.
  *   A classic “text adventure” game in which natural language is used to solve problems and make choices in an interactive story.
  *   A service robot in a building that can answer a limited range of questions and respond to users’ commands in natural language.
  *   Are there other examples that should be added here?
Clearly, a natural language interface that offers only speech input and speech output is fundamentally inaccessible to those with hearing or speech-related disabilities. Thus it is a basic accessibility requirement that these interfaces support multiple modes of input and output. There are, of course, other accessibility requirements that ought to be identified and documented. For example:

  *   Sensory requirements – not only the ability for the user to choose among multiple means of input and output, but also to adjust settings within each mode, such as speech rate and volume, or the style properties of displayed text (a sketch of what such per-mode settings might look like follows this list).
  *   Cognitive requirements – for example, facilitating discovery of the interface’s features (what can the system do?), reminders and other memory aids, the use of AAC symbols for communication, etc.
  *   Physical requirements, such as support for entirely touch-free interaction with the system (particularly applicable if the natural language interface is offered in specialized hardware such as a vehicle or a home appliance).
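On the sensory point above, here is a minimal sketch of per-mode preferences (hypothetical field names, for illustration only): choosing a mode is not enough on its own; each mode needs its own adjustable settings.

    from dataclasses import dataclass, field

    @dataclass
    class SpeechOutputPrefs:
        rate_wpm: int = 180        # adjustable speech rate
        volume: float = 0.8        # 0.0 to 1.0

    @dataclass
    class TextOutputPrefs:
        font_size_px: int = 16     # style properties of displayed text
        high_contrast: bool = True

    @dataclass
    class UserPrefs:
        preferred_output: str = "text"  # choice among modes...
        speech: SpeechOutputPrefs = field(default_factory=SpeechOutputPrefs)
        text: TextOutputPrefs = field(default_factory=TextOutputPrefs)

    prefs = UserPrefs(preferred_output="speech")
    prefs.speech.rate_wpm = 140  # ...and adjustment within the chosen mode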
Some unresolved research problems that we have identified include

  *   Sign language interaction.
  *   Brain-computer interface interaction.
With this as a starting point, comments and refinements are most welcome.



Received on Thursday, 25 February 2021 13:31:29 UTC