- From: John Paton <John.Paton@rnib.org.uk>
- Date: Wed, 24 Feb 2021 18:17:19 +0000
- To: "White, Jason J" <jjwhite@ets.org>, "public-rqtf@w3.org" <public-rqtf@w3.org>
- Message-ID: <LO2P265MB05110B8C60A5ABB2966A9D9CC09F9@LO2P265MB0511.GBRP265.PROD.OUTLOOK.COM>
Thanks Jason,

Would we count IVR <https://en.wikipedia.org/wiki/Interactive_voice_response> as a relevant example? In my experience it’s more multiple choice than conversational, but that’s not necessarily the case. You could also argue that the voice control in products such as Smart TVs is a conversational interface distinct from the general-purpose ‘smart assistants’, since it is a secondary interaction mechanism rather than the primary one (you would likely struggle to control your TV solely via voice commands).

Reading the sentence “Thus it is a basic accessibility requirement that these interfaces support multiple modes of input and output.” does concern me, as it sounds like multimodality is a Must. If we are saying that all devices Must support both speech and text for both input and output, then I think almost all of them will fail. Maybe the computer-based smart assistants such as Siri/Cortana cover all of these, but I’m not sure if they accept text input.

I agree the work of looking at natural language UIs needs to cover both voice and text, but if we require every UI instance to support both then we may be limiting the scope to a class of device that rarely occurs in the wild. A blind user may use a voice agent, whereas a deaf user would use a text- or GUI-based device. Both can be accessible to their respective markets (and both may still have accessibility considerations such as timeouts, cognitive considerations and gracefully handling accents/spelling errors). They would likely benefit from multimodal inputs and outputs, but I would argue that not every instance needs to support every modality to be deemed to have some accessibility. Hope that doesn’t undo the progress we’ve made on the topic.

That does lead to the question of how often a text-based natural language UI is seen as preferable to a GUI. Is it only in a small selection of cases (i.e., where the possible range of inputs is too wide to offer a multiple choice selection)?

Thanks for pulling the text below together. It helps a lot to see it in writing, I think.

Best regards,
John

From: White, Jason J <jjwhite@ets.org>
Sent: 24 February 2021 16:26
To: public-rqtf@w3.org
Subject: [EXTERNAL] Natural language interfaces and conversational agents

My purpose in writing is to summarize central ideas discussed by the Task Force in characterizing the scope of this potential area of work.

Natural language interfaces are the topic of the proposed requirement analysis. A natural language interface is characterized by receiving input and generating output in a natural language. The input and the output may be provided in any of several modalities, including text (e.g., entered via a keyboard or displayed visually), or speech (e.g., using speech recognition for input and text to speech for output).

A natural language interface may be combined with other types of interface in a single application. For example, a system may generate graphical output or display a Web page in response to natural language input. However, the scope of the proposed work is the natural language aspect of the system; other aspects of the overall interface are addressed by standards and guidance provided elsewhere. By way of illustration, if a natural language interface were offered in an immersive environment, then accessibility requirements related to natural language interaction and requirements related to XR would both be relevant to the design of the system as a whole.

Examples of natural language interfaces include:

* An automated chat application embedded in a Web page, in which the user communicates with a software agent rather than with another person. Such an application could be used, for instance, by an organization to process basic customer service inquiries.
* A general-purpose conversational agent that offers a range of services to the user – answering a variety of questions, playing multimedia content, home automation, etc. The agent may be available as part of a desktop or mobile platform, or may be implemented in a stand-alone device such as a “smart speaker” or a home appliance.
* An educational application that uses natural language interaction to evaluate or to improve a student’s competence in a particular skill or field of study. For instance, such an application could be used as an aid to second language acquisition.
* A classic “text adventure” game in which natural language is used to solve problems and make choices in an interactive story.
* A service robot in a building that can answer a limited range of questions and respond to users’ commands in natural language.
* Are there other examples that should be added here?

Clearly, a natural language interface that offers only speech input and speech output is fundamentally inaccessible to those with hearing or speech-related disabilities. Thus it is a basic accessibility requirement that these interfaces support multiple modes of input and output.

There are, of course, other accessibility requirements that ought to be identified and documented. For example, there are:

* Sensory requirements – not only the ability for the user to choose among multiple means of input and output, but also within each mode, such as support for adjusting speech rate and volume, or the style properties of displayed text.
* Cognitive requirements, for example to facilitate the discovery of features of the interface – what can the system do? Reminders and other memory aids, the use of AAC symbols for communication, etc.
* Physical requirements, such as for entirely touch-free interaction with the system (particularly applicable if the natural language interface is offered in specialized hardware such as a vehicle or a home appliance).

Some unresolved research problems that we have identified include:

* Sign language interaction.
* Brain-computer interface interaction.

With this as a starting point, comments and refinements are most welcome.
Received on Wednesday, 24 February 2021 18:17:40 UTC