Re: [EXTERNAL] Natural language interfaces and conversational agents

> White, Jason J <mailto:jjwhite@ets.org>
> Thursday 4 March 2021 14:22
>
> Thank you, John, for categorizing the requirements.
>
> We’re starting to make progress. I can move this material to a wiki 
> page and edit it, if this would be the appropriate next step.
>

I think that would be good, Jason, and thanks to you both for your 
excellent input. I'm starting to think that framing this work as 'Natural 
Language Interface Accessibility User Requirements' may be a viable way 
to go, and it is really interesting to see you both in unison around 
some great Cognitive User Needs.

Thanks

Josh

> *From:* John Paton <John.Paton@rnib.org.uk>
> *Sent:* Thursday, 4 March 2021 5:05
> *To:* White, Jason J <jjwhite@ets.org>; public-rqtf@w3.org
> *Subject:* RE: [EXTERNAL] Natural language interfaces and 
> conversational agents
>
> Hi Jason,
>
> I think either Shadi or Michael suggested that the scope could be 
> written out with separate sub-domains, so I’ve had a go at doing that 
> with your points below. Once populated, it may then help carve up the 
> work into manageable chunks. Setting out the overall scope of the work 
> (and then announcing the segment the group intends to work on first) 
> could also provide the conceptual anchor that Judy suggested, whilst 
> not committing us to working on everything at once. Working on a 
> sub-segment may make it harder to achieve the crisp scope that Michael 
> petitioned for, but I think it should be possible.
>
> It also highlights that a large number of the points raised so far are 
> cognitive, which would suggest that we need input from COGA.
>
> One point from the meeting was that voice agents are already out in 
> the wild and need accessibility guidance. I understand that they 
> will fail the general accessibility requirements of the W3C, but if we 
> refuse to work on the specific accessibility requirements of voice 
> interaction then we are not helping advance the accessibility of those 
> devices. Whether it is the first area we work on or whether we later 
> drill down, I think it’s important that at some point we address 
> accessibility issues relating to a solely speech-based interaction. 
> Otherwise I won’t be doing my job as an advocate for blind and 
> partially sighted people. Braille is used by a minority of people with 
> sight loss, so it is a nonsense to say that a purely speech-based 
> interaction is inaccessible. For some of the people I represent it is 
> the accessibility ideal. I guess there is a difference between 
> specific accessibility (does it work for a person with particular 
> needs?) and general accessibility (can it be used by everyone?).
>
> I’ve mainly just clumped Jason’s words together, so a sense check to 
> ensure they are in the right place and still make sense in a new 
> context would be welcome.
>
>
>     Smart Agents
>
>   * Sensory issues: the need to support multiple output modalities for
>     the natural language interface (visual, auditory,
>     braille/tactile), either directly or via assistive technologies.
>     Whether a generic text input/output interface in the style of an
>     IRC client or instant messaging application would suffice to
>     satisfy these requirements, given the availability of assistive
>     technologies.
>   * User identification and authentication: how the system can
>     ascertain who is interacting with it. Speaker identification may
>     be feasible if speech input is used. The authentication features
>     of the underlying platform/operating system (e.g., biometrics
>     other than voice) would presumably need to be supported as well,
>     so that there are multiple mechanisms of authentication available.
>     If the system is accessed via a Web page, then presumably the
>     standard Web-based authentication mechanisms can be used; but
>     there are more issues for stand-alone hardware devices or mobile
>     applications in providing accessible authentication methods.
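>
> A minimal sketch of the Web-hosted case, purely illustrative: a page
> could defer to the platform's authenticators via the standard WebAuthn
> API, so that biometric, PIN or security-key verification are all
> available rather than voice alone. The helper name and the assumption
> that the challenge comes from the service's server are invented for
> the example.
>
>     // Illustrative only: defer to platform authenticators (fingerprint,
>     // face, PIN, security key) via WebAuthn so that no single modality,
>     // such as voice, is required.
>     async function authenticateUser(challenge: Uint8Array): Promise<Credential | null> {
>       return navigator.credentials.get({
>         publicKey: {
>           challenge,                      // single-use value issued by the server
>           userVerification: "preferred",  // platform chooses biometric, PIN, etc.
>           timeout: 120_000,               // a generous timeout helps users who need more time
>         },
>       });
>     }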
>
>
>       Voice interaction
>
>   * For spoken output of the natural language interaction: what the
>     user should be able to control (e.g., speech rate, volume, choice
>     of voices, etc.).
>   * For speech input: accurate recognition of speakers who have
>     different speech characteristics (e.g., due to having a
>     disability). How the system should respond when low confidence in
>     the speech recognition is detected (e.g., by prompting for
>     information to be repeated or asking the user for confirmation).
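>
> A rough sketch of how both points might look in a browser context,
> using the Web Speech API; the 0.6 confidence threshold and the
> handleUtterance hook are placeholders invented for the example, not
> recommendations.
>
>     // Spoken output under user control: rate, volume and preferred voice.
>     function speak(text: string, rate = 1.0, volume = 1.0, voiceName?: string) {
>       const utterance = new SpeechSynthesisUtterance(text);
>       utterance.rate = rate;      // user-chosen speech rate
>       utterance.volume = volume;  // user-chosen volume
>       const voice = speechSynthesis.getVoices().find(v => v.name === voiceName);
>       if (voice) utterance.voice = voice;  // user's preferred voice, if installed
>       speechSynthesis.speak(utterance);
>     }
>
>     declare function handleUtterance(text: string): void; // hypothetical application hook
>
>     // Speech input: confirm rather than act when recognition confidence is low.
>     const recognition = new (window as any).webkitSpeechRecognition();
>     recognition.onresult = (event: any) => {
>       const result = event.results[event.results.length - 1][0];
>       if (result.confidence < 0.6) {
>         speak(`Did you say: ${result.transcript}?`); // prompt for confirmation
>       } else {
>         handleUtterance(result.transcript);
>       }
>     };
>     recognition.start();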
>
>
>       Text Interaction
>
>   * For visual output of the natural language interaction: what the
>     user should be able to control (e.g., font size, text spacing, and
>     other style properties of displayed text).
>   * Input issues: support for multiple input modes (keyboard, switch,
>     eye tracking, speech, etc.), either directly or via assistive
>     technologies. Whether an IRC/instant messaging-style interaction
>     is sufficient to satisfy these requirements, given the
>     availability of assistive technologies.
>   * For text input: perhaps some error-handling issues (e.g., spelling
>     errors) should be discussed. What else should be addressed here?
>   * Where grammar and word order follow common but non-standard
>     patterns (such as English text in a British Sign Language word
>     order), should these be supported as options?
>   * Cognitive: For visual output, scroll-back support so that the user
>     can review the entire conversation/interaction would seem useful.
>     Even if speech input and output are used, a textually displayed
>     log of the conversation could still be beneficial (e.g., presented
>     on screen or via a braille device). The log should clearly
>     distinguish the user’s input from the system’s output.
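>
> As a purely illustrative sketch of the visual-output and scroll-back
> points: a transcript region marked up as a live log, with the user's
> and the system's turns distinguished and a user-controlled text size.
> The labels and class names are invented for the example.
>
>     // A visible, scrollable transcript that distinguishes user and system turns.
>     const log = document.createElement("div");
>     log.setAttribute("role", "log");        // additions are announced by screen readers
>     log.setAttribute("aria-live", "polite");
>     document.body.appendChild(log);
>
>     function appendTurn(speaker: "user" | "system", text: string): void {
>       const p = document.createElement("p");
>       p.className = speaker;                 // styling hook to distinguish the two speakers
>       p.textContent = `${speaker === "user" ? "You" : "Agent"}: ${text}`;
>       log.appendChild(p);                    // the full history stays available for review
>     }
>
>     function setTextScale(scale: number): void {
>       log.style.fontSize = `${scale * 100}%`; // user-controlled text size
>     }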
>
>
>       Graphical interfaces and other
>
>   * For graphical output generated by the system that is not part of
>     the natural language interaction (e.g., maps, interactive Web
>     pages, etc., displayed by the application in response to the
>     user’s request) – we should probably refer to existing guidelines
>     and indicate that only the natural language interaction itself is
>     within scope here. This seems on first analysis to be a reasonable
>     scope boundary. Also, if the natural language interface is part of
>     a telephony application or similar service, perhaps RAUR could be
>     referred to as well.
>   * Whether sign language input or AAC symbol input can be supported,
>     given the current state of technology (possibly different answers
>     depending on the circumstances).
>   * For multimodal systems that support digital pen input or other
>     forms of graphical input (e.g., for working with diagrams or for
>     handwriting recognition), support for recognizing input provided
>     by people with motor-related disabilities would be important, and
>     this doesn’t seem to be addressed elsewhere in W3C guidance. Some
>     systems, for example, offer a combination of speech input and pen
>     input. On the other hand, we could argue that since the pen input
>     isn’t strictly part of the natural language processing, it’s out
>     of scope for purposes of the present project.
>   * Whether AAC symbols or sign language could be used for output –
>     possibly infeasible in the short term due to the unsolved research
>     problems involved, at least for sign languages.
>
>
>       Cognitive issues
>
>   * Cognitive: issues of discoverability – how the user knows what
>     sentences/utterances the system will accept at any point during
>     the interaction. Availability of help information. Inclusion of
>     hints/prompts/suggestions in the system’s output to assist the
>     user in knowing what can be done next. The use of menus of options
>     to guide the user’s decisions during an interactive session.
>   * Cognitive: reminding the user of the context and of previously
>     provided information. We need more analysis of the requirements
>     here. The ability for the user to request that information be
>     repeated would also assist with memory-related issues, especially
>     if speech output is used and the interaction is not displayed
>     visually.
>   * Cognitive: access to glossary definitions and explanations at the
>     user’s request. The option for the user to request spelling of
>     names or other words if speech output is used would also be helpful.
>   * Cognitive: the option for the user to request reminders of
>     upcoming events relevant to the system’s operation (e.g., calendar
>     appointments). Reminders and alerts would need to be multimodal
>     (e.g., auditory, visual, vibratory/haptic) as well.
>   * Cognitive: support for configuring the system to provide simpler
>     language, perhaps an interface with fewer options/capabilities
>     which is restricted to only the features that the particular user
>     needs.
>   * Cognitive: support for a variety of vocabulary and a variety of
>     ways of issuing the same request or providing the same information
>     – that is, flexibility in handling a wide diversity of natural
>     language sentences/utterances that users may give as input to the
>     system. The ability to handle repeated information.
>   * Cognitive: the ability for the user to correct errors, and how
>     this should be supported – more work is obviously needed here.
>   * Cognitive: keeping track of the context of a conversation as a
>     dialogue with the user progresses and of previously supplied
>     information. This is a research problem in natural language
>     processing, and it isn’t clear what the accessibility requirements
>     should be here. Users are likely to expect to be able to depend on
>     or refer to aspects of the context as an interaction progresses,
>     and this may be especially important for those with learning or
>     cognitive disabilities.
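>
> A small sketch of the context-tracking and "repeat that" points; the
> phrase matching and the answer() hook are stand-ins invented for the
> example, not a proposal for how real natural language understanding
> would work.
>
>     interface Turn { speaker: "user" | "system"; text: string; }
>
>     declare function answer(userText: string): string; // hypothetical application logic
>
>     class DialogueContext {
>       private history: Turn[] = [];   // previously supplied information
>       private lastResponse = "";      // kept so the user can ask for a repeat
>
>       respond(userText: string): string {
>         this.history.push({ speaker: "user", text: userText });
>         const reply = /repeat|say that again/i.test(userText)
>           ? (this.lastResponse || "There is nothing to repeat yet.")
>           : answer(userText);
>         this.history.push({ speaker: "system", text: reply });
>         this.lastResponse = reply;
>         return reply;
>       }
>     }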
>
>
>       Notes
>
> Relationship with hardware capabilities: natural language-based 
> interfaces can occur in a variety of contexts – as stand-alone 
> hardware devices such as “smart speakers” and consumer appliances, as 
> applications running on mobile phones and tablets, in wearable 
> computing devices, on desktop and laptop systems, as components of Web 
> pages/applications, via telephony/RTC-based applications, etc. The 
> same system may be available via multiple means (e.g., in dedicated 
> hardware or via the Web according to the user’s preference). Different 
> modalities and different accessibility features may be available 
> depending on the platform used to interact with the natural language 
> interface. Should we say that the accessibility requirements apply to 
> the software system/natural language interface, but that they will be 
> supported in different ways and to a different extent depending on the 
> platform?
>
> Note that many of the foregoing issues are modality-independent, and 
> that cognitive considerations have a large role. Further, restricting 
> the scope of the work to the natural language interaction itself – 
> citing other sources of guidance concerning the accessibility of other 
> aspects of the overall system – seems reasonable in order to keep the 
> requirement gathering effort suitably confined.
>
> Thanks,
>
> John
>
> White, Jason J <mailto:jjwhite@ets.org>
> Wednesday 3 March 2021 17:21
>
> At the meeting today, it was agreed we should attempt a preliminary 
> classification of the issues that should be addressed within the scope 
> of this topic. Based on the conversations that have taken place so 
> far, and after reflecting on the matter, here is my first approximation.
>
> Sensory issues: the need to support multiple output modalities for the 
> natural language interface (visual, auditory, braille/tactile), either 
> directly or via assistive technologies. Whether a generic text 
> input/output interface in the style of an IRC client or instant 
> messaging application would suffice to satisfy these requirements, 
> given the availability of assistive technologies. Whether AAC symbols 
> or sign language could be used for output – possibly infeasible in the 
> short term due to the unsolved research problems involved, at least 
> for sign languages.
>
> For visual output of the natural language interaction: what the user 
> should be able to control (e.g., font size, text spacing, and other 
> style properties of displayed text).
>
> For spoken output of the natural language interaction: what the user 
> should be able to control (e.g., speech rate, volume, choice of 
> voices, etc.).
>
> For graphical output generated by the system that is not part of the 
> natural language interaction (e.g., maps, interactive Web pages, etc., 
> displayed by the application in response to the user’s request) – we 
> should probably refer to existing guidelines and indicate that only 
> the natural language interaction itself is within scope here. This 
> seems on first analysis to be a reasonable scope boundary. Also, if 
> the natural language interface is part of a telephony application or 
> similar service, perhaps RAUR could be referred to as well.
>
> Input issues: support for multiple input modes (keyboard, switch, eye 
> tracking, speech, etc.), either directly or via assistive 
> technologies. Whether an IRC/instant messaging-style interaction is 
> sufficient to satisfy these requirements, given the availability of 
> assistive technologies. Whether sign language input or AAC symbol 
> input can be supported, given the current state of technology 
> (possibly different answers depending on the circumstances).
>
> For speech input: accurate recognition of speakers who have different 
> speech characteristics (e.g., due to having a disability). How the 
> system should respond when low confidence in the speech recognition is 
> detected (e.g., by prompting for information to be repeated or asking 
> the user for confirmation).
>
> For multimodal systems that support digital pen input or other forms 
> of graphical input (e.g., for working with diagrams or for handwriting 
> recognition), support for recognizing input provided by people with 
> motor-related disabilities would be important, and this doesn’t seem 
> to be addressed elsewhere in W3C guidance. Some systems, for example, 
> offer a combination of speech input and pen input. On the other hand, 
> we could argue that since the pen input isn’t strictly part of the 
> natural language processing, it’s out of scope for purposes of the 
> present project.
>
> For text input: perhaps some error-handling issues (e.g., spelling 
> errors) should be discussed. What else should be addressed here?
>
> Cognitive: issues of discoverability – how the user knows what 
> sentences/utterances the system will accept at any point during the 
> interaction. Availability of help information. Inclusion of 
> hints/prompts/suggestions in the system’s output to assist the user in 
> knowing what can be done next. The use of menus of options to guide 
> the user’s decisions during an interactive session.
>
> Cognitive: reminding the user of the context and of previously 
> provided information. We need more analysis of the requirements here. 
> The ability for the user to request that information be repeated would 
> also assist with memory-related issues, especially if speech output is 
> used and the interaction is not displayed visually. For visual output, 
> scroll-back support so that the user can review the entire 
> conversation/interaction would seem useful. Even if speech input and 
> output are used, a textually displayed log of the conversation could 
> still be beneficial (e.g., presented on screen or via a braille 
> device). The log should clearly distinguish the user’s input from the 
> system’s output.
>
> Cognitive: access to glossary definitions and explanations at the 
> user’s request. The option for the user to request spelling of names 
> or other words if speech output is used would also be helpful.
>
> Cognitive: the option for the user to request reminders of upcoming 
> events relevant to the system’s operation (e.g., calendar 
> appointments). Reminders and alerts would need to be multimodal (e.g., 
> auditory, visual, vibratory/haptic) as well.
>
> Cognitive: support for configuring the system to provide simpler 
> language, perhaps an interface with fewer options/capabilities which 
> is restricted to only the features that the particular user needs.
>
> Cognitive: support for a variety of vocabulary and a variety of ways 
> of issuing the same request or providing the same information – that 
> is, flexibility in handling a wide diversity of natural language 
> sentences/utterances that users may give as input to the system. The 
> ability to handle repeated information.
>
> Cognitive: the ability for the user to correct errors, and how this 
> should be supported – more work is obviously needed here.
>
> Cognitive: keeping track of the context of a conversation as a 
> dialogue with the user progresses and of previously supplied 
> information. This is a research problem in natural language 
> processing, and it isn’t clear what the accessibility requirements 
> should be here. Users are likely to expect to be able to depend on or 
> refer to aspects of the context as an interaction progresses, and this 
> may be especially important for those with learning or cognitive 
> disabilities.
>
> User identification and authentication: how the system can ascertain 
> who is interacting with it. Speaker identification may be feasible if 
> speech input is used. The authentication features of the underlying 
> platform/operating system (e.g., biometrics other than voice) would 
> presumably need to be supported as well, so that there are multiple 
> mechanisms of authentication available. If the system is accessed via 
> a Web page, then presumably the standard Web-based authentication 
> mechanisms can be used; but there are more issues for stand-alone 
> hardware devices or mobile applications in providing accessible 
> authentication methods.
>
> Relationship with hardware capabilities: natural language-based 
> interfaces can occur in a variety of contexts – as stand-alone 
> hardware devices such as “smart speakers” and consumer appliances, as 
> applications running on mobile phones and tablets, in wearable 
> computing devices, on desktop and laptop systems, as components of Web 
> pages/applications, via telephony/RTC-based applications, etc. The 
> same system may be available via multiple means (e.g., in dedicated 
> hardware or via the Web according to the user’s preference). Different 
> modalities and different accessibility features may be available 
> depending on the platform used to interact with the natural language 
> interface. Should we say that the accessibility requirements apply to 
> the software system/natural language interface, but that they will be 
> supported in different ways and to a different extent depending on the 
> platform?
>
> Note that many of the foregoing issues are modality-independent, and 
> that cognitive considerations have a large role. Further, restricting 
> the scope of the work to the natural language interaction itself – 
> citing other sources of guidance concerning the accessibility of other 
> aspects of the overall system – seems reasonable in order to keep the 
> requirement gathering effort suitably confined.
>
> What issues have I missed?
>
> How reasonable is the scope?
>
> Corrections, refinements, and objections are all welcome.
>
> Regards,
>
> Jason.
>
>
> White, Jason J <mailto:jjwhite@ets.org>
> Wednesday 24 February 2021 18:26
>
> Thank you, John, for your thoughtful comments. It seems to me that IVR 
> would qualify – especially if speech or TTY/real-time text is 
> supported as input. VoiceXML was, as I understand it, designed with 
> these applications in mind.
>
> I think there’s an interesting question, along the lines you raise, 
> about the accessibility requirement. For example, would the 
> requirement be for all devices implementing a natural language 
> interface to support text input/output, or only for the software to 
> support it (where the user might need to choose appropriate hardware, 
> such as a mobile phone or tablet, to gain access to this 
> functionality)? What happens if the natural language interface is in 
> your microwave oven or robotic vacuum cleaner? Is it acceptable that 
> you might have to control it remotely via another device in such 
> cases? Is the ability to do this an accessibility requirement, if the 
> oven or vacuum cleaner can’t connect to a keyboard-like input device 
> directly? It probably is, but those seem to me to be some of the 
> issues involved that ought to be considered during requirement gathering.
>
> Comments are most welcome.
>
> Regards,
>
> Jason.
>
> *From:* John Paton <John.Paton@rnib.org.uk>
> *Sent:* Wednesday, 24 February 2021 13:17
> *To:* White, Jason J <jjwhite@ets.org>; public-rqtf@w3.org
> *Subject:* RE: [EXTERNAL] Natural language interfaces and 
> conversational agents
>
> Thanks Jason,
>
> Would we count IVR 
> <https://en.wikipedia.org/wiki/Interactive_voice_response> as a 
> relevant example? It’s more multiple choice than conversational in 
> my experience, but that’s not necessarily the case. You could also 
> argue that the voice control in products such as Smart TVs is a 
> conversational interface distinct from the general-purpose ‘smart 
> assistants’, since it is a secondary interaction mechanism rather than 
> the primary one (you would likely struggle to control your TV solely 
> via voice commands).
>
> Reading the sentence “Thus it is a basic accessibility requirement 
> that these interfaces support multiple modes of input and output.” 
> does concern me as it sounds like multimodality is a Must. If we are 
> saying that all devices Must support both speech and text for both 
> input and output, then I think almost all of them will fail. Maybe the 
> computer-based smart assistants such as Siri/Cortana cover all of 
> these, but I’m not sure whether they accept text input. I agree the work of 
> looking at natural language UIs needs to cover both voice and text but 
> if we require every UI instance to support both then we may be 
> limiting the scope to a class of device that rarely occurs in the 
> wild. A blind user may use a voice agent, whereas a deaf user would use 
> a text- or GUI-based device. Both can be accessible to their respective 
> markets (and both may still have accessibility considerations such as 
> timeouts, cognitive considerations and gracefully handling 
> accents/spelling errors). They would likely benefit from multimodal 
> inputs and outputs, but I would argue that not every instance needs to 
> support every modality in order to offer some accessibility. Hope 
> that doesn’t undo the progress we’ve made on the topic.
>
> That does lead to the question of how often a text-based natural 
> language UI is preferable to a GUI. Is it only in a small 
> selection of cases (i.e. where the possible range of inputs is too wide 
> to offer a multiple-choice selection)?
>
> Thanks for pulling the text below together. It helps a lot to see it 
> in writing I think.
>
> Best regards,
>
> John
>


-- 
Emerging Web Technology Specialist/Accessibility (WAI/W3C)

Received on Thursday, 4 March 2021 14:59:11 UTC