- From: Deborah Dahl <dahl@conversational-technologies.com>
- Date: Wed, 8 Feb 2006 10:25:37 -0500
- To: <wai-liaison@w3.org>, <www-multimodal@w3.org>
The Multimodal Interaction Working Group would like to thank the Web Accessibility Initiative Protocols and Formats Working Group for their thoughtful and perceptive comments [1] on the EMMA Last Call Working Draft [2]. We very much appreciate your taking the time to provide this feedback and we look forward to continuing to draw on your expertise to help EMMA (and more generally, Multimodal Interaction) play a role in improving the accessibility of the web. The EMMA subgroup has discussed your comments and has prepared the responses below. The MMIWG welcomes any further discussion on these comments.

regards,
Debbie Dahl, W3C MMIWG Chair

[1] http://lists.w3.org/Archives/Public/www-multimodal/2005Dec/0000.html
[2] http://www.w3.org/TR/emma/

RESPONSE TO EMMA FEEDBACK FROM W3C WAI GROUP
===========================================================================

1. We are concerned that in an approach that focuses on input and output modalities that are "widely used today" Assistive Technology devices might be left out in practice. Although theoretically it seems to be possible to apply EMMA to all types of input and output devices (modalities), including Assistive Technology, the important question is "Who is going to write the device-specific code for Assistive Technology devices?" If this is outside the scope of EMMA, please let us know who we should address with this question.

RESPONSE: We share the concern of the WAI group as to whether the introduction of new protocols such as EMMA could adversely impact assistive technology, and the EMMA subgroup has discussed this in some detail in response to your feedback. EMMA is a markup for the representation and annotation of user inputs and is intended to enable support for modalities beyond keyboard and mouse, such as speech and pen. As such, EMMA can play an important role in enabling the representation of user inputs from assistive technology devices. The EMMA group would greatly welcome your feedback on classifications of different kinds of assistive devices that could be used as values of emma:mode.

The broader issue concerns providing support for assistive technologies while minimizing the burden on application developers building multimodal applications. We see three ways in which assistive devices may operate with multimodal applications:

1. The application developer building the interaction manager (IM) for the multimodal application builds it specifically with support for particular assistive devices. The IM might, for example, use different timeouts or break up the dialog differently depending on the kind of assistive device in use. In this case the assistive technology will produce an EMMA representation of the user input, annotated to indicate the kind of device it is from, and the IM will have specific dialog/interaction logic for that device.

2. The application developer does not directly provide support for the assistive devices, but the developer of the assistive technology provides EMMA as a representation of the input on the assistive device. For example, for an application with speech input, the assistive technology would generate EMMA for the assistive device that looks like a sequence of words from speech recognition (see the sketch after this list).

3. The third case is the one we believe is prevalent today and likely (unfortunately) to remain the case for most devices: the assistive technology, generally at an operating system level, serves as an emulator of the keyboard and/or mouse.
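To illustrate the first two cases, here is a purely illustrative sketch (not taken from the specification) of an EMMA document that an assistive device or its driver might emit. The emma:emma, emma:interpretation, emma:tokens, emma:medium and emma:mode names are those used in the draft; the mode value "headpointer" and the application markup in the example namespace are hypothetical placeholders, since classifications of assistive devices suitable as emma:mode values are exactly the kind of feedback we are requesting.

    <emma:emma version="1.0"
        xmlns:emma="http://www.w3.org/2003/04/emma"
        xmlns="http://www.example.com/flightapp">
      <!-- Hypothetical input produced via an on-screen keyboard driven
           by a head pointer; the device identifies itself in emma:mode. -->
      <emma:interpretation id="int1"
          emma:medium="tactile"
          emma:mode="headpointer"
          emma:tokens="flights to boston">
        <destination>Boston</destination>
      </emma:interpretation>
    </emma:emma>

An interaction manager built as in case 1 could use the emma:mode value to adjust timeouts or dialog structure; in case 2 the same document would simply be consumed like any other verbal input.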
In this third case, the only way to ensure that multimodal applications also support assistive devices is to establish best practices for multimodal application design. One principle would be that in any case where the interaction manager expects a verbal input, be it from speech or handwriting recognition, it will also accept input from the keyboard. Another would be that if commands can be issued in one mode, e.g. GUI, they can also be issued in the other, e.g. speech (symmetry among the modes). Since EMMA does not provide an authoring language for interaction management or authoring of applications, this lies outwith the scope of the EMMA specification itself. Within the MMI group this relates most closely to the multimodal architecture work and work on interaction management. The EMMA subgroup is starting to compile a list of best practices for authoring applications that consume EMMA but sees this as better suited to a separate best practices Note rather than as part of the EMMA specification.

2. Adaptation to delivery context
---------------------------------

2.1 System and Environment

Composite input should provide environmental information. Since input is used to define a response, the system response should take into account environmental conditions that should be captured at input time. Here are some examples:

- Signal to Noise Ratio (SNR)
- Lighting conditions
- Power changes (may throw out input or prompt user to re-enter information)

In the case of a low SNR you might want to change the volume, pitch, or, if the system provides it, captioning. Sustained SNR issues may result in noise cancellation to improve voice recognition. This should be included with EMMA structural elements. Some of these issues could be reflected in confidence, but the confidence factor provides no information as to why the confidence level is low and how to adapt the system.

RESPONSE: System and environment issues were initially addressed within the MMI working group; that work covers the kinds of information described above along with other factors such as the location of the device. It is now called DCI (Delivery Context Interfaces) and has moved to the Device Independence working group:

http://www.w3.org/TR/2005/WD-DPF-20051111/

In the multimodal architecture work within the MMI group, DCI (previously DPF) is accessed directly from the interaction manager, rather than through the annotation of EMMA inputs:

http://www.w3.org/TR/mmi-arch/

We believe it is important for system and environment information to be accessed directly through DCI from the IM because the interaction should be able to adjust whether the user provides an input or not (EMMA only arrives at the IM when the user makes an input). For example, the interaction manager might adapt and use visual prompts rather than audio when the SNR is beneath a threshold; this adaptation should occur regardless of whether the user has produced a spoken input or not.

One possible reason for attachment of DCI information to EMMA documents would be for logging what the conditions were when a particular input was received. For this case, the emma:info element can be used as a container for an XML serialization of system and environment information accessed through the DCI.
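Purely as an illustrative sketch of that logging use: a snapshot of DCI-derived values could be serialized inside emma:info as described above. The env:conditions element and its contents are invented for the example and are not defined by EMMA or DCI, and the medium/mode values shown are only illustrative.

    <emma:emma version="1.0"
        xmlns:emma="http://www.w3.org/2003/04/emma"
        xmlns:env="http://www.example.com/environment">
      <emma:interpretation id="int1"
          emma:medium="acoustic" emma:mode="voice">
        <!-- application semantics for the spoken input go here -->
      </emma:interpretation>
      <!-- Hypothetical snapshot of delivery-context values, recorded
           only for logging purposes; the env names are invented. -->
      <emma:info>
        <env:conditions>
          <env:snr>12dB</env:snr>
          <env:lighting>low</env:lighting>
        </env:conditions>
      </emma:info>
    </emma:emma>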
2.2 User factors

How does the EMMA group plan to address user capabilities? ... At the EMMA input level or somewhere else in the system? Example: I may have a hearing impairment, changing the situation for me over another person. If multiple people are accessing a system it may be important to address the user and their specific capabilities for adaptive response.

RESPONSE: Along with system and environment factors and device factors, user preferences, e.g. choice of mode, volume level, etc., are intended to be accessed using the DCI:

http://www.w3.org/TR/2005/WD-DPF-20051111/

The preferences for a specific user should be queried from the DCI based on the user's id, and those preferences then used by the interaction manager to adapt the interaction.

The EMMA group discussed the possibility of having an explicit user-id annotation in EMMA and concluded that this information is frequently provided explicitly by the user as an input and therefore is application data, and so should not be standardized in EMMA. Typically user ids will come from entering a value in a form and this will be submitted as a user input. This will either be done directly from XHTML or perhaps in some cases enclosed in an EMMA message (e.g. if the user id is specified by voice). The id may also come from a cookie, or be determined based on the user's phone number or other more detailed information from a mobile provider. In all of these cases, the user id (and other information such as authentication) is not an annotation of a user input. A user id may be transmitted as the payload of a piece of EMMA markup, as application data inside emma:interpretation, but it will not be encoded as an EMMA annotation. Again, for logging purposes, the user id or information describing the user could be stored within emma:info.

3. Settling time

How does this technology address settling time and multiple keys being hit? People with mobility impairments may push more than one key, inadvertently hit specific keys, or experience tremors whose effects need to be smoothed. This may or may not affect confidence factors, but again the "why" question comes up. This information may need to be processed in the drivers.

RESPONSE: The issue appears to be at a different level from EMMA. In many cases this will be a matter of the driver used for the keyboard input device. In the case where keyboard input is used to fill a field in a form, and is then sent when the user hits return or a SEND/GO button, any editing or correction takes place before the input is sent and the interaction manager would only see the final string. If there is a more complex direct interface from the keystrokes to the interaction manager (each keystroke being sent individually), then details regarding the nature of the keyboard input could be encoded in the application semantics.

4. Directional information

Should we have emma:directional information? Examples are right, left, up, down, end, top, north, south, east, west, next, previous. These could be used to navigate a menu with arrow keys, voice recognition, etc. They could also be used to navigate a map. This addresses device independence. This helps with intent-based events. We should include "into" and "out of" to address navigation up and down the hierarchy of a document as in DAISY. The device used to generate this information should be irrelevant. Start, stop and reduce speed may also be additions. These higher levels of navigation may be used to control a media player independent of the device.

RESPONSE: Specific intents such as up, down, left, right, etc. are part of the application semantics and so are not standardized as part of EMMA. EMMA provides containers for the representation of intents and a way to specify various kinds of annotations on those intents, but it is outwith the scope of EMMA to standardize the semantic representation of user intents.
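Purely as an illustration of that division of labour: a directional command would be carried as application semantics inside emma:interpretation, with EMMA contributing only the container and its annotations. The nav element below is an invented application vocabulary, not anything defined by EMMA, and the medium/mode values are only illustrative.

    <emma:emma version="1.0"
        xmlns:emma="http://www.w3.org/2003/04/emma"
        xmlns="http://www.example.com/mediaplayer">
      <!-- "next" issued by voice; the application payload, not EMMA,
           defines what "next" means for this document or player. -->
      <emma:interpretation id="int1"
          emma:medium="acoustic" emma:mode="voice"
          emma:tokens="next">
        <nav direction="next"/>
      </emma:interpretation>
    </emma:emma>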
5. Zoom: What about Zoom out?

RESPONSE: In order to clarify the example we will change the speech from 'zoom' to 'zoom in'. 'Zoom out' is of course another possible command, but this is intended here as an example rather than an exhaustive presentation of map manipulation commands.

6. Device independence and keyboard equivalents

For the laptop/desktop class of client devices, there has been a "safe haven" input channel provided by the keyboard interface. Users who cannot control other input methods have assistive technologies that at least emulate the keyboard, and so full command of applications is required from the keyboard. Compare with Checkpoints 1.1 and 1.2 of the User Agent Accessibility Guidelines 1.0 [UAAG10].

[UAAG10] http://www.w3.org/TR/UAAG10-TECHS/guidelines.html#gl-device-independence

How does this MMI Framework support having the User Agent supply the user with alternate input bindings for unsupported modalities expected by the application? How will applications developed in this MMI Framework (EMMA applications) meet the "full functionality from keyboard" requirement, or what equivalent facilitation is supported?

RESPONSE: The general principle of allowing people to interact more flexibly depending on needs and device capabilities is part of the broader work in the MMI group on multimodal architecture and interfaces. EMMA is at a different level: it provides a standardized markup for containing and annotating interpretations of particular user inputs; it does not standardize the authoring of the logic of the application. At the architecture level this is likely to be a matter of specifying best practices for multimodal application authoring.

There is a need for best practices at different levels. On one level there should be best practices for the design of multimodal applications so that they can support a broad range of modalities and tailor the interaction (timeouts etc.) on the basis of annotations (e.g. medium, mode) and information from the DCI. At another, more pragmatic, level of best practices, multimodal applications should be designed so that in addition to supporting new modalities such as speech they also support keyboard and mouse, so that assistive devices which emulate keyboard and/or mouse input can be used to interact with these applications. One principle would be that verbal inputs such as speech and handwriting have 'alternate bindings' to keyboard input fields. Another would be that wherever an application supports pointing using a device such as a pen or touchscreen, it should also accept equivalent input from the mouse, so that other pointing mechanisms (e.g. a trackball or a device that emulates the mouse) can be used.
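To make 'alternate bindings' concrete, here is an illustrative sketch (not from the specification) of the same city field filled once by speech and once from the keyboard, possibly via a keyboard emulator. Only the EMMA annotations differ, so an interaction manager written against the application semantics need not care which device produced the input. The application markup and the specific medium/mode values are given only for the example.

    <emma:emma version="1.0"
        xmlns:emma="http://www.w3.org/2003/04/emma"
        xmlns="http://www.example.com/travelapp">
      <!-- Spoken input -->
      <emma:interpretation id="speech1"
          emma:medium="acoustic" emma:mode="voice"
          emma:tokens="boston">
        <city>Boston</city>
      </emma:interpretation>
    </emma:emma>

    <emma:emma version="1.0"
        xmlns:emma="http://www.w3.org/2003/04/emma"
        xmlns="http://www.example.com/travelapp">
      <!-- The same field typed on the keyboard (or via an assistive
           technology that emulates the keyboard) -->
      <emma:interpretation id="keys1"
          emma:medium="tactile" emma:mode="keys"
          emma:tokens="boston">
        <city>Boston</city>
      </emma:interpretation>
    </emma:emma>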
7. Use cases

To make things more concrete, we have compiled the following use cases to be investigated by the MMI group as Assistive Technology use cases which might bear requirements beyond the typical mainstream use cases. We are willing to discuss these with you in more detail with the goal of coming to a joint conclusion about their feasibility in EMMA.

(a) Input by switch. The user is using an on-screen keyboard and inputs each character by scanning over the rows and columns of the keys and hitting the switch for row and column selection. This takes significantly more time than the average user would take to type in the characters. Would this switch-based input be treated like any keyboard input (keyboard emulation)? If yes, could the author impose time constraints that would be a barrier to the switch user? Or, alternatively, would this use case require device-specific (switch-specific) code?

RESPONSE: Imposing time constraints is not something that is done by EMMA; rather, it is a matter of interaction management. In this particular case we think such constraints are unlikely, since general fields for keyboard input do not 'time out'. If a switch were being used to generate input as a substitute for speech, then there could be a problem with timeouts (in fact this would probably be a problem for almost any keyboard input). Again this may be a matter of best practices, and the best practice should be that when speech input is supported, keyboard input should also be supported, and for the keyboard input there should be no timeout.

(b) Word prediction. Is there a way for word prediction programs to communicate with the interaction manager (or other pertinent components of the framework) in order to find out what input is expected from the user? For example, could a grammar that is used for parsing be passed on to a word prediction program in the front end?

RESPONSE: Again this certainly lies outside the scope of EMMA, since EMMA does not define grammar formats or interaction management. The W3C SRGS grammar specification, from the Voice Browser working group, could potentially be used by a word prediction system.

(c) User overwrites default output parameters. For example, voice output could be described in an application with EMMA and SSML. Can the user overwrite (slow down or speed up) the speech rate of the speech output?

RESPONSE: EMMA is solely used for the representation of user inputs and so does not address voice output. Within the MMI framework the way to achieve this would be to specify the user preference for speech output rate in the DCI and have the interaction manager query the DCI in order to determine the speech rate. The voice modality component is then responsible for honoring users' preferences regarding speech, including dynamic changes. The working group responsible for this component is the Voice Browser working group and requirements for this mechanism should be raised there.

(d) WordAloud (http://www.wordaloud.co.uk/). This is a program that displays text a word at a time, in big letters on the screen, additionally with speech output. How could this special output modality be accommodated with EMMA?

RESPONSE: EMMA is solely used for the representation and annotation of user inputs and does not address output. At a later stage the EMMA group may address output, but at this time the language is solely for input.

(e) Aspire Reader (http://www.aequustechnologies.com/). This is a DAISY reader and browser that also supports speech output, word highlighting, enhanced navigation, extra text and auditory descriptions that explain the page outline and content as you go, alternative renderings such as following through key points of content, and game-control-style navigation. Alternative texts are for the struggling student (for example, a new immigrant).

RESPONSE: EMMA is solely used for the representation and annotation of user inputs and does not address output. At a later stage the EMMA group may address output, but at this time the language is solely for input.