Response to feedback on EMMA from WAI/PF

The Multimodal Interaction Working Group would like to thank the Web
Accessibility Initiative Protocols and Formats Working Group for
their thoughtful and perceptive comments [1] on the EMMA Last Call Working
Draft [2]. We very much appreciate your taking
the time to provide this feedback and we look forward to continuing to
draw on your expertise to help EMMA (and more generally,
Multimodal Interaction) play a role in improving the accessibility of the
web.
The EMMA subgroup has discussed your comments and has prepared the responses
below. The MMIWG welcomes any further discussion on these comments.

regards,

Debbie Dahl,
W3C MMIWG Chair

[1] http://lists.w3.org/Archives/Public/www-multimodal/2005Dec/0000.html
[2] http://www.w3.org/TR/emma/

RESPONSE TO EMMA FEEDBACK FROM W3C WAI GROUP
===========================================================================

1. We are concerned that in an approach that focuses on input and output
modalities that are "widely used today" Assistive Technology devices might
be left out in practice.  Although theoretically it seems to be possible to
apply EMMA to all types of input and output devices (modalities), including
Assistive Technology, the important question is "Who is going to write the
device-specific code for Assistive Technology devices?"

If this is outside the scope of EMMA, please let us know who we should
address with this question.

RESPONSE:

We share the concern of the WAI group as to whether the
introduction of new protocols such as EMMA could adversely
impact assistive technology, and the EMMA subgroup has
discussed this in some detail in response to your feedback.

EMMA is a markup language for the representation and annotation of
user inputs and is intended to enable support for modalities
beyond keyboard and mouse, such as speech and pen. As such,
EMMA can play an important role in
enabling the representation of user inputs from
assistive technology devices. The EMMA group would greatly
welcome your feedback on classifications of the different kinds of
assistive devices that could be used as values of emma:mode.
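
For illustration only, here is a minimal sketch of how such an
annotation might look, assuming a hypothetical mode value 'switch'
(this value is not defined in the current draft) and a purely
illustrative application element <command>:

  <emma:emma version="1.0"
      xmlns:emma="http://www.w3.org/2003/04/emma">
    <!-- 'switch' is a hypothetical emma:mode value, shown only to
         illustrate where such a classification would be annotated -->
    <emma:interpretation id="int1"
        emma:medium="tactile"
        emma:mode="switch">
      <!-- application-specific semantics -->
      <command>next</command>
    </emma:interpretation>
  </emma:emma>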

The broader issue concerns providing support for
assistive technologies while minimizing the burden on
application developers building multimodal applications.
We see three ways in which assistive devices
may operate with multimodal applications:

1. The application developer building the interaction
manager (IM) for the multimodal application builds it
specifically with support for particular assistive devices.
The IM might, for example, use different timeouts or
break up the dialog differently depending on the kind of
assistive device in use. In this case the assistive technology
will produce an EMMA representation of the user input,
annotated to indicate the kind of device it comes from, and
the IM will have specific dialog/interaction logic for that device.

2. The application developer does not directly provide support
for assistive devices, but the developer of the
assistive technology provides EMMA as a representation of
the input made on the assistive device. For example, for an
application with speech input, the assistive technology would
generate EMMA that looks like a sequence of words from speech
recognition (see the sketch after this list).

3. The third case is closer to what we believe is
prevalent today and is likely (unfortunately) to remain the case for
most devices: the assistive technology, generally
at an operating system level, serves as an emulator of the
keyboard and/or mouse. In this case, the only way to ensure that
multimodal applications also support assistive devices
is to establish best practices for multimodal application
design. One principle would be that in any case
where the interaction manager expects a verbal input, be it
from speech or handwriting recognition, it will also
accept input from the keyboard. Another would be that if
commands can be issued in one mode, e.g. GUI, they can
also be issued in another, e.g. speech (symmetry
among the modes).
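
As a rough sketch of the second case above, an assistive technology
on which the user composes the words 'flights to boston' might emit
an EMMA document annotated as if it came from the speech channel, so
that the application can consume it unchanged (the <utterance>
element and the attribute values shown are purely illustrative):

  <emma:emma version="1.0"
      xmlns:emma="http://www.w3.org/2003/04/emma">
    <!-- the words composed on the assistive device, presented as if
         they were the token sequence returned by a speech recognizer -->
    <emma:interpretation id="int1"
        emma:medium="acoustic"
        emma:mode="voice"
        emma:tokens="flights to boston">
      <!-- any further content here is application-specific -->
      <utterance>flights to boston</utterance>
    </emma:interpretation>
  </emma:emma>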

Since EMMA does not provide an authoring language for
interaction management or for authoring applications, this
lies outside the scope of the EMMA specification itself.
Within the MMI group this relates most closely to the
multimodal architecture work and the work on interaction management.

The EMMA subgroup is starting to compile a list of
best practices for authoring applications that consume
EMMA, but sees this as better suited to a separate best
practices Note rather than as part of the EMMA specification.



2. Adaptation to delivery context
---------------------------------

2.1 System and Environment

Composite input should provide environmental information. Since input
is used to define a response, the system response should take into
account environmental conditions that should be captured at input
time. Here are some examples:

Signal to Noise Ratio (SNR)
Lighting conditions
Power changes (may throw out input or prompt user to re-enter information)

In the case of a low SNR you might want to change the volume, pitch,
or if the system provides it - captioning. Sustained SNR issues may
result in noise cancellation to improve voice recognition. This
should be included with EMMA structural elements. Some of these
issues could be reflected in confidence but the confidence factor
provides no information as to why the confidence level is low and how
to adapt the system.

RESPONSE:

System and environment issues were initially addressed within the
MMI working group and include the kinds of information described
above, along with other factors such as the location of the device.
That work is now called DCI (Delivery Context Interfaces) and
has moved to the Device Independence working group:

http://www.w3.org/TR/2005/WD-DPF-20051111/

In the Multimodal architecture work within the MMI group,
DCI (previously DPF) is accessed directly from the interaction
manager, rather than through the annotation of EMMA inputs.

http://www.w3.org/TR/mmi-arch/

We believe it is important for system and environment
information to be accessed directly through the DCI from the IM
because the interaction manager should be able to adapt whether
or not the user provides an input (EMMA only reaches the IM when
the user makes an input).
For example, the interaction manager might adapt and use
visual prompts rather than audio when the SNR falls beneath
a threshold. This adaptation should occur regardless of whether
the user has produced a spoken input.

One possible reason for attaching DCI information
to EMMA documents would be to log what the conditions
were when a particular input was received. For this case, the
emma:info element can be used as a container for an XML
serialization of system and environment information accessed
through the DCI.
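
A rough sketch of this logging use follows; the child elements of
emma:info (snr, lighting, and the env namespace) are purely
illustrative and are not defined by EMMA or DCI:

  <emma:emma version="1.0"
      xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation id="int1"
        emma:medium="acoustic" emma:mode="voice">
      <!-- application-specific semantics -->
      <command>zoom in</command>
      <!-- emma:info carries application/vendor data; its contents
           are not standardized by EMMA -->
      <emma:info>
        <env:conditions xmlns:env="http://example.org/environment">
          <env:snr>8.5</env:snr>
          <env:lighting>low</env:lighting>
        </env:conditions>
      </emma:info>
    </emma:interpretation>
  </emma:emma>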


2.2 User factors

How does the EMMA group plan to address user capabilities? ... At
the EMMA input level or somewhere else in the system? Example: I may
have a hearing impairment changing the situation for me over another
person. If multiple people are accessing a system it may be important
to address the user and their specific capabilities for adaptive
response.

RESPONSE:

Along with system and environment factors, and device factors,
user preferences (e.g. choice of mode, volume level, etc.)
are intended to be accessed using the DCI:

http://www.w3.org/TR/2005/WD-DPF-20051111/

The preferences for a specific user should be queried from the DCI
based on the user's id, and those preferences then used
by the interaction manager to adapt the interaction.
The EMMA group discussed the possibility of having an
explicit user-id annotation in EMMA and concluded that this
information is frequently provided explicitly by the user as
an input; it is therefore application data and should
not be standardized in EMMA. Typically a user id will come from entering
a value in a form and will be submitted as a user input.
This will either be done directly from XHTML or perhaps in
some cases enclosed in an EMMA message
(e.g. if the user id is specified by voice).
The id may also come from a cookie, or be determined
based on the user's phone number or other more detailed
information from a mobile provider. In all of these cases,
the user id (and other information such as authentication) is
not an annotation of a user input.
A user id may be transmitted as the payload of a piece of
EMMA markup, as application data inside emma:interpretation,
but it will not be encoded as an EMMA annotation.

Again, for logging purposes, the user id or information
describing the user could be stored within emma:info.
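
For example, a user id given by voice might be carried as ordinary
application data inside emma:interpretation, along the lines of the
following sketch (the login and userid element names and their
values are purely illustrative):

  <emma:emma version="1.0"
      xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation id="int1"
        emma:medium="acoustic" emma:mode="voice">
      <!-- the user id is application data inside the interpretation,
           not an EMMA annotation -->
      <login>
        <userid>jsmith42</userid>
      </login>
    </emma:interpretation>
  </emma:emma>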


3. Settling time

How does this technology address settling time and multiple keys being hit?
People with mobility impairments may push more than one key,
inadvertently hit specific keys, or experience tremors whereby the input
needs to be smoothed. This may or may not affect confidence factors,
but again the "why" question comes up. This information may need to
be processed in the drivers.

RESPONSE:

This issue appears to be at a different level from EMMA. In many
cases it will be a matter of the driver used for the keyboard
input device. In the case where keyboard input
is used to fill a field in a form, which is sent when the
user hits return or a SEND/GO button, any editing or
correction takes place before the input is sent, and the
interaction manager would only see the final string.
If there is a more complex direct interface from the
keystrokes to the interaction manager (each keystroke
being sent individually), then details regarding the
nature of the keyboard input could be encoded in the
application semantics.


4. Directional information

Should we have an emma:directional information? Examples are right,
left, up, down, end, top, north, south, east, west, next, previous.
These could be used to navigate a menu with arrow keys, voice reco,
etc. They could be used to navigate a map also. This addresses device
independence. This helps with intent-based events.

We should include into and out of to address navigation up and down
the hierarchy of a document as in DAISY. The device used to generate
this information should be irrelevant. Start, Stop, reduce speed, may
also be an addition. These higher levels of navigation may be used to
control a media player independent of the device.

RESPONSE:

Specific intents such as up, down, left, right, etc. are part of the
application semantics and so are not standardized as part of EMMA.
EMMA provides containers for the representation of intents and a way to
specify various kinds of annotations on those intents, but it is
outside the scope of EMMA to standardize the semantic representation of
user intents.
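
For example, a spoken 'next' command might be wrapped along the
following lines, where the <command> element and its value belong to
the application rather than to EMMA (the sketch is purely
illustrative):

  <emma:emma version="1.0"
      xmlns:emma="http://www.w3.org/2003/04/emma">
    <emma:interpretation id="int1"
        emma:medium="acoustic" emma:mode="voice"
        emma:tokens="next">
      <!-- the semantic representation below is application-defined;
           EMMA only provides the container and its annotations -->
      <command>next</command>
    </emma:interpretation>
  </emma:emma>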


5. Zoom: What about Zoom out?

RESPONSE:

In order to clarify the example we will change the
speech from 'zoom' to 'zoom in'. 'Zoom out' is of course
another possible command, but the example is intended here as
an illustration rather than an exhaustive presentation of
map manipulation commands.


6. Device independence and keyboard equivalents

For the laptop/desktop class of client devices, there has been a "safe
haven" input channel provided by the keyboard interface.  Users who
cannot control other input methods have assistive technologies that
at least emulate the keyboard, and so full command of applications is
required from the keyboard.  Compare with Checkpoints 1.1 and 1.2 of
the User Agent Accessibility Guidelines 1.0 [UAAG10].

[UAAG10]
http://www.w3.org/TR/UAAG10-TECHS/guidelines.html#gl-device-independence

How does this MMI Framework support having the User Agent supply the
user with alternate input bindings for un-supported modalities
expected by the application?

How will applications developed in this MMI Framework (EMMA
applications) meet the "full functionality from keyboard"
requirement, or what equivalent facilitation is supported?


RESPONSE:

The general principle of allowing people to interact
more flexibly depending on needs and device capabilities
is part of the broader work in the MMI group on multimodal
architecture and interfaces. EMMA is at a different level.
EMMA provides a standardized markup for containing and
annotating interpretations of particular user inputs.
It does not standardize the authoring of the logic of the
application. At the architecture level this is likely to
be a matter of specifying best practices for multimodal
application authoring. There is a need for best practices
at different levels. On one level there should be best practices
for the design of multimodal applications so that they can
support a broad range of modalities and tailor the
interaction (timeouts, etc.) on the basis of annotations
(e.g. medium, mode) and information from the DCI.
At another, more pragmatic, level of best practices,
multimodal applications should be designed so that, in
addition to supporting new modalities such as speech, they
also support keyboard and mouse, so that assistive devices
which emulate keyboard and/or mouse input can be used to
interact with these applications. One principle
would be that verbal inputs such as speech and handwriting have
'alternate bindings' to keyboard input fields.
Another would be that if an application supports pointing
using a device such as a pen or touchscreen, the same pointing
actions (e.g. with pen, touchscreen, or trackball) should also
be supported with the mouse.
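
As an illustration of the 'alternate bindings' principle, the same
application semantics might reach the interaction manager from either
modality, with only the EMMA annotations differing. The fragments
below are purely illustrative (the emma:emma root and namespace
declaration are omitted for brevity, the <command> element is
application-defined, and the annotation values are examples only):

  <!-- spoken input -->
  <emma:interpretation id="int1"
      emma:medium="acoustic" emma:mode="voice"
      emma:tokens="next page">
    <command>next-page</command>
  </emma:interpretation>

  <!-- equivalent keyboard input carrying the same application semantics -->
  <emma:interpretation id="int2"
      emma:medium="tactile" emma:mode="keys">
    <command>next-page</command>
  </emma:interpretation>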



7. Use cases

To make things more concrete, we have compiled the following use cases
to be investigated by the MMI group as Assistive Technology use cases which
might bear requirements beyond the typical mainstream use cases.  We are
willing to discuss these with you in more detail with the goal of coming to
a joint conclusion about their feasibility in EMMA.


(a) Input by switch.  The user is using an on-screen keyboard and inputs
each character by scanning over the rows and columns of the keys and hitting
the switch for row and column selection.  This takes significantly more time
than the average user would take to type in the characters.  Would this
switch-based input be treated like any keyboard input (keyboard emulation)?
If yes, could the author impose time constraints that would be a barrier to
the switch user?  Or, alternatively, would this use case require
device-specific (switch-specific) code?

RESPONSE:

Imposing time constraints is not something that is done by EMMA;
rather, it is a matter of interaction management. In this particular
case we think such constraints are unlikely: general fields for
keyboard input do not 'time out'. If a switch were being used to
generate input as a substitute for speech input, then there could be a
problem with timeouts (in fact probably a problem for almost any
keyboard input). Again this may be a matter of best practices, and the
best practice should be that when speech input is supported, keyboard
input should also be supported, and for the keyboard input there
should be no timeout.


(b) Word prediction.  Is there a way for word prediction programs to
communicate with the interaction manager (or other pertinent components of
the framework) in order to find out about what input is expected from the
user?  For example, could a grammar that is used for parsing be passed on to
a word prediction program in the front end?

RESPONSE:

Again this certainly lies outside the scope of EMMA, since EMMA
does not define grammar formats or interaction management. The W3C
SRGS grammar specification, from the Voice Browser working group,
could potentially be used by a word prediction system.
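
For instance, a small SRGS grammar (XML form) like the following
sketch could in principle be passed to a word prediction front end so
that it knows which words the application expects; the grammar
content itself is purely illustrative:

  <grammar xmlns="http://www.w3.org/2001/06/grammar"
           version="1.0" xml:lang="en-US"
           root="command" mode="voice">
    <rule id="command">
      <!-- the user is expected to say one of these commands -->
      <one-of>
        <item>next</item>
        <item>previous</item>
        <item>zoom in</item>
        <item>zoom out</item>
      </one-of>
    </rule>
  </grammar>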


(c) User overwrites default output parameters.  For example, voice output
could be described in an application with EMMA and SSML.  Can the user
overwrite (slow down or speed up) the speech rate of the speech output?

RESPONSE:

EMMA is solely used for the representation of user inputs and so
does not address voice output. Within the MMI framework, the way to
achieve this would be to specify the user preference for speech output
rate in the DCI and have the interaction manager query the DCI in order
to determine the speech rate. The voice modality component is then
responsible for honoring users' preferences regarding speech, including
dynamic changes. The working group responsible for this component is
the Voice Browser working group, and requirements for this mechanism
should be raised there.


(d) WordAloud (http://www.wordaloud.co.uk/).  This is a program that
displays text a word at a time, in big letters on the screen, additionally
with speech output.  How could this special output modality be accommodated
with EMMA?

RESPONSE:

EMMA is solely used for the representation and annotation of user inputs
and does not address output. At a later stage the EMMA group may address
output, but at this time the language is solely for input.

(e) Aspire Reader (http://www.aequustechnologies.com/). This is a
DAISY reader and browser that also supports speech output, word
highlighting, enhanced navigation, extra text and auditory
descriptions that explain the page outline and content as you go,
alternative renderings such as following through key points of content,
and game control type navigation. Alternative texts are for the
struggling student (for example a new immigrant).


RESPONSE:

EMMA is solely used for the representation and annotation of user inputs
and does not address output. At a later stage the EMMA group may address
output, but at this time the language is solely for input.
