- From: Deborah Dahl <dahl@conversational-technologies.com>
- Date: Mon, 25 Apr 2005 14:51:26 -0400
- To: <public-tt@w3.org>
- Cc: "W3C Multimodal group" <w3c-mmi-wg@w3.org>
Dear Timed Text Working Group,

In response to your request [1] for Last Call comments on "Timed Text (TT) Authoring Format 1.0 - Distribution Format Exchange Profile (DFXP)" [2], the Multimodal Interaction Working Group has reviewed the document from our perspective, in particular considering how timed text might be incorporated into multimodal applications.

The Multimodal Interaction Working Group has no objection to the Timed Text Group's last call working draft, but we do have an observation. Timed Text would be easier to use as part of multimodal interfaces if it had a means of handling external asynchronous events. Such events are the standard means of coordinating among modalities in multimodal applications.

Consider a multimodal interface that uses Timed Text and text-to-speech simultaneously to prompt the user, while using speech recognition to gather the user's response. Using ttp:timeBase, the text-to-speech output can be synchronized with the Timed Text display. However, when the user starts speaking, the multimodal interface would normally want to stop the text-to-speech play and alter, if not stop, the Timed Text display to indicate that it is now listening to the user. Obviously, the timing of the user's utterance can't be known in advance, so the normal way to handle this is to generate a 'speech-detected' or 'barge-in' event, which is then delivered to all the modalities, where it is caught by appropriate event handlers. (The event handler for text-to-speech would halt the current text-to-speech play. A corresponding handler for Timed Text might flash the display, halt it, or change its colors.) In the current specification, there is no apparent way to handle such an event in Timed Text markup. This gap does not indicate an inherent weakness in the Timed Text specification, but we think that it will limit the usefulness of Timed Text in multimodal interfaces.
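To make the scenario concrete, here is a minimal sketch of the event-dispatch pattern described above. All names (TTSModality, TimedTextModality, dispatch, on_barge_in) are hypothetical illustrations, not part of the Timed Text or MMI specifications: a single 'barge-in' event is delivered to every modality, and each modality's handler reacts in its own way.

```python
class TTSModality:
    """Hypothetical text-to-speech output component."""

    def __init__(self):
        self.playing = True  # prompt is being spoken

    def on_barge_in(self):
        # The TTS handler halts the current text-to-speech play.
        self.playing = False


class TimedTextModality:
    """Hypothetical Timed Text display component."""

    def __init__(self):
        self.state = "showing"

    def on_barge_in(self):
        # A Timed Text handler might flash the display, halt it, or
        # change its colors; here we simply mark it as halted.
        self.state = "halted"


def dispatch(event, modalities):
    """Deliver an asynchronous event to every registered modality.

    Each modality that defines an on_<event> handler catches the event.
    """
    for modality in modalities:
        handler = getattr(modality, "on_" + event.replace("-", "_"), None)
        if handler is not None:
            handler()


# The speech recognizer detects the user speaking and raises the event:
tts = TTSModality()
tt = TimedTextModality()
dispatch("barge-in", [tts, tt])
```

After dispatch, the text-to-speech play has been halted and the Timed Text display has reacted, even though the timing of the user's utterance was not known in advance. The point of the sketch is that the markup (or a host environment) needs a place to attach such handlers.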
If you would like more information about the overall multimodal architecture that we're envisioning as a potential container for timed text, you may find our MMI Architecture document useful [3]. We would be happy to discuss our observation in more detail if you have any questions or comments.

Best regards,

Debbie Dahl, MMI WG Chair

[1] Request for Last Call Comments: http://lists.w3.org/Archives/Member/chairs/2005JanMar/0118.html
[2] TT Authoring Format: http://www.w3.org/TR/2005/WD-ttaf1-dfxp-20050321/
[3] MMI Architecture: http://www.w3.org/TR/2005/WD-mmi-arch-20050422/
Received on Monday, 25 April 2005 18:51:46 UTC