Re: Critical missing feature in SSML specification

I spoke with Raman today. Because various languages must be processed at
the speech application level, the "stop" should be handled in a control
buffering layer in the application rather than being sent to the speech
server via SSML. The reason is that appropriate buffering depends on
language and context, of which the speech server is not aware. The same is
true for pause/resume.

There should be an ability to cancel speech, but this is not something
that SSML should do.
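
As a minimal sketch of what such an application-level layer might look
like (the speech_server client and its speak_ssml/cancel calls are
hypothetical, invented purely for illustration): the application does the
language-aware chunking and services stop/pause/resume itself, so no
<STOP> element is ever needed in the SSML it sends downstream.

    import queue
    import re
    import threading
    from xml.sax.saxutils import escape

    class ControlBufferingLayer:
        """Hypothetical application-side control buffering layer."""

        def __init__(self, speech_server):
            self.server = speech_server          # hypothetical TTS client
            self.chunks = queue.Queue()
            self.not_paused = threading.Event()
            self.not_paused.set()                # set == not paused
            threading.Thread(target=self._drain, daemon=True).start()

        def speak(self, text):
            # Sentence segmentation is language- and context-dependent,
            # so it belongs here in the application, not in the server.
            for chunk in re.split(r"(?<=[.!?])\s+", text):
                self.chunks.put(chunk)

        def _drain(self):
            while True:
                chunk = self.chunks.get()
                self.not_paused.wait()           # block while paused
                self.server.speak_ssml("<speak>%s</speak>" % escape(chunk))

        def pause(self):
            self.not_paused.clear()

        def resume(self):
            self.not_paused.set()

        def stop(self):
            # Flush our own buffer, then cancel the utterance in flight.
            try:
                while True:
                    self.chunks.get_nowait()
            except queue.Empty:
                pass
            self.server.cancel()                 # hypothetical engine call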

Rich


Rich Schwerdtfeger
STSM, Software Group Accessibility Strategist
Emerging Internet Technologies
Chair, IBM Accessibility Architecture Review  Board
schwer@us.ibm.com, Phone: 512-838-4593,T/L: 678-4593

"Two roads diverged in a wood, and I -
I took the one less traveled by, and that has made all the difference.",
Frost



                                                                                                                                        
From:     Al Gilman <asgilman@iamdigex.net>
Sent by:  w3c-wai-pf-request@w3.org
To:       Richard Schwerdtfeger/Austin/IBM@IBMUS, www-voice@w3.org
cc:       w3c-wai-pf@w3.org
Date:     01/30/2003 11:10 AM
Subject:  Re: Critical missing feature in SSML specification

I am going to attempt a summary for this thread.

** executive summary

No change in SSML is indicated; additional specification may be appropriate
in the next generation of the framework (family of specifications).  An
existing Recommendation already captures the essence of the requirement,
however.

'Stop' is an event or command message flowing upstream from the UI to the
speech production activity.

SSML is an encoding for information flowing downstream from the dialog
resource repository to the speech production activity.  The 'stop'
processing command is thus not appropriate to implement in the SSML markup
vocabulary.
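
To make the directionality concrete, here is a minimal sketch (the names
are hypothetical, not drawn from any specification): content encoded in
SSML flows downstream through one channel, while 'stop' flows upstream as
a control operation on the process and never appears in the document.

    from typing import Protocol

    class SpeechProducer(Protocol):
        """Hypothetical interface to a speech production activity."""

        def render(self, ssml_document: str) -> None:
            """Downstream: a document in the SSML markup vocabulary."""

        def stop(self) -> None:
            """Upstream: a control message from the UI.  Note that it
            is an operation on the process, not an element in the
            markup."""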

Specification of machine access to the voice browser processes is one of
the topics that has already come up in the consultations between the Voice
Browser group and the WAI/PF group.  We plan to review the results of this
dialog as requirements for the next-generation Voice Browser Framework at
the Technical Plenary Week.

This 'stop' event or control capability and its restart implications would
logically come under the specification of this machine level of access to
the Voice Browser User Agent.

There are already requirements articulated for Web User Agents in a W3C
Recommendation known as the User Agent Accessibility Guidelines 1.0
<http://www.w3.org/TR/UAAG10/>.
While the implementation report for these guidelines has focused on base
and assistive technology resident on a client node, the guidelines should
be considered to apply by default to SSML processors, except where
demonstrably inappropriate.

Where direct application of these guidelines is technically infeasible,
equivalent facilitation (access to comparable outcomes) should be
investigated before dismissing the reference as inapplicable.

See for example

  http://www.w3.org/TR/UAAG10/guidelines.html#tech-control-multimedia

Requirements for how far the Voice Browsing User Agent should be
unbundleable into processes running on different network nodes, and how to
support the required level of distribution, have not been studied in either
the WAI or the Voice Browser activities to a degree sufficient to
standardize how such a 'stop' capability should be supported in the speech
production module [a.k.a. the TTS process] of a Voice Browser or a
Multimodal Browser using TTS and SSML.

**

1.  The user who is using a speech transcription of text as a display needs
to be able to stop the speech.  Intelligent resumption is desired.

Janina sketched some of the process control transitions and Glen Shires
concurred.
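
One way to picture the bookkeeping behind intelligent resumption (names
hypothetical; the callback convention is only assumed): the application
seeds the content with SSML <mark/> elements, and the last mark the engine
reported back before the stop locates the resume point.  This is the same
point-of-regard technique Rich describes in the quoted message below.

    class MarkTracker:
        """Hypothetical mark bookkeeping for stop-and-resume."""

        def __init__(self, mark_names):
            self.pending = list(mark_names)   # marks not yet spoken
            self.last_heard = None

        def on_mark(self, name):
            # Assumed engine callback, fired as each
            # <mark name="..."/> in the SSML is reached.
            self.last_heard = name
            if name in self.pending:
                self.pending.remove(name)

        def resume_point(self):
            # After a stop, the marks never reported back tell us where
            # the user left off: resume just after the last mark heard.
            return self.last_heard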

Basic requirements are set out in the UAAG
   http://www.w3.org/TR/UAAG10/

Another good reference model to bookmark in this regard is the user control
of play model for the Z39.86 Standard Talking Book.

: Document Navigation Features List
: http://www.loc.gov/nls/z3986/background/navigation.htm

In particular, note that in this application the 'jump forward', i.e.
'escape', function relates to a subset of the syntactic container types in
the content markup.  When escaping from a table, one does not escape merely
to the end of the nearest XML entity, nor to the end of the whole chapter,
but to the end of the table.
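
A sketch of that escape semantics (the data model is invented for
illustration): the jump lands at the end of the innermost escapable
container, such as the table, not at the end of the chapter.

    ESCAPABLE = {"table", "list", "sidebar"}   # assumed container types

    def escape_target(open_containers):
        """open_containers: containers enclosing the current position,
        outermost first, e.g. ["book", "chapter", "table"].
        Returns the container whose end the escape jumps to."""
        for container in reversed(open_containers):
            if container in ESCAPABLE:
                return container   # jump to the end of THIS container
        return None

    # escape_target(["book", "chapter", "table"]) -> "table"
    # i.e. resume just after the table, still within the chapter.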

2. The issues of chunking for distributed processing (where to resume from,
where to jump forward to, streaming) will come up again naturally in the
Timed Text application space.

  W3C Timed Text Home page
  http://www.w3.org/AudioVideo/TT/

There is probably a dependency here, to align what is done by way of Web
Service Port Types for access to the Voice Browsing Process with the Timed
Text representation specifications coming out of that work item.

Similarly, there are already work items on the books in Multimodal
Interaction for how speech integrates into multimodal delivery contexts,
and work in Device Independence for how content can be readied for
different delivery contexts.

Al

At 12:37 PM 2003-01-29, Richard Schwerdtfeger wrote:

>In reviewing the SSML specification we (PF Group) overlooked an extremely
>critical missing feature in the last call draft.
>
>It is absolutely essential that SSML support a <STOP> command.
>
>Scenario:
>
>Screen reader users will often hit the stop command to tell the speech
>synthesizer to stop speaking. Screen Readers would use the <MARK>
>annotation as a way to have the speech engine tell the screen reader when
>speech has been processed (marker processed). In the event that the user
>tells the screen reader to stop speaking, the screen reader should be able
>to send a stop command to the speech engine, which would ultimately flush
>the speech buffers. Markers not returned would help the screen reader know
>where the user left off in the user interface (maintain point of regard
>relative to what has been spoken).
>
>I apologize for not submitting this in our last call review, but this is a
>hard requirement. Otherwise, SSML cannot support screen readers.
>
>Rich
>
>Rich Schwerdtfeger
>STSM, Software Group Accessibility Strategist
>Emerging Internet Technologies
>Chair, IBM Accessibility Architecture Review  Board
>schwer@us.ibm.com, Phone: 512-838-4593,T/L: 678-4593
>
>"Two roads diverged in a wood, and I -
>I took the one less traveled by, and that has made all the difference.",
>Frost

Received on Thursday, 30 January 2003 13:29:58 UTC