Re: Critical missing feature in SSML specification from Al Gilman on 2003-01-30 (www-voice@w3.org from January to March 2003)

From: Al Gilman <asgilman@iamdigex.net>
Date: Thu, 30 Jan 2003 12:10:12 -0500
To: Richard Schwerdtfeger <schwer@us.ibm.com>, www-voice@w3.org
Cc: w3c-wai-pf@w3.org
Message-Id: <5.1.0.14.2.20030130110702.0284ab80@pop.iamdigex.net>

I am going to attempt a summary for this thread.

** executive summary

No change in SSML is indicated; additional specification may be appropriate
in the next generation of the framework (family of specifications).  An
existing Recommendation captures the essence of the requirement, however.

'Stop' is an event or command message flowing upstream from the UI to the
speech production activity.

SSML is an encoding for information flowing downstream from the dialog
resource repository to the speech production activity.  The 'stop' processing
command is thus not appropriate to implement in the SSML markup vocabulary.
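
To make the direction of flow concrete, here is a minimal sketch in Python.
It is an illustration only, not an API from SSML or from any real speech
engine: the SpeechEngine class and its load/speak_next/stop methods are
hypothetical names. The <mark> element is taken from SSML; everything else
is assumed. Content with marks flows downstream to the engine, while 'stop'
is a separate control call flowing upstream from the UI. The marks not yet
reported tell the caller where the user left off.

```python
import xml.etree.ElementTree as ET

# Downstream content: an SSML fragment with <mark> elements (example text is made up).
SSML = ('<speak>Menu bar. <mark name="m1"/>File. '
        '<mark name="m2"/>Edit. <mark name="m3"/>View.</speak>')


def chunks_from_ssml(ssml):
    """Split SSML into (mark_name, text) chunks; the first chunk has no mark."""
    root = ET.fromstring(ssml)
    chunks = [(None, (root.text or "").strip())]
    for el in root:
        if el.tag == "mark":
            chunks.append((el.get("name"), (el.tail or "").strip()))
    return chunks


class SpeechEngine:
    """Hypothetical speech production process, not a real TTS interface."""

    def __init__(self):
        self.queue = []    # chunks buffered but not yet spoken
        self.reached = []  # marks already reported back to the caller

    def load(self, chunks):
        # Downstream path: content arrives from the dialog resource.
        self.queue = list(chunks)
        self.reached = []

    def speak_next(self):
        # Reaching a chunk reports its mark (if any), then its text is spoken.
        mark, _text = self.queue.pop(0)
        if mark is not None:
            self.reached.append(mark)
        return mark

    def stop(self):
        # Upstream path: a control message from the UI, not part of the markup.
        # Flush the buffers and return the marks that were never reached.
        pending = [mark for mark, _ in self.queue if mark is not None]
        self.queue.clear()
        return pending


engine = SpeechEngine()
engine.load(chunks_from_ssml(SSML))
engine.speak_next()     # speaks "Menu bar."
engine.speak_next()     # reports m1, speaks "File."
pending = engine.stop() # user hits stop; m2 and m3 were never reported
```

Note that nothing in the SSML text itself changes when the user stops the
speech; the stop request is a call on the process, which is the point made
above.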

Specification of machine access to the voice browser processes is one of the
topics that has already come up in the consultations between the Voice
Browser group and the WAI/PF group.  We plan to be reviewing results of this
dialog as requirements for the next generation Voice Browser Framework at
the Technical Plenary Week.

This 'stop' event or control capability and its restart implications would
logically come under the specification of this machine level of access to
the Voice Browser User Agent.

There are already requirements articulated for Web User Agents in a W3C
Recommendation known as the User Agent Accessibility Guidelines 1.0
<http://www.w3.org/TR/UAAG10/>.
While the implementation report for these guidelines has focused on base
and assistive technology resident on a client node, these guidelines should
by default be considered to apply already to SSML processors except where
demonstrably inappropriate.

Where direct application of these guidelines is technically infeasible,
equivalent facilitation (access to comparable outcomes) should be investigated
before dismissing the reference as inapplicable.

See for example

  http://www.w3.org/TR/UAAG10/guidelines.html#tech-control-multimedia

Requirements for how much the Voice Browsing User Agent should be
unbundleable into processes running on different network nodes, and how to
support the required level of distribution, have not been studied in either
the WAI or the Voice Browser activities to a sufficient degree to
standardize requirements for how such a 'stop' capability should be
supported in the speech production module [a.k.a. TTS process] of the Voice
Browser or a Multimodal Browser using TTS and SSML.

**

1.  The user who is using a speech transcription of text as a display needs
to be able to stop the speech.  Intelligent resumption is desired.

Janina sketched some of the process control transitions and Glen Shires 
concurred.

Basic requirements are set out in the UAAG
   http://www.w3.org/TR/UAAG10/

Another good reference model to bookmark in this regard is the user control
of play model for the Z39-86 Standard Talking Book.

: Document Navigation Features List
: http://www.loc.gov/nls/z3986/background/navigation.htm

In particular, note that in this application the 'jump forward' i.e. 'escape'
function relates to a subset of the syntactic container types in the content
markup.  When escaping from a table, one does not merely escape from the XML
entity, nor escape from the whole chapter, but to the end of the table.
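
As a rough illustration of this 'escape' behavior, the following Python
sketch finds where play would resume after escaping. The element names
(chapter, table, row, cell, p) and the ESCAPABLE set are assumptions made
for the example; they are not taken from the Z39-86 vocabulary. The idea
shown is only that escape climbs to the nearest enclosing container of an
escapable type and continues with whatever follows that container.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset of container types that the escape function applies to.
ESCAPABLE = {"table", "list", "note"}

DOC = ET.fromstring(
    "<chapter>"
    "<p id='p1'>Intro</p>"
    "<table id='t1'><row><cell id='c1'>a</cell><cell id='c2'>b</cell></row></table>"
    "<p id='p2'>After the table</p>"
    "</chapter>"
)


def escape_target(root, current):
    """Return the element where play resumes after escaping, or None."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: node for node in root.iter() for child in node}

    # Climb to the nearest enclosing container of an escapable type.
    node = current
    while node is not None and node.tag not in ESCAPABLE:
        node = parent.get(node)
    if node is None:
        return None  # not inside any escapable container

    # Resume at the next element in document order after that container.
    while node in parent:
        siblings = list(parent[node])
        index = siblings.index(node)
        if index + 1 < len(siblings):
            return siblings[index + 1]
        node = parent[node]
    return None  # the container was the last thing in the document


cell = DOC.find(".//cell[@id='c2']")
target = escape_target(DOC, cell)  # the paragraph following the table
```

Escaping from a cell lands on the paragraph after the table, not on the
next cell and not at the end of the chapter.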

2. The issues of chunking for distributed processing (where to resume from,
where to jump forward to, streaming) will come up again naturally in the
Timed Text application space.

  W3C Timed Text Home page
  http://www.w3.org/AudioVideo/TT/

There is probably a dependency here, to align what is done by way of Web
Service Port Types for access to the Voice Browsing Process with the Timed
Text representation specifications coming out of that work item.

Similarly there are already on the books work items in Multimodal Interaction
for how speech integrates into multimodal delivery contexts, and work in
Device Independence for how content can be ready for different delivery
contexts.

Al

At 12:37 PM 2003-01-29, Richard Schwerdtfeger wrote:

>In reviewing the SSML specification we (PF Group) overlooked an extremely
>critical missing feature in the last call draft.
>
>It is absolutely essential that SSML support a <STOP> command.
>
>Scenario:
>
>Screen reader users will often hit the stop command to tell the speech
>synthesizer to stop speaking. Screen Readers would use the <MARK>
>annotation as a way to have the speech engine tell the screen reader when
>speech has been processed (marker processed). In the event that the user
>tells the screen reader to stop speaking the screen reader should be able
>to send a stop command to the speech engine which would ultimately flush
>the speech buffers. Markers not returned would help the screen reader know
>where the user left off in the user interface (maintain point of regard
>relative to what has been spoken).
>
>I apologize for not submitting this in our last call review but this is a
>hard requirement. Otherwise, SSML cannot support screen readers.
>
>Rich
>
>Rich Schwerdtfeger
>STSM, Software Group Accessibility Strategist
>Emerging Internet Technologies
>Chair, IBM Accessibility Architecture Review  Board
>schwer@us.ibm.com, Phone: 512-838-4593,T/L: 678-4593
>
>"Two roads diverged in a wood, and I -
>I took the one less traveled by, and that has made all the difference.",
>Frost
Received on Thursday, 30 January 2003 12:20:52 UTC