- From: Richard Schwerdtfeger <schwer@us.ibm.com>
- Date: Thu, 30 Jan 2003 12:29:53 -0600
- To: Al Gilman <asgilman@iamdigex.net>
- Cc: w3c-wai-pf@w3.org, w3c-wai-pf-request@w3.org, www-voice@w3.org
I spoke with Raman today. Due to the need to process various languages at the speech application level, the "stop" should be handled in a control buffering layer at the application rather than being sent to the speech server using SSML. The reason is that appropriate buffering depends on language and context of which the speech server is not aware. The same is the case for pause/resume. There should be an ability to cancel speech, but this is not something that SSML should do.

Rich

Rich Schwerdtfeger
STSM, Software Group Accessibility Strategist
Emerging Internet Technologies
Chair, IBM Accessibility Architecture Review Board
schwer@us.ibm.com, Phone: 512-838-4593, T/L: 678-4593

"Two roads diverged in a wood, and I -
I took the one less traveled by, and that has made all the difference.",
Frost

Al Gilman <asgilman@iamdigex.net>
Sent by: w3c-wai-pf-request@w3.org
01/30/2003 11:10 AM
To: Richard Schwerdtfeger/Austin/IBM@IBMUS, www-voice@w3.org
cc: w3c-wai-pf@w3.org
Subject: Re: Critical missing feature in SSML specification

I am going to attempt a summary for this thread.

** executive summary

No change in SSML is indicated; additional specification may be appropriate in the next generation of the framework (family of specifications). An existing Recommendation captures the essence of the requirement, however.

'Stop' is an event or command message flowing upstream from the UI to the speech production activity. SSML is an encoding for information flowing downstream from the dialog resource repository to the speech production activity. The 'stop' processing command is thus not appropriate to implement in the SSML markup vocabulary.

Specification of machine access to the voice browser processes is one of the topics that has already come up in the consultations between the Voice Browser group and the WAI/PF group. We plan to review the results of this dialog as requirements for the next generation Voice Browser Framework at the Technical Plenary Week.
This 'stop' event or control capability and its restart implications would logically come under the specification of this machine level of access to the Voice Browser User Agent.

There are already requirements articulated for Web User Agents in a W3C Recommendation known as the User Agent Accessibility Guidelines 1.0 <http://www.w3.org/TR/UAAG10/>. While the implementation report for these guidelines has focused on base and assistive technology resident on a client node, these guidelines should by default be considered to apply already to SSML processors except where demonstrably inappropriate. Where direct application of these guidelines is technically infeasible, equivalent facilitation (access to comparable outcomes) should be investigated before dismissing the reference as inapplicable. See for example

  http://www.w3.org/TR/UAAG10/guidelines.html#tech-control-multimedia

Requirements for how much the Voice Browsing User Agent should be unbundleable into processes running on different network nodes, and how to support the required level of distribution, have not been studied in either the WAI or the Voice Browser activities to a sufficient degree to standardize requirements for how such a 'stop' capability should be supported in the speech production module [a.k.a. TTS process] of the Voice Browser or a Multimodal Browser using TTS and SSML.

**

1. The user who is using a speech transcription of text as a display needs to be able to stop the speech. Intelligent resumption is desired. Janina sketched some of the process control transitions and Glen Shires concurred. Basic requirements are set out in the UAAG

  http://www.w3.org/TR/UAAG10/

Another good reference model to bookmark in this regard is the user control of play model for the Z39-86 Standard Talking Book:

  Document Navigation Features List
  http://www.loc.gov/nls/z3986/background/navigation.htm

In particular, note that in this application the 'jump forward' i.e.
'escape' function relates to a subset of the syntactic container types in the content markup. When escaping from a table, one does not merely escape from the XML entity, nor escape from the whole chapter, but to the end of the table.

2. The issues of chunking for distributed processing (where to resume from, where to jump forward to, streaming) will come up again naturally in the Timed Text application space.

  W3C Timed Text Home page
  http://www.w3.org/AudioVideo/TT/

There is probably a dependency here, to align what is done by way of Web Service Port Types for access to the Voice Browsing Process with the Timed Text representation specifications coming out of that work item.

Similarly, there are already work items on the books in Multimodal Interaction for how speech integrates into multimodal delivery contexts, and work in Device Independence for how content can be ready for different delivery contexts.

Al

At 12:37 PM 2003-01-29, Richard Schwerdtfeger wrote:
>In reviewing the SSML specification we (PF Group) overlooked an extremely
>critical missing feature in the last call draft.
>
>It is absolutely essential that SSML support a <STOP> command.
>
>Scenario:
>
>Screen reader users will often hit the stop command to tell the speech
>synthesizer to stop speaking. Screen Readers would use the <MARK>
>annotation as a way to have the speech engine tell the screen reader when
>speech has been processed (marker processed). In the event that the user
>tells the screen reader to stop speaking the screen reader should be able
>to send a stop command to the speech engine which would ultimately flush
>the speech buffers. Markers not returned would help the screen reader know
>where the user left off in the user interface (maintain point of regard
>relative to what has been spoken).
>
>I apologize for not submitting this in our last call review but this is a
>hard requirement. Otherwise, SSML cannot support screen readers.
>
>Rich
>
>Rich Schwerdtfeger
>STSM, Software Group Accessibility Strategist
>Emerging Internet Technologies
>Chair, IBM Accessibility Architecture Review Board
>schwer@us.ibm.com, Phone: 512-838-4593, T/L: 678-4593
>
>"Two roads diverged in a wood, and I -
>I took the one less traveled by, and that has made all the difference.",
>Frost
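
The division of labor discussed in this thread (stop handled by an application-side control buffering layer, with <MARK> events telling the application what has been spoken) might be sketched roughly as follows. This is only an illustrative sketch in Python under stated assumptions: the SpeechBuffer class, its method names, and the marker names are hypothetical and do not come from SSML, the UAAG, or any speech server API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Chunk:
    mark: str  # hypothetical marker name, playing the role of an SSML <mark>
    text: str  # text the application hands to the speech engine


class SpeechBuffer:
    """Hypothetical application-side buffer sitting in front of a speech engine."""

    def __init__(self, chunks):
        self.pending = deque(chunks)  # chunks not yet handed to the engine
        self.in_flight = deque()      # chunks sent whose marker has not come back
        self.last_mark = None         # last marker the engine reported as spoken

    def send_next(self):
        """Hand the next chunk to the engine; return it, or None if nothing is left."""
        if not self.pending:
            return None
        chunk = self.pending.popleft()
        self.in_flight.append(chunk)
        return chunk

    def on_mark(self, mark):
        """Record that the engine finished speaking the chunk carrying `mark`."""
        if self.in_flight and self.in_flight[0].mark == mark:
            self.in_flight.popleft()
            self.last_mark = mark

    def stop(self):
        """Cancel speech: move unacknowledged chunks back to the front of the queue.

        Returns the last marker known to have been spoken, which the
        application can use as the point of regard.
        """
        while self.in_flight:
            self.pending.appendleft(self.in_flight.pop())
        return self.last_mark
```

In this sketch, markers that were never returned identify the chunks that are put back in the queue on stop, so a later send_next() resumes from the first chunk that was not confirmed as spoken. How chunks should be cut (per sentence, per phrase, per table) is exactly the language- and context-dependent decision the thread leaves to the application.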
Received on Thursday, 30 January 2003 13:29:58 UTC