- From: Daniel Burnett <burnett@nuance.com>
- Date: Fri, 8 Aug 2003 16:47:22 -0700
- To: <Alberto.Ciaramella@CSELT.IT>
- Cc: <www-voice@w3.org>
Dear Alberto,

Thank you for your review of the SSML specification. It has been two years, but we thought it appropriate to send an official response as if you had sent the comment today. If you believe we have not adequately addressed your issues with our responses, please let us know as soon as possible. If we do not hear from you within 14 days, we will take this as tacit acceptance.

Again, thank you for your input.

-- Dan Burnett
Synthesis Team Leader, VBWG

[VBWG responses are embedded, preceded by '>>>']

-----Original Message-----
From: www-voice-request@w3.org [mailto:www-voice-request@w3.org] On Behalf Of Ciaramella Alberto
Sent: Monday, January 15, 2001 1:35 AM
To: 'Larson, Jim A'; ectf-tgasr@ectf.org; www-voice@w3.org
Subject: My comments about the Speech Synthesis mark up language

Here follow my comments on the Speech Synthesis Markup Language specification of the Speech Interface Framework, draft dated 3 January 2001.

- 1) Paragraph 1.2. It shows that processing, in its different stages, is influenced both by markup support and by non-markup behaviour. I suggest adding here, as a general rule, that "explicit markup always takes precedence over non-markup behaviour". In the present version of the document this rule appears only as usage note 2 at the end of 2.4, but it is definitely more general than that.

>>> Rejected. Behavior in the specification is determined
>>> on an element-by-element basis because the markup in some
>>> cases might try to do something which the engine knows to
>>> be inappropriate. As an example, a prosody contour with
>>> sequential pitch targets that vary wildly will not be observed
>>> very closely by any commercial engine because the audio would
>>> be exceedingly unnatural and likely unintelligible. Additionally,
>>> requiring the markup behavior to take precedence would be
>>> difficult to enforce without audio checks that measure not
>>> just conformance, but performance. We do not believe it is
>>> appropriate for the specification to render too fine an opinion
>>> on performance.

- 2) Paragraph 1.2, point 6: waveform production markup support. I do not agree that "the TTS markup does not provide explicit controls over the generation of the waveforms". In fact, with the markup already introduced in point 5 for controlling prosody, you can control both the volume and the speed.

>>> Accepted. We will remove this sentence.

- 3) In addition, still in paragraph 1.2, point 6, it would be advisable to identify whether a sentence can or cannot be interrupted by barge-in. This feature is present in section 4.1.5 of the document "Voice Extensible Markup Language", version 2. This poses another, more general issue: what is the relationship between the "Speech Synthesis Markup Language" document and chapter 4 (System Output) of VoiceXML version 2? This relationship must be made explicit, taking care not to duplicate definitions between the two documents, in order to simplify document maintenance.

>>> Rejected. We believe this comment has been addressed
>>> by changes to both SSML and VoiceXML. Although examples
>>> of SSML embedded in other languages are appropriate for
>>> this document, specific details are not. Barge-in behavior,
>>> for example, is outside the scope of this specification.

Best regards

Alberto Ciaramella
CSELT
via Reiss Romoli 274
10148 Torino (Italy)
tel. +39 011 228 6210
fax +39 011 228 6207
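For illustration of the prosody markup discussed in comments 1 and 2 above, the following is a minimal sketch, not an excerpt from the draft: it uses the prosody attribute names of the published SSML drafts (written here against the SSML 1.0 namespace), and the particular values and any engine's reaction to them are assumptions.

  <?xml version="1.0"?>
  <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
    <!-- The same prosody markup cited in comment 2 as already giving control
         over the waveform: volume and speaking rate are requested here. -->
    <prosody volume="loud" rate="slow">
      This sentence is requested louder and slower than the default.
    </prosody>
    <!-- A contour with wildly varying pitch targets, as in the response to
         comment 1; an engine may smooth or ignore such targets. -->
    <prosody contour="(0%,+80%) (25%,-40%) (50%,+80%) (75%,-40%) (100%,+80%)">
      Sequential pitch targets that vary wildly may not be observed closely.
    </prosody>
  </speak>

As the first response notes, whether an engine follows such values exactly is a matter of rendering quality rather than something the specification can enforce.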
-----Original Message-----
From: Larson, Jim A [mailto:jim.a.larson@intel.com]
Sent: Monday, 8 January 2001 19:15
To: 'ectf-tgasr@ectf.org'
Subject: [ectf-tgasr] New speech specs from W3C

You are invited to review and comment on the following W3C Speech Interface Framework documents authored by the W3C Voice Browser Working Group.

Last call working draft of the Speech Synthesis Markup Language: http://www.w3.org/TR/speech-synthesis

The Speech Synthesis Markup Language specification is part of this set of new markup specifications for voice browsers. It is designed to provide a rich, XML-based markup language for assisting the generation of synthetic speech in web and other applications. The essential role of the markup language is to give authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, and rate across different synthesis-capable platforms.

Last call working draft of the Speech Grammar Markup Language: http://www.w3.org/TR/speech-grammar

This document defines syntax for representing grammars for use in speech recognition, so that developers can specify the words and patterns of words to be listened for by a speech recognizer. The syntax of the grammar format is presented in two forms, an augmented BNF syntax and an XML syntax. The specification intends to make the two representations directly mappable and to allow automatic transformations between the two forms.

Working draft of the Stochastic Language Models (N-Gram) Specification: http://www.w3.org/TR/ngram-spec

This document defines syntax for representing N-Gram (Markovian) stochastic grammars within the W3C Speech Interface Framework. The use of stochastic N-Gram models has a long and successful history in the research community and is increasingly influencing commercial systems, as the market asks for more robust and flexible solutions. The primary purpose of specifying a stochastic grammar format is to support large-vocabulary and open-vocabulary applications.

We encourage you to subscribe to the public discussion list <www-voice@w3.org> and to mail in your comments before January 31, 2001. To subscribe, send an email to www-voice-request@w3.org <mailto:www-voice-request@w3.org> with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe). A public archive <http://lists.w3.org/Archives/Public/www-voice/> is available online.

Regards,
Jim Larson
W3C Voice Browser Working Group
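As an illustration of the two grammar forms mentioned in the announcement above, here is the same trivial yes/no grammar written first in the augmented BNF form and then in the XML form. This is a sketch following the syntax of the later Speech Recognition Grammar Specification 1.0, which may differ in detail from the January 2001 draft.

  #ABNF 1.0;
  language en-US;
  root $yesno;
  // A single rule accepting either "yes" or "no".
  $yesno = yes | no;

  <?xml version="1.0"?>
  <grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
           xml:lang="en-US" root="yesno">
    <!-- The same rule expressed in the XML form. -->
    <rule id="yesno">
      <one-of>
        <item>yes</item>
        <item>no</item>
      </one-of>
    </rule>
  </grammar>

The two forms are intended to be directly mappable, as the announcement states.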
Received on Friday, 8 August 2003 19:52:08 UTC