My comments on the Speech Synthesis Markup Language

Here are my comments on the Speech Synthesis Markup Language
specification of the Speech Interface Framework, draft dated 3 January
2001.

- 1) Paragraph 1.2. It shows that the processing, at its different stages, is
influenced both by markup support and by non-markup behaviour.
I suggest adding here, as a general rule, that "explicit markup always takes
precedence over non-markup behaviour".
In the present version of the document this kind of rule appears only as
usage note 2 at the end of 2.4, but it is definitely more general than that.
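
As a rough sketch of the rule I am proposing (the function and parameter
names are my own illustration, not taken from the specification), each
processing stage could resolve a setting like this:

```python
from typing import Optional


def resolve_setting(markup_value: Optional[str], default_value: str) -> str:
    """Apply the proposed general rule for one processing stage.

    markup_value:  value given explicitly in the markup, or None if absent
                   (hypothetical representation, assumed for this sketch).
    default_value: what the processor would choose on its own
                   (the non-markup behaviour).
    """
    # Explicit markup always wins; the processor's own choice is only a fallback.
    if markup_value is not None:
        return markup_value
    return default_value
```

With this reading, the processor's own behaviour applies only where the
author has said nothing.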

- 2) Paragraph 1.2, point 6: waveform production markup support.
I do not agree that "the TTS markup does not provide explicit controls over
the generation of the waveforms". In fact, the markup already introduced
in point 5 for controlling prosody lets you control both the volume and
the speed.
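
To make the point concrete, here is a small sketch that builds such a
fragment. It assumes a "prosody" element with "rate" and "volume"
attributes inside a "speak" root, as I read the draft; the helper name and
the attribute values "fast" and "loud" are only illustrative.

```python
import xml.etree.ElementTree as ET


def build_prosody_fragment(text: str, rate: str, volume: str) -> str:
    """Wrap text in a prosody element that sets both speed and volume.

    Element and attribute names are assumptions based on my reading of the
    draft; check them against the specification before relying on them.
    """
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "volume": volume})
    prosody.text = text
    # Serialize to a string so the fragment can be handed to a synthesizer.
    return ET.tostring(speak, encoding="unicode")


fragment = build_prosody_fragment("Your call is important to us.", "fast", "loud")
print(fragment)
```

Both attributes change the waveform that is finally produced, which is why
I consider them controls over waveform generation.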

- 3) In addition, still in paragraph 1.2, point 6, it is advisable to
specify whether or not a sentence can be interrupted by a barge-in. This
feature is present in section 4.1.5 of the document "Voice Extensible Markup
Language", version 2. This raises another, more general issue: what is the
relationship between the document "Speech Synthesis Markup Language" and
chapter 4 (System Output) of VoiceXML version 2? It should be
made explicit, taking care not to duplicate definitions between these
documents, in order to simplify document maintenance.

Best regards

Alberto Ciaramella
CSELT
via Reiss Romoli 274
10148 Torino (Italy)
tel. +39 011 228 6210
fax. +39 011 228 6207



 

-----Original Message-----
From: Larson, Jim A [mailto:jim.a.larson@intel.com]
Sent: Monday, 8 January 2001 19:15
To: 'ectf-tgasr@ectf.org'
Subject: [ectf-tgasr] New speech specs from W3C


You are invited to review and comment on the following W3C Speech Interface
Framework documents authored by the W3C Voice Browser Working Group.

Last call working draft of the Speech Synthesis Markup Language:
http://www.w3.org/TR/speech-synthesis  The Speech Synthesis Markup Language
Specification is part of this set of new markup specifications for voice
browsers, and is designed to provide a rich, XML-based markup language for
assisting the generation of synthetic speech in web and other applications.
The essential role of the markup language is to provide authors of
synthesizable content a standard way to control aspects of speech such as
pronunciation, volume, pitch, rate, etc. across different
synthesis-capable platforms.

Last call working draft of the Speech Grammar Markup Language:
http://www.w3.org/TR/speech-grammar  This document defines syntax for
representing grammars for use in speech recognition so that developers can
specify the words and patterns of words to be listened for by a speech
recognizer. The syntax of the grammar format is presented in two forms, an
augmented BNF syntax and an XML syntax. The specification intends to make
the two representations directly mappable and allow automatic
transformations between the two forms.

Working draft of the Stochastic Language Models (N-Gram) Specification:
http://www.w3.org/TR/ngram-spec  This document defines syntax for
representing N-Gram (Markovian) stochastic grammars within the W3C Speech
Interface Framework. The use of stochastic N-Gram models has a long and
successful history in the research community and is now more and more
affecting commercial systems, as the market asks for more robust and
flexible solutions. The primary purpose of specifying a stochastic grammar
format is to support large vocabulary and open vocabulary applications.

We encourage you to subscribe to the public discussion list
<www-voice@w3.org> and to mail in your comments before January 31, 2001. To
subscribe, send an email to www-voice-request@w3.org
<mailto:www-voice-request@w3.org> with the word subscribe in the subject
line (include the word unsubscribe if you want to unsubscribe). A public
archive <http://lists.w3.org/Archives/Public/www-voice/> is available
online.

Regards,
Jim Larson
W3C Voice Browser Working Group

Received on Monday, 15 January 2001 04:36:02 UTC