- From: Daniel Weck via cvs-syncmail <cvsmail@w3.org>
- Date: Tue, 14 Feb 2012 01:17:12 +0000
- To: public-css-commits@w3.org
Update of /sources/public/csswg/css3-speech In directory hutz:/tmp/cvs-serv369 Modified Files: Overview.html Overview.src.html Log Message: finished SSML relationship prose fixes. Index: Overview.html =================================================================== RCS file: /sources/public/csswg/css3-speech/Overview.html,v retrieving revision 1.99 retrieving revision 1.100 diff -u -d -r1.99 -r1.100 --- Overview.html 14 Feb 2012 00:37:34 -0000 1.99 +++ Overview.html 14 Feb 2012 01:17:10 -0000 1.100 @@ -436,8 +436,8 @@ However, the specificities of the CSS model mean that compatibility with SSML in terms of syntax and/or semantics is only partially achievable. The definition of each property in the Speech module includes informative - statements, wherever necessary, to clarify the relationship with similar - features in SSML. + statements, wherever necessary, to clarify their relationship with similar + functionality from SSML. <h2 id=css-values><span class=secno>4. </span>CSS values</h2> @@ -1206,7 +1206,8 @@ property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code> element</a> from the SSML markup language <a href="#SSML" - rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, the application of prosodic + rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, the application of ‘<a + href="#pause"><code class=property>pause</code></a>’ prosodic boundaries within the <a href="#aural-model">aural "box" model</a> of CSS Speech requires special considerations (e.g. <a href="#collapsed-pauses">"collapsed" pauses</a>). @@ -1482,11 +1483,15 @@ that occurs before (or after) the speech synthesis rendition of an element within the <a href="#aural-model">audio "box" model</a>. - <p class=note> Note that the functionality provided by this property is - related to the <a + <p class=note> Note that although the functionality provided by this + property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code> element</a> from the SSML markup language <a href="#SSML" - rel=biblioentry>[SSML]<!--{{!SSML}}--></a>. + rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, the application of ‘<a + href="#rest"><code class=property>rest</code></a>’ prosodic + boundaries within the <a href="#aural-model">aural "box" model</a> of CSS + Speech requires special considerations (e.g. interspersed audio cues, + additive adjacent rests). <dl> <dt> <strong><time></strong> @@ -1683,11 +1688,15 @@ clips) to be played before (or after) the selected element within the <a href="#aural-model">audio "box" model</a>. - <p class=note> Note that the functionality provided by this property is - related to the <a + <p class=note> Note that although the functionality provided by this + property may appear related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_audio"><code>audio</code> element</a> from the SSML markup language <a href="#SSML" - rel=biblioentry>[SSML]<!--{{!SSML}}--></a>. + rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there are in fact major + discrepancies. For example, the <a href="#aural-model">aural "box" + model</a> means that audio cues are associated to the selected element's + volume level, and CSS Speech's auditory icons provide limited + functionality compared to SSML's <code>audio</code> element. <dl> <dt> <strong><uri></strong> @@ -1936,11 +1945,14 @@ <p> <strong><generic-voice></strong> = [<age>? <gender> <integer>?] - <p class=note> Note that the functionality provided by this property is - related to the <a + <p class=note> Note that although the functionality provided by this + property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_voice"><code>voice</code> element</a> from the SSML markup language <a href="#SSML" - rel=biblioentry>[SSML]<!--{{!SSML}}--></a>. + rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, CSS Speech does not provide an + equivalent to SSML's sophisticated voice language selection. This + technical limitation may be alleviated in a future revision of the Speech + module. <dl> <dt> <strong><name></strong> @@ -1986,24 +1998,14 @@ <p> Possible values are ‘<code class=property>child</code>’, ‘<code class=property>young</code>’ and ‘<code class=property>old</code>’, indicating the preferred age category - to match during voice selection. The mapping with <a href="#SSML" - rel=biblioentry>[SSML]<!--{{!SSML}}--></a> ages is defined as follows: - ‘<code class=property>child</code>’ = 6 y/o, ‘<code - class=property>young</code>’ = 24 y/o, ‘<code - class=property>old</code>’ = 75 y/o (note that more flexible age - ranges may be used by the processor-dependent voice-matching algorithm). - </p> + to match during voice selection.</p> - <p class=note> Note that the interpretation of the relationship between a - person's age and a recognizable type of voice cannot realistically be - defined in a universal manner, as it effectively depends on numerous - criteria (cultural, linguistic, biological, etc.). The values provided - by this specification therefore represent a simplified model that can be - reasonably applied to a broad variety of speech contexts, albeit at the - cost of a certain degree of approximation. Future versions of this - specification may refine the level of precision of the voice-matching - algorithm, as speech processor implementations become more standardized. - </p> + <p class=note> Note that a recommended mapping with <a href="#SSML" + rel=biblioentry>[SSML]<!--{{!SSML}}--></a> ages is: ‘<code + class=property>child</code>’ = 6 y/o, ‘<code + class=property>young</code>’ = 24 y/o, ‘<code + class=property>old</code>’ = 75 y/o. More flexible age ranges may + be used by the processor-dependent voice-matching algorithm.</p> <dt> <strong><gender></strong> @@ -2013,6 +2015,17 @@ class=property>neutral</code>’, specifying a male, female, or neutral voice, respectively.</p> + <p class=note> Note that the interpretation of the relationship between a + person's age or gender, and a recognizable type of voice, cannot + realistically be defined in a universal manner as it effectively depends + on numerous criteria (cultural, linguistic, biological, etc.). The + functionality provided by this specification therefore represent a + simplified model that can be reasonably applied to a broad variety of + speech contexts, albeit at the cost of a certain degree of + approximation. Future versions of this specification may refine the + level of precision of the voice-matching algorithm, as speech processor + implementations become more standardized.</p> + <dt> <strong><integer></strong> <dd> @@ -2184,11 +2197,14 @@ class=property>voice-rate</code></a>’ property manipulates the rate of generated synthetic speech in terms of words per minute. - <p class=note> Note that the functionality provided by this property is - related to the <a + <p class=note> Note that although the functionality provided by this + property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>rate</code> attribute of the <code>prosody</code> element</a> from the SSML markup - language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>. + language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there + are notable discrepancies. For example, CSS Speech rate keywords and + percentage modifiers are not mutually-exclusive, due to how values are + inherited and combined for selected elements. <dl> <dt> <strong>normal</strong> @@ -2323,11 +2339,15 @@ pitch of the output). For example, the common pitch for a male voice is around 120Hz, whereas it is around 210Hz for a female voice. - <p class=note> Note that the functionality provided by this property is - related to the <a + <p class=note> Note that although the functionality provided by this + property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>pitch</code> attribute of the <code>prosody</code> element</a> from the SSML markup - language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>. + language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there + are notable discrepancies. For example, CSS Speech pitch keywords and + relative changes (frequency, semitone or percentage) are not + mutually-exclusive, due to how values are inherited and combined for + selected elements. <dl> <dt> <strong><frequency></strong> @@ -2483,11 +2503,15 @@ to convey meaning and emphasis in speech. Typically, a low range produces a flat, monotonic voice, whereas a high range produces an animated voice. - <p class=note> Note that the functionality provided by this property is - related to the <a + <p class=note> Note that although the functionality provided by this + property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>range</code> attribute of the <code>prosody</code> element</a> from the SSML markup - language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>. + language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>, there + are notable discrepancies. For example, CSS Speech pitch range keywords + and relative changes (frequency, semitone or percentage) are not + mutually-exclusive, due to how values are inherited and combined for + selected elements. <dl> <dt> <strong><frequency></strong> @@ -2689,7 +2713,7 @@ spoken. <p class=note> Note that the functionality provided by this property is - related to the <a + similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_emphasis"><code>emphasis</code> element</a> from the SSML markup language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>. @@ -2817,7 +2841,7 @@ property). <p class=note> Note that the functionality provided by this property is - related to the <a + similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>duration</code> attribute of the <code>prosody</code> element</a> from the SSML markup language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>. @@ -2916,7 +2940,7 @@ unlikely to be recognized by the synthesizer. The ‘<a href="#content-def"><code class=property>content</code></a>’ property can be used to replace one string by another. The functionality - provided by this property is related to the <a + provided by this property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_sub"><code>alias</code> attribute of the <code>sub</code> element</a> from the SSML markup language <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a>. @@ -3002,11 +3026,11 @@ <p> Additionally, an attribute-based mechanism can be used within the markup to author text-pronunciation associations. At the time of writing, such mechanism isn't formally defined in the W3C HTML standard(s). - However, the <a href="http://idpf.org/epub/30">EPUB 3.0 draft - specification</a> allows (x)HTML5 documents to contain attributes derived - from the <a href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a> - specification, that describe how to pronounce text based on a particular - phonetic alphabet.</p> + However, the <a href="http://idpf.org/epub/30">EPUB 3.0 specification</a> + allows (x)HTML5 documents to contain attributes derived from the <a + href="#SSML" rel=biblioentry>[SSML]<!--{{!SSML}}--></a> specification, + that describe how to pronounce text based on a particular phonetic + alphabet.</p> <!-- p> One avenue to explore is the use CSS to "bind" HTML text with a phoneme (also declared in the HTML document). This would maintain a Index: Overview.src.html =================================================================== RCS file: /sources/public/csswg/css3-speech/Overview.src.html,v retrieving revision 1.101 retrieving revision 1.102 diff -u -d -r1.101 -r1.102 --- Overview.src.html 14 Feb 2012 00:37:34 -0000 1.101 +++ Overview.src.html 14 Feb 2012 01:17:10 -0000 1.102 @@ -192,8 +192,8 @@ described in the Speech Synthesis Markup Language (SSML) Version 1.1 [[!SSML]]. However, the specificities of the CSS model mean that compatibility with SSML in terms of syntax and/or semantics is only partially achievable. The definition of each property in the Speech module - includes informative statements, wherever necessary, to clarify the relationship with similar - features in SSML.</p> + includes informative statements, wherever necessary, to clarify their relationship with + similar functionality from SSML.</p> <h2 id="css-values">CSS values</h2> @@ -854,9 +854,9 @@ cue within the <a href="#aural-model">aural "box" model</a>.</p> <p class="note"> Note that although the functionality provided by this property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code> - element</a> from the SSML markup language [[!SSML]], the application of prosodic boundaries - within the <a href="#aural-model">aural "box" model</a> of CSS Speech requires special - considerations (e.g. <a href="#collapsed-pauses">"collapsed" pauses</a>). </p> + element</a> from the SSML markup language [[!SSML]], the application of 'pause' prosodic + boundaries within the <a href="#aural-model">aural "box" model</a> of CSS Speech requires + special considerations (e.g. <a href="#collapsed-pauses">"collapsed" pauses</a>). </p> <dl> <dt> <strong><time></strong> @@ -1086,9 +1086,11 @@ <p>The 'rest-before' and 'rest-after' properties specify a prosodic boundary (silence with a specific duration) that occurs before (or after) the speech synthesis rendition of an element within the <a href="#aural-model">audio "box" model</a>. </p> - <p class="note"> Note that the functionality provided by this property is related to the <a - href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code> element</a> - from the SSML markup language [[!SSML]]. </p> + <p class="note"> Note that although the functionality provided by this property is similar to + the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code> + element</a> from the SSML markup language [[!SSML]], the application of 'rest' prosodic + boundaries within the <a href="#aural-model">aural "box" model</a> of CSS Speech requires + special considerations (e.g. interspersed audio cues, additive adjacent rests). </p> <dl> <dt> <strong><time></strong> @@ -1282,9 +1284,12 @@ <p>The 'cue-before' and 'cue-after' properties specify auditory icons (i.e. pre-recorded / pre-generated sound clips) to be played before (or after) the selected element within the <a href="#aural-model">audio "box" model</a>.</p> - <p class="note"> Note that the functionality provided by this property is related to the <a - href="http://www.w3.org/TR/speech-synthesis11/#edef_audio"><code>audio</code> element</a> - from the SSML markup language [[!SSML]]. </p> + <p class="note"> Note that although the functionality provided by this property may appear + related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_audio" + ><code>audio</code> element</a> from the SSML markup language [[!SSML]], there are in fact + major discrepancies. For example, the <a href="#aural-model">aural "box" model</a> means that + audio cues are associated to the selected element's volume level, and CSS Speech's auditory + icons provide limited functionality compared to SSML's <code>audio</code> element. </p> <dl> <dt> <strong><uri></strong> @@ -1502,9 +1507,11 @@ this topic). </p> <p> <strong><generic-voice></strong> = [<age>? <gender> <integer>?] </p> - <p class="note"> Note that the functionality provided by this property is related to the <a - href="http://www.w3.org/TR/speech-synthesis11/#edef_voice"><code>voice</code> element</a> - from the SSML markup language [[!SSML]]. </p> + <p class="note"> Note that although the functionality provided by this property is similar to + the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_voice"><code>voice</code> + element</a> from the SSML markup language [[!SSML]], CSS Speech does not provide an + equivalent to SSML's sophisticated voice language selection. This technical limitation may be + alleviated in a future revision of the Speech module. </p> <dl> <dt> <strong><name></strong> @@ -1534,17 +1541,10 @@ </dt> <dd> <p> Possible values are 'child', 'young' and 'old', indicating the preferred age category to - match during voice selection. The mapping with [[!SSML]] ages is defined as follows: - 'child' = 6 y/o, 'young' = 24 y/o, 'old' = 75 y/o (note that more flexible age ranges may - be used by the processor-dependent voice-matching algorithm). </p> - <p class="note"> Note that the interpretation of the relationship between a person's age and - a recognizable type of voice cannot realistically be defined in a universal manner, as it - effectively depends on numerous criteria (cultural, linguistic, biological, etc.). The - values provided by this specification therefore represent a simplified model that can be - reasonably applied to a broad variety of speech contexts, albeit at the cost of a certain - degree of approximation. Future versions of this specification may refine the level of - precision of the voice-matching algorithm, as speech processor implementations become more - standardized. </p> + match during voice selection. </p> + <p class="note"> Note that a recommended mapping with [[!SSML]] ages is: 'child' = 6 y/o, + 'young' = 24 y/o, 'old' = 75 y/o. More flexible age ranges may be used by the + processor-dependent voice-matching algorithm. </p> </dd> <dt> <strong><gender></strong> @@ -1552,6 +1552,14 @@ <dd> <p> One of the keywords 'male', 'female', or 'neutral', specifying a male, female, or neutral voice, respectively. </p> + <p class="note"> Note that the interpretation of the relationship between a person's age or + gender, and a recognizable type of voice, cannot realistically be defined in a universal + manner as it effectively depends on numerous criteria (cultural, linguistic, biological, + etc.). The functionality provided by this specification therefore represent a simplified + model that can be reasonably applied to a broad variety of speech contexts, albeit at the + cost of a certain degree of approximation. Future versions of this specification may + refine the level of precision of the voice-matching algorithm, as speech processor + implementations become more standardized. </p> </dd> <dt> <strong><integer></strong> @@ -1574,6 +1582,7 @@ name, gender, age). </p> </dd> </dl> + <div class="example"> <p> Examples of invalid declarations: </p> <pre> @@ -1696,9 +1705,12 @@ </table> <p>The 'voice-rate' property manipulates the rate of generated synthetic speech in terms of words per minute.</p> - <p class="note"> Note that the functionality provided by this property is related to the <a - href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>rate</code> attribute of - the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p> + <p class="note"> Note that although the functionality provided by this property is similar to + the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>rate</code> + attribute of the <code>prosody</code> element</a> from the SSML markup language [[!SSML]], + there are notable discrepancies. For example, CSS Speech rate keywords and percentage + modifiers are not mutually-exclusive, due to how values are inherited and combined for + selected elements. </p> <dl> <dt> <strong>normal</strong> @@ -1828,9 +1840,12 @@ processors (it approximately corresponds to the average pitch of the output). For example, the common pitch for a male voice is around 120Hz, whereas it is around 210Hz for a female voice.</p> - <p class="note"> Note that the functionality provided by this property is related to the <a - href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>pitch</code> attribute of - the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p> + <p class="note"> Note that although the functionality provided by this property is similar to + the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>pitch</code> + attribute of the <code>prosody</code> element</a> from the SSML markup language [[!SSML]], + there are notable discrepancies. For example, CSS Speech pitch keywords and relative changes + (frequency, semitone or percentage) are not mutually-exclusive, due to how values are + inherited and combined for selected elements. </p> <dl> <dt> <strong><frequency></strong> @@ -1974,9 +1989,13 @@ example when variations in inflection are used to convey meaning and emphasis in speech. Typically, a low range produces a flat, monotonic voice, whereas a high range produces an animated voice. </p> - <p class="note"> Note that the functionality provided by this property is related to the <a - href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>range</code> attribute of - the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p> + + <p class="note"> Note that although the functionality provided by this property is similar to + the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>range</code> + attribute of the <code>prosody</code> element</a> from the SSML markup language [[!SSML]], + there are notable discrepancies. For example, CSS Speech pitch range keywords and relative + changes (frequency, semitone or percentage) are not mutually-exclusive, due to how values are + inherited and combined for selected elements. </p> <dl> <dt> <strong><frequency></strong> @@ -2166,7 +2185,7 @@ <p>The 'voice-stress' property manipulates the strength of emphasis, which is normally applied using a combination of pitch change, timing changes, loudness and other acoustic differences. The precise meaning of the values therefore depend on the language being spoken. </p> - <p class="note"> Note that the functionality provided by this property is related to the <a + <p class="note"> Note that the functionality provided by this property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_emphasis"><code>emphasis</code> element</a> from the SSML markup language [[!SSML]]. </p> <dl> @@ -2283,7 +2302,7 @@ are specified, but these must be ignored. In other words, when a 'time' is specified for the 'voice-duration' of a selected element, it applies to the entire element subtree (children cannot override the property). </p> - <p class="note"> Note that the functionality provided by this property is related to the <a + <p class="note"> Note that the functionality provided by this property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>duration</code> attribute of the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p> <dl> @@ -2359,7 +2378,7 @@ prior to the application of the regular pronunciation rules. This may be used for uncommon abbreviations or acronyms which are unlikely to be recognized by the synthesizer. The 'content' property can be used to replace one string by another. The functionality provided by - this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_sub" + this property is similar to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_sub" ><code>alias</code> attribute of the <code>sub</code> element</a> from the SSML markup language [[!SSML]]. </p> <div class="example"> @@ -2418,7 +2437,7 @@ to describe such a lexicon.</p> <p> Additionally, an attribute-based mechanism can be used within the markup to author text-pronunciation associations. At the time of writing, such mechanism isn't formally defined - in the W3C HTML standard(s). However, the <a href="http://idpf.org/epub/30">EPUB 3.0 draft + in the W3C HTML standard(s). However, the <a href="http://idpf.org/epub/30">EPUB 3.0 specification</a> allows (x)HTML5 documents to contain attributes derived from the [[!SSML]] specification, that describe how to pronounce text based on a particular phonetic alphabet.</p>
Received on Tuesday, 14 February 2012 01:17:15 UTC