- From: Daniel Weck via cvs-syncmail <cvsmail@w3.org>
- Date: Thu, 26 May 2011 13:43:04 +0000
- To: public-css-commits@w3.org
Update of /sources/public/csswg/css3-speech In directory hutz:/tmp/cvs-serv14081 Modified Files: Overview.html Overview.src.html Log Message: overdue updates (voice-volume and cue loudness now matches SSML11, voice-family age is now a number instead of an enumeration of 3 keywords, to match SSML 1.0 and 1.1). Plus a few other editorial changes. I need to publish the new section on list support (coming soon). Index: Overview.html =================================================================== RCS file: /sources/public/csswg/css3-speech/Overview.html,v retrieving revision 1.51 retrieving revision 1.52 diff -u -d -r1.51 -r1.52 --- Overview.html 11 May 2011 17:13:33 -0000 1.51 +++ Overview.html 26 May 2011 13:43:02 -0000 1.52 @@ -6,8 +6,16 @@ <title>CSS Speech Module</title> <meta content="text/html; charset=utf-8" http-equiv=Content-Type> <link href="../default.css" rel=stylesheet type="text/css"> + <!-- link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-ED.css" --> + <link href="http://www.w3.org/StyleSheets/TR/W3C-ED" rel=stylesheet + type="text/css"> <style type="text/css"> + p + { + padding-bottom : 1em; [...1275 lines suppressed...] + <dd>Paolo Baggia. <a + href="http://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/"><cite>Pronunciation + Lexicon Specification (PLS) Version 1.0.</cite></a> 14 October 2008. W3C + Recommendation. URL: <a + href="http://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/">http://www.w3.org/TR/2008/REC-pronunciation-lexicon-20081014/</a> + </dd> + <!----> + + <dt id=SSML-SAYAS>[SSML-SAYAS] + + <dd>Daniel C. Burnett; et al. <a + href="http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526"><cite>SSML 1.0 + say-as attribute values.</cite></a> 26 May 2005. W3C Working Group Note. 
+ URL: <a + href="http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526">http://www.w3.org/TR/2005/NOTE-ssml-sayas-20050526</a> + </dd> + <!----> </dl> <!--end-informative--> </html> Index: Overview.src.html =================================================================== RCS file: /sources/public/csswg/css3-speech/Overview.src.html,v retrieving revision 1.52 retrieving revision 1.53 diff -u -d -r1.52 -r1.53 --- Overview.src.html 11 May 2011 17:13:33 -0000 1.52 +++ Overview.src.html 26 May 2011 13:43:02 -0000 1.53 @@ -4,7 +4,14 @@ <title>CSS Speech Module</title> <meta content="text/html; charset=utf-8" http-equiv="Content-Type" /> <link href="../default.css" rel="stylesheet" type="text/css" /> + <!-- link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-WD.css" --> + <link href="http://www.w3.org/StyleSheets/TR/W3C-ED" rel="stylesheet" type="text/css" /> <style type="text/css"> + p + { + padding-bottom : 1em; + } + p + p { text-indent : 0; @@ -71,8 +78,6 @@ font-size : 120% } --> - <!-- link rel="stylesheet" type="text/css" href="http://www.w3.org/StyleSheets/TR/W3C-WD.css" --> - <link href="http://www.w3.org/StyleSheets/TR/W3C-ED" rel="stylesheet" type="text/css" /> </head> <body> <div class="head"> @@ -109,12 +114,12 @@ </div> <h2 class="no-num no-toc" id="abstract">Abstract</h2> <p>CSS (Cascading Style Sheets) is a language that describes the rendering of markup documents - (e.g. HTML, XML) on various supports, such as screen, paper, speech, etc. The Speech Module + (e.g. HTML, XML) on various supports, such as screen, paper, speech, etc. The Speech module defines aural CSS properties that enable authors to declaratively control the rendering of documents via speech synthesis, and using optional audio cues. The feature set exposed by this specification is designed to match the model described by the Speech Synthesis Markup Language - (SSML) Version 1.1 [[!SPEECH-SYNTHESIS]]. 
Note that this standard was developed in cooperation - with the <a href="http://www.w3.org/Voice/">Voice Browser Activity</a>.</p> + (SSML) Version 1.1 [[!SSML]]. Note that this standard was developed in cooperation with the <a + href="http://www.w3.org/Voice/">Voice Browser Activity</a>.</p> <h2 class="no-num no-toc" id="status">Status of this document</h2> <!--status--> <p> </p> @@ -129,7 +134,7 @@ <p>The CSS WG maintains a separate <a href="http://www.w3.org/Style/CSS/Tracker/products/29" >list of issues</a> for this module.</p> </div> - <p>The CSS Speech Module is a community effort and if you would like to help with implementation + <p>The CSS Speech module is a community effort and if you would like to help with implementation and driving the specification forward along the W3C Recommendation track, please contact the editors.</p> <hr /> @@ -149,25 +154,25 @@ entertainment, helping users to learn reading, or supporting users who have reading difficulties (print disabilities). </p> <p> When it comes to documents, the quality of the speech rendition depends on the structure and - semantics authored within the content itself. The CSS Speech Module provides properties that + semantics authored within the content itself. The CSS Speech module provides properties that enable authors to declaratively control presentational aspects of the aural dimension (e.g. - TTS voice, pitch, rate, volume levels, etc.). These style sheet properties can be used - together with visual properties (mixed media), or as a complete aural alternative to visual + TTS voice, pitch, rate, and volume levels). These style sheet properties can be used together + with visual properties (mixed media), or as a complete aural alternative to a visual presentation. 
The aural "canvas" consists of a two-channel (stereo) space and of a temporal dimension, within which synthetic speech and audio cues coexist.</p> <h3 id="css21-rel">Relationship with CSS2.1</h3> - <p> The CSS Speech Module is a re-work of the informative CSS2.1 Aural appendix, within which + <p> The CSS Speech module is a re-work of the informative CSS2.1 Aural appendix, within which the "aural" media type was described, but also deprecated (in favor of the "speech" media type). Although the [[!CSS21]] specification reserves the "speech" media type, it doesn't - actually define the corresponding properties. This Module describes the CSS properties that + actually define the corresponding properties. This module describes the CSS properties that apply to the "speech" media type, and defines a new "box" model specifically for the aural dimension. </p> - <p> Note that content creators can conditionally include CSS properties dedicated to user-agents - with text to speech synthesis capabilities, by specifying the "speech" media type via the + <p> Content creators can conditionally include CSS properties dedicated to user-agents with text + to speech synthesis capabilities, by specifying the "speech" media type via the <code>media</code> attribute of the <code>link</code> element, or with the <code>@media</code> at-rule, or within an <code>@import</code> statement. When doing so, the styles authored within the scope of such conditional statements are ignored by user-agents - that do not support speech synthesis. </p> + that do not support this module. 
</p> <h3 id="example">CSS Speech Example</h3> <p>The following example shows how authors can tell the speech synthesizer to speak HTML headings with a voice called "paul", using "moderate" emphasis (which is more than normal) and @@ -191,6 +196,7 @@ voice-family: female; voice-balance: left; voice-pitch: high; + voice-volume: -6dB; } p.peter { @@ -239,14 +245,14 @@ <td> <em>Value:</em> </td> - <td>silent | [[ x-soft | soft | medium | loud | x-loud ] && linear ] | [ - <non-negative number> && linear ] | <percentage> | inherit</td> + <td>normal | silent | x-soft | soft | medium | loud | x-loud | <decibel> | + inherit</td> </tr> <tr> <td> <em>Initial:</em> </td> - <td>medium</td> + <td>normal</td> </tr> <tr> <td> @@ -281,86 +287,65 @@ </tbody> </table> <p>The 'voice-volume' property manipulates the amplitude of the audio waveform generated by the - speech synthesiser, and is also used to calculate the relative volume level of <a + speech synthesiser, and is also used when calculating the relative volume level of <a href="#cue-props">audio cues</a> within the <a href="#aural-model">audio "box" model</a>. </p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>volume</code> attribute - of the <code>prosody</code> element</a> from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + of the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p> <dl> <dt> - <strong>silent</strong> + <strong>normal</strong> </dt> <dd> - <p> Specifies that the volume level results in no sound output at all. </p> - <p class="note"> Note that there is a difference between an element whose 'voice-volume' - property has a value of 'silent', and an element whose 'speak' property has the value - 'none'. 
The former takes up the same time as if it had been spoken, including any pause - before and after the element, but no sound is generated (descendants can override the - 'voice-volume' value and may therefore generate audio output). Conversely, the latter - requires no time and is not rendered in the aural dimension (descendants can override the - 'speak' value and may therefore generate audio output). </p> + <p> Corresponds to +0.0dB, which means that there is no modification to the default volume + level. This value overrides the inherited value.</p> </dd> <dt> - <strong>linear</strong> + <strong>silent</strong> </dt> <dd> - <p> When present, this keyword indicates that the associated value represents a point on a - linear volume amplitude scale, from '0' (silent) to '100' (full volume). Otherwise, the - scale corresponds to monotonically non-decreasing volume levels from '0' (minimum audible) - to '100' (maximum tolerable), with arbitrary intermediary values that depend on the user - environment (see the definition of <non-negative number> below).</p> + <p> Specifies that no sound is generated (the text is read "silently"). Corresponds to + negative infinity in dB units.</p> + <p class="note"> Note that there is a difference between an element whose 'voice-volume' + property has a value of 'silent', and an element whose 'speak' property has the value + 'none'. With the former, the selected takes up the same time as if it had been spoken, + including any pause before and after the element, but no sound is generated (descendants + can override the 'voice-volume' value and may therefore generate audio output). With the + latter, the selected element is not rendered in the aural dimension and no time is + allocated for playback (descendants can override the 'speak' value and may therefore + generate audio output). 
</p> </dd> <dt><strong>x-soft</strong>, <strong>soft</strong>, <strong>medium</strong>, - <strong>loud</strong>, and <strong>x-loud</strong></dt> - <dd> - <p> This sequence of values corresponds to monotonically non-decreasing volume levels. The - value 'x-soft' maps to 0, 'soft' maps to 25, 'medium' maps to 50, 'loud' maps to 75 and - 'x-loud' maps to 100. The interpretation of the corresponding numerical values depends on - whether the 'linear' keyword is used (see the definition of <non-negative number> - below).</p> - </dd> - <dt> - <strong><non-negative number></strong> - </dt> + <strong>loud</strong>, <strong>x-loud</strong></dt> <dd> - <p>An integer or floating point <a href="#non-negative-number-def">positive number</a> in - the range '0' to '100'. The interpretation of the '0' to '100' scale depends on whether - the 'linear' keyword is used.</p> - <p> When the 'linear' keyword not used, '0' represents the <em>minimum audible</em> level, - '100' corresponds to the <em>maximum tolerable</em> level, and '50' corresponds to the - user's <em>preferred</em> volume level. All 3 values are configured by the user, or at - least predefined by the user-agent. The numerical values on this scale are mapped to - concrete volume levels that depend on the user context, so this allows authors to write a - single style sheet that works in a variety of listening environments. </p> - <p> When the 'linear' keyword is specified, '0' maps to 'silent' and '100' maps to the - maximum possible audio volume output (which depends on the user agent implementation, - device capabilities, etc.). The values in between '0' and '100' are placed on a linear - amplitude scale that do not necessarily match the user's expectations, because it is - independent from the user-configured volume levels. For example, '50' may not correspond - to the user's <em>preferred</em> volume level, and it may actually result in louder or - softer audio output than desired. 
This feature is provided to maintain compatibility with - SSML (where 'x-soft' always means "silent", etc.). </p> + <p> This sequence of keywords corresponds to monotonically non-decreasing volume levels, + mapped to implementation-dependent values (i.e. inferred by the user-agent) that meet + user's requirements in terms of perceived sound loudness . The keyword 'x-soft' maps to + the user's <em>minimum audible</em> volume level, 'x-loud' maps to the user's <em>maximum + tolerable</em> volume level, 'medium' maps to the user's <em>preferred</em> volume + level, 'soft' and 'loud' map to intermediary values. </p> </dd> <dt> - <strong><percentage></strong> + <strong><decibel></strong> </dt> <dd> - <p> Only positive <a href="#percentage-def">percentage</a> values are allowed. Computed - values are calculated relative to the inherited value, and are then clipped to the range - '0' to '100'. The 'linear' keyword is preserved from the inherited value.</p> - <p class="note"> Note that a leading "+" sign does not denote an increment. For example, - +50% is equivalent to 50%, so the computed value equals the inherited value times 0.5 - (divided by 2), then clipped to [0,100]. </p> + <p>An integer or floating point <a href="#number-def">number</a> immediately followed by + "dB" (decibel unit). This represents a change (positive or negative) relative to the + default or inherited volume level. This is expressed as the ratio of the squares of the + new signal amplitude (a1) and the current amplitude (a0), as per the following logarithmic + equation: volume(dB) = 20 log10 (a1 / a0) </p> + <p class="note"> Note that -6.0dB is approximately half the amplitude of the audio signal, + and +6.0dB is approximately twice the amplitude.</p> </dd> </dl> - <p class="note"> Unless 'linear' is used, the actual volume levels resulting from the use of the - numerical or keyword values depend on various factors, such as the listening environment and - personal user preferences. 
The effective volume variation between '0' and '100' determines the - dynamic range of the speech output, which is typically compressed in a noisy environment (the - volume corresponding to '0' is nearer the value of '100'), whereas a noise-free context allows - for the full range of volume levels (the gap between '0' and '100' is wider). Conversely, - there may be situations whereby both '0' and '100' are set to low volume levels (for example - when listening discretely at night). </p> + <p class="note">Note that the actual perceived volume levels depend on various factors, such as + the listening environment and personal user preferences. The effective volume variation + between 'x-soft' and 'x-loud' represents the dynamic range (in terms of loudness) of the + speech output. Typically, this range would be compressed in a noisy context, i.e. the + perceived loudness corresponding to 'x-soft' would effectively be closer to 'x-loud' than it + would be in a quiet environment. There may also be situations where both 'x-soft' and 'x-loud' + would map to low volume levels, such as in listening environments requiring discretion (e.g. + library, night-reading). </p> <h3 id="mixing-props-voice-balance">The 'voice-balance' property</h3> <table class="propdef" summary="name: syntax"> <tbody> @@ -417,7 +402,7 @@ <p>The 'voice-balance' property manipulates the distribution of audio output between left and right channels in stereo-capable sound devices.</p> <p class="note"> Note that the functionality provided by this property has no match in the SSML - markup language [[!SPEECH-SYNTHESIS]]. </p> + markup language [[!SSML]]. </p> <dl> <dt> <strong><number></strong> @@ -524,7 +509,7 @@ </table> <p>The 'speak' property determines whether or not to render text aurally.</p> <p class="note"> Note that the functionality provided by this property has no match in the SSML - markup language [[!SPEECH-SYNTHESIS]]. </p> + markup language [[!SSML]]. 
</p> <dl> <dt> <strong>auto</strong> @@ -619,16 +604,17 @@ basic predefined list of possible values.</p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_say-as"><code>say-as</code> element</a> - from the SSML markup language [[!SPEECH-SYNTHESIS]]. Also note that possible values are - described in a <a href="http://www.w3.org/TR/ssml-sayas">W3C note</a> separate from the SSML - specification, whereas the CSS Speech Module explicitly defines a list of possible values. </p> + from the SSML markup language [[!SSML]]. Also note that possible values are described in a W3C + Note ([[SSML-SAYAS]]) separate from the SSML specification, whereas the CSS Speech module + explicitly defines a list of possible values. </p> <dl> <dt> <strong>normal</strong> </dt> <dd> <p>Uses language-dependent pronunciation rules for rendering an element and its children. - Punctuation is not to be spoken, but instead rendered naturally as various pauses.</p> + For example, punctuation is not spoken as-is, but instead rendered naturally as + appropriate pauses.</p> </dd> <dt> <strong>spell-out</strong> @@ -779,7 +765,7 @@ cue within the <a href="#aural-model">audio "box" model</a>.</p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code> element</a> - from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + from the SSML markup language [[!SSML]]. </p> <dl> <dt> <strong><time></strong> @@ -787,7 +773,7 @@ <dd> <p>Expresses the pause in absolute time units (seconds and milliseconds, e.g. "+3s", "250ms") as per the syntax of <a href="#time-def">time</a> values defined in [[!CSS3VAL]]. - Only positive values are allowed.</p> + Only non-negative values are allowed.</p> </dd> <dt> <strong>none</strong> @@ -1015,7 +1001,7 @@ within the <a href="#aural-model">audio "box" model</a>. 
</p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_break"><code>break</code> element</a> - from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + from the SSML markup language [[!SSML]]. </p> <dl> <dt> <strong><time></strong> @@ -1023,7 +1009,7 @@ <dd> <p>Expresses the rest in absolute time units (seconds and milliseconds, e.g. "+3s", "250ms") as per the syntax of <a href="#time-def">time</a> values defined in [[!CSS3VAL]]. Only - positive values are allowed.</p> + non-negative values are allowed.</p> </dd> <dt> <strong>none</strong> @@ -1115,7 +1101,7 @@ <td> <em>Value:</em> </td> - <td><uri> [<percentage> | silent]? | none | inherit</td> + <td><uri> <decibel>? | none | inherit</td> </tr> <tr> <td> @@ -1168,7 +1154,7 @@ <td> <em>Value:</em> </td> - <td><uri> [<percentage> | silent]? | none | inherit</td> + <td><uri> <decibel>? | none | inherit</td> </tr> <tr> <td> @@ -1211,19 +1197,19 @@ <p>The 'cue-before' and 'cue-after' properties specify auditory icons (i.e. prerecorded audio clips) to be played before (or after) the selected element within the <a href="#aural-model" >audio "box" model</a>. When a user agent is not able to render the specified auditory icon, - it is recommended to produce an alternative cue (e.g., popping up a warning, emitting a - warning sound, etc.)</p> + it is recommended to produce an alternative cue. (e.g. popping up a warning, emitting a + warning sound)</p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_audio"><code>audio</code> element</a> - from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + from the SSML markup language [[!SSML]]. </p> <dl> <dt> <strong><uri></strong> </dt> <dd> - <p>The URI must designate an auditory icon resource. 
If the URI resolves to something other - than an audio file, such as an image, the resource is ignored and the property treated as - if it had the value 'none'.</p> + <p>The URI designates an auditory icon resource. If the URI resolves to something other than + an audio file, such as an image, the resource is ignored and the property treated as if it + had the value 'none'.</p> </dd> <dt> <strong>none</strong> @@ -1232,48 +1218,41 @@ <p>Specifies that no auditory icon is used.</p> </dd> <dt> - <strong><percentage></strong> - </dt> - <dd> - <p> The loudness of prerecorded audio cues can be adjusted relatively to the volume level of - synthetic speech. The default value is '100%'. Only positive <a href="#percentage-def" - >percentage</a> values are allowed. If the inherited value of the 'voice-volume' - property is 'silent', this percentage value has no effect and the volume level for the - audio cue is resolved to 'silent'. Otherwise, computed values are calculated relative to - the inherited value of the 'voice-volume' property, and are then clipped to the range '0' - to '100'. Refer to the 'voice-volume' property for the meaning of the numerical scale from - '0' to '100'.</p> - <p class="note"> Note that a leading "+" sign does not denote an increment. For example, - +50% is equivalent to 50%, so the computed value equals the inherited value times 0.5 - (divided by 2), then clipped to [0,100]. </p> - </dd> - <dt> - <strong>silent</strong> + <strong><decibel></strong> </dt> <dd> - <p> Specifies that the volume level results in no sound output at all. </p> + <p> The loudness of prerecorded audio cues can be adjusted relative to the volume level of + synthetic speech, inherited value from the 'voice-volume' property. This value is an + integer or floating point <a href="#number-def">number</a> immediately followed by "dB" + (decibel unit). The default value is '+0.0dB' (no change). 
If the inherited value of the + 'voice-volume' property is 'silent', the provided value has no effect and the volume level + for the audio cue is resolved to 'silent'. Decibels are an expression of the ratio of the + squares of the new signal amplitude (a1) and the current amplitude (a0), as per the + following logarithmic equation: volume(dB) = 20 log10 (a1 / a0) </p> + <p class="note"> Note that -6.0dB is approximately half the amplitude of the audio signal, + and +6.0dB is approximately twice the amplitude.</p> <p class="note"> Note that there is a difference between an audio cue whose volume is set to 'silent' and one whose value is 'none'. In the former case, the audio cue takes up the same time as if it had been played, but no sound is generated. In the latter case, the - there is no manifestation of the audio cue at all (i.e. no time is allocated in the aural - dimension for the cue). </p> + there is no manifestation of the audio cue at all (i.e. no time is allocated for the cue + in the aural dimension). </p> </dd> </dl> <div class="example"> <pre> a { - cue-before: url(/audio/bell.aiff); + cue-before: url(/audio/bell.aiff) -3dB; cue-after: url(dong.wav); } h1 { - cue-before: url(../clips-1/pop.au) 80%; - cue-after: url(../clips-2/pop.au) 50%; + cue-before: url(../clips-1/pop.au) +6dB; + cue-after: url(../clips-2/pop.au) 6dB; } -div.caution { cue-before: url(./audio/caution.wav) 130%; } +div.caution { cue-before: url(./audio/caution.wav) +8dB; } </pre> </div> <h3 id="cue-props-cue">The 'cue' shorthand property</h3> @@ -1403,26 +1382,28 @@ </tbody> </table> <p>The 'voice-family' property specifies a comma-separated, prioritized list of values that - designate speech synthesis voices. 
(analog to 'font-family' in visual style sheets).</p> + designate speech synthesis voices (analogous to 'font-family' in visual style sheets).</p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_voice"><code>voice</code> element</a> - from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + from the SSML markup language [[!SSML]]. </p> <dl> <dt> <strong><name></strong> </dt> <dd> - <p>Values are specific instances (e.g., Mike, comedian, mary, carlos, "valley-girl"). Values - can be quoted, and indeed must be quoted if any of the words that make up the name does - not conform to the syntax rules for <a href="#identifier-def">identifiers</a>. Any - whitespace characters before and after the voice name are ignored. For compatibility with - SSML, whitespace characters are not permitted within voice names. </p> + <p>Values are specific voice instances (e.g., Mike, comedian, mary, carlos, "valley-girl"). + For compatibility with SSML, voice names do not contain whitespace characters, and any + whitespace characters before or after the voice name are ignored. Values can be quoted, + and indeed must be quoted if any of the words that make up the name does not conform to + the syntax rules for <a href="#identifier-def">identifiers</a>. </p> </dd> <dt> <strong><age></strong> </dt> <dd> - <p>Possible values are 'child', 'young' and 'old'.</p> + <p> Possible values are <a href="#non-negative-number-def">non-negative numbers</a> + restricted to positive integers (i.e. excluding zero), indicating the preferred age in + years (since birth) of the voice.</p> </dd> <dt> <strong><gender></strong> @@ -1435,7 +1416,7 @@ </dt> <dd> <p>Indicates a preferred variant (e.g. "the second male child voice" or "the third voice - named 'Mike'"). Possible values are <a href="#non-negative-number-def">positive + named 'Mike'"). 
Possible values are <a href="#non-negative-number-def">non-negative numbers</a> restricted to integers, and excluding zero (i.e. starting from 1). The value "1" refers to the first of all matching voices. </p> </dd> @@ -1449,7 +1430,7 @@ applied to the root content element (defaults to the user or user-agent stylesheet). </p> <p class="note"> Note that descendants of the selected element automatically inherit the 'preserve' value, unless it is explicitly overridden by other 'voice-family' values (e.g. - name, gender, age, etc.). </p> + name, gender, age). </p> </dd> </dl> <h4 class="no-toc" id="voice-props-lang-handling">Voice selection, content language</h4> @@ -1460,38 +1441,46 @@ 'voice-family' property value gets inherited by descendant elements. At any point within the content structure, the language takes precedence (i.e. has a higher priority) over the specified CSS voice characteristics. The following list outlines the selection algorithm (note - that the definition of "language" is loose here, in order to cater for dialectic variants): - .</p> + that the definition of "language" is loose here, in order to cater for dialectic + variants):</p> <ol> <li> If only a single voice is available for the language of the selected content, then this voice must be used, regardless of the specified CSS voice characteristics. </li> <li> If several voices are available for the language of the selected content, then the chosen voice is the one that most closely matches the specified gender, age, and preferred voice index. The actual definition of "best match" is processor-dependent.</li> - <li> If no voice is available for the language of the selected content, user-agent should - raise a warning to let the user know about the lack of appropriate TTS voice. </li> + <li> If no voice is available for the language of the selected content, it is recommended that + user-agents let the user know about the lack of appropriate TTS voice. 
</li> <li>The speech synthesizer voice must be re-evaluated (i.e. the selection process must take - place once again) whenever either of the CSS voice characteristics change within the content + place once again) whenever any of the CSS voice characteristics change within the content flow. The voice must also be re-calculated whenever the content language changes, unless the 'preserve' keyword is used (this may be useful in cases where embedded foreign language text can be spoken using a voice not designed for this language, as demonstrated by the example - below). </li> + below). <p class="note">Note that dynamically computing a voice may lead to unexpected lag, + so user-agents should try to resolve concrete voice instances in the document tree before + the playback starts. </p> + </li> </ol> <p>Here are a few examples:</p> <div class="example"> <pre> -h1 { voice-family: announcer, old male; } -p.romeo { voice-family: romeo 3, young male; } -p.juliet { voice-family: juliet, female; } -p.mercutio { voice-family: male 2; } -p.tybalt { voice-family: male 3; } -p.nurse { voice-family: child female; } +h1 { voice-family: announcer, 65 male; } +p.romeo { voice-family: romeo, 18 male; } +p.juliet { voice-family: juliet, 19 female; } +p.mercutio { voice-family: 26 male; } +p.tybalt { voice-family: 30 male; } +p.nurse { voice-family: amelie; } ... <p class="romeo" xml:lang="en-US"> - The french text below will be spoken with an english voice: + The French text below will be spoken with an English voice: <span style="voice-family: preserve;" xml:lang="fr-FR">Bonjour monsieur !</span> + + The English text below will be spoken with a voice different + than that corresponding to the class "romeo" + (which is inherited from the "p" parent element): + <span style="voice-family: female;">Hello sir!</span> </p> </pre> </div> @@ -1549,24 +1538,24 @@ </tbody> </table> <p>The 'voice-rate' property manipulates the rate of generated synthetic speech in terms of - words per minute. 
The default rate for a given 'voice-family' is processor-specific, and - depends on the language, dialect and on the "personality" of the voice.</p> + words per minute.</p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>rate</code> attribute of - the <code>prosody</code> element</a> from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p> <dl> <dt> <strong>normal</strong> </dt> <dd> <p>Represents the default rate produced by the speech synthesizer for the currently active - voice.</p> + voice. This is processor-specific and depends on the language, dialect and on the + "personality" of the voice. </p> </dd> <dt> <strong><percentage></strong> </dt> <dd> - <p>Only positive <a href="#percentage-def">percentage</a> values are allowed. Computed + <p>Only non-negative <a href="#percentage-def">percentage</a> values are allowed. Computed values are calculated relative to the default speaking rate for the voice (the "normal" computed value).</p> <p class="note"> Note that a leading "+" sign does not denote an increment, for example +50% @@ -1595,8 +1584,8 @@ <td> <em>Value:</em> </td> - <td><frequency> | <percentage> | <relative-change> | x-low | low | - medium | high | x-high | inherit</td> + <td><frequency> | <percentage> | <relative-value> && relative | + x-low | low | medium | high | x-high | inherit</td> </tr> <tr> <td> @@ -1641,7 +1630,7 @@ is around 120Hz, whereas it is around 210Hz for a female voice.</p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>pitch</code> attribute of - the <code>prosody</code> element</a> from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. 
</p> <dl> <dt> <strong><frequency></strong> @@ -1656,25 +1645,30 @@ <strong><percentage></strong> </dt> <dd> - <p> Only positive <a href="#percentage-def">percentage</a> values are allowed. Computed + <p> Only non-negative <a href="#percentage-def">percentage</a> values are allowed. Computed values are calculated relative to the inherited value. </p> <p class="note"> Note that a leading "+" sign does not denote an increment. For example, +50% is equivalent to 50%, so the computed value equals the inherited value times 0.5 - (divided by 2), which is half the inherited average pitch of the voice. </p> + (i.e. divided by 2), which is half the inherited average pitch of the voice. </p> </dd> <dt> - <strong><relative-change></strong> + <strong><relative-value></strong> </dt> <dd> <p> Specifies a relative change (decrement or increment) to the inherited value. The syntax - of allowed values is a <<a href="#number-def">number</a>> (the "+" sign is optional - for positive numbers), followed by either of "Hz" (for Hertz) or "kHz" (for kiloHertz) or - "st" (for semitones), and followed by a space character and the "relative" keyword.</p> - <p class="note"> Note that here, unlike the syntax of <a href="#frequency-def">frequency</a> - values defined in [[!CSS3VAL]], the number can be positive or negative. </p> - <p class="note"> Note that the "relative" keyword is mandatory. This is in order to - disambiguate from <frequency> values which may also carry the optional "+" sign on - positive values. </p> + of allowed values is a <<a href="#number-def">number</a>>, followed immediately by + either of "Hz" (for Hertz) or "kHz" (for kiloHertz) or "st" (for semitones).</p> + <p class="note"> Note that unlike with the syntax of <a href="#frequency-def">frequency</a> + values defined in [[!CSS3VAL]], here the provided number can be positive or negative. The + 'relative' keyword must be used to disambiguate absolute frequency values (e.g. 
"+10Hz" + versus "+10Hz relative") </p> + </dd> + <dt> + <strong>relative</strong> + </dt> + <dd> + <p> This keyword specifies that the provided value is expressed relatively to another base + value. This is in order to disambiguate from absolute <frequency> values. </p> </dd> <dt><strong>x-low</strong>, <strong>low</strong>, <strong>medium</strong>, <strong>high</strong>, <strong>x-high</strong></dt> @@ -1689,7 +1683,7 @@ h1 { voice-pitch: +250Hz; } /* identical to the line above */ h2 { voice-pitch: +30Hz relative; } h2 { voice-pitch: 30Hz relative; } /* identical to the line above */ -h3 { voice-pitch: -2st relative; } +h3 { voice-pitch: relative -2st; } /* the swapped keyword placement is a legal syntax */ h4 { voice-pitch: -2st; } /* Illegal syntax ! ("relative" keyword is missing) */ </pre> </div> @@ -1706,8 +1700,8 @@ <td> <em>Value:</em> </td> - <td><frequency> | <percentage> | <relative-change> | x-low | low | - medium | high | x-high | inherit</td> + <td><frequency> | <percentage> | <relative-value> && relative | + x-low | low | medium | high | x-high | inherit</td> </tr> <tr> <td> @@ -1753,7 +1747,7 @@ variations in inflection are used to convey meaning and emphasis in speech. </p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>range</code> attribute of - the <code>prosody</code> element</a> from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p> <dl> <dt> <strong><frequency></strong> @@ -1770,23 +1764,30 @@ <strong><percentage></strong> </dt> <dd> - <p> Only positive <a href="#percentage-def">percentage</a> values are allowed. Computed + <p> Only non-negative <a href="#percentage-def">percentage</a> values are allowed. 
Computed values are calculated relative to the inherited value.</p> <p class="note"> Note that a leading "+" sign does not denote an increment. For example, +50% is equivalent to 50%, so the computed value equals the inherited value times 0.5 - (divided by 2), which is half the inherited average pitch range of the voice. </p> + (i.e. divided by 2), which is half the inherited average pitch range of the voice. </p> </dd> <dt> - <strong><relative-change></strong> + <strong><relative-value></strong> </dt> <dd> - <p> Specifies a relative change (decrement or increment) to the inherited value. The syntax - of allowed values is a <<a href="#number-def">number</a>> (note that the "+" sign is - optional for positive numbers), followed by either of "Hz" (for Hertz) or "st" (for - semitones), and followed by a space character and the "relative" keyword.</p> - <p class="note"> Note that the "relative" keyword is mandatory. This is in order to - disambiguate from <frequency> values which may also carry the optional "+" sign on - positive values. </p> + <p> Specifies a change (decrement or increment) relative to the inherited value. The syntax + of allowed values is a <<a href="#number-def">number</a>>, immediately followed by + either of "Hz" (for Hertz) or "st" (for semitones).</p> + <p class="note"> Note that unlike with the syntax of <a href="#frequency-def">frequency</a> + values defined in [[!CSS3VAL]], here the provided number can be positive or negative. The + 'relative' keyword must be used to disambiguate absolute frequency values (e.g. "+10Hz" + versus "+10Hz relative") </p> + </dd> + <dt> + <strong>relative</strong> + </dt> + <dd> + <p> This keyword specifies that the provided value is expressed relatively to another base + value. This is in order to disambiguate from absolute <frequency> values. 
</p> </dd> <dt><strong>x-low</strong>, <strong>low</strong>, <strong>medium</strong>, <strong>high</strong> and <strong>x-high</strong></dt> @@ -1856,7 +1857,7 @@ The precise meaning of the values therefore depend on the language being spoken. </p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_emphasis"><code>emphasis</code> - element</a> from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + element</a> from the SSML markup language [[!SSML]]. </p> <dl> <dt> <strong>normal</strong> @@ -1968,7 +1969,7 @@ determine the speaking rate of the voice. </p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>duration</code> attribute - of the <code>prosody</code> element</a> from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + of the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p> <dl> <dt> <strong>auto</strong> @@ -1983,7 +1984,7 @@ <dd> <p> Specifies a value in absolute time units (seconds and milliseconds, e.g. "+3s", "250ms") as per the syntax of <a href="#time-def">time</a> values defined in [[!CSS3VAL]]. Only - positive values are allowed. </p> + non-negative values are allowed. </p> </dd> </dl> <h2 id="lists">Support for list item styles</h2> @@ -2004,17 +2005,17 @@ authored within aural CSS stylesheets would have needed to be updated each time text changed within the markup document). 
The "phonemes" functionality is therefore considered out-of-scope in CSS (the presentation layer) and should be addressed in the markup / content layer.</p> - <p> The W3C <a href="http://www.w3.org/TR/pronunciation-lexicon">PLS (Pronunciation Lexicon - Specification)</a> recommendation is one potential format to use with the <a + <p> The W3C PLS (Pronunciation Lexicon Specification) recommendation ([[PRONUNCIATION-LEXICON]]) + is one potential format to use with the <a href="http://microformats.org/wiki/rel-pronunciation"></a>"pronunciation" <code>rel</code> value, which allows importing pronunciation lexicons in HTML documents using the <code>link</code> element (similarly to how CSS stylesheets can be included). </p> <p> Additionally, an attribute-based mechanism can be used within the markup to author text-pronunciation associations. At the time of writing, such mechanism isn't formally defined in the W3C HTML standard(s). However, the <a href="http://idpf.org/epub/30">EPUB 3.0 draft - specification</a> allows (x)HTML5 documents to contain SSML-derived attributes (<a - href="http://www.w3.org/TR/speech-synthesis11">Speech Synthesis Markup Language</a>) that - describe how to pronounce text based on a particular phonetic alphabet.</p> + specification</a> allows (x)HTML5 documents to contain attributes derived from the [[!SSML]] + specification, that describe how to pronounce text based on a particular phonetic + alphabet.</p> <!-- p> One avenue to explore is the use CSS to "bind" HTML text with a phoneme (also declared in the HTML document). This would maintain a @@ -2032,14 +2033,14 @@ Incubator Groups. </p --> <h2 id="content">Inserted and replaced content</h2> - <p class="note">Note that this entire section is non-normative.</p> + <!-- p class="note">Note that this entire section is non-normative.</p --> <p>Sometimes, authors will want to specify a mapping from the source text into another string prior to the application of the regular pronunciation rules. 
This may be used for uncommon abbreviations or acronyms which are unlikely to be recognized by the synthesizer. The 'content' property can be used to replace one string by another. </p> <p class="note"> Note that the functionality provided by this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_sub"><code>alias</code> attribute of the - <code>sub</code> element</a> from the SSML markup language [[!SPEECH-SYNTHESIS]]. </p> + <code>sub</code> element</a> from the SSML markup language [[!SSML]]. </p> <p> In the following example, the abbreviation is rendered using the content of the title attribute instead of the element's content:</p> <div class="example"> @@ -2070,7 +2071,7 @@ <p>Furthermore, authors (or users via a user stylesheet) may add some information to ease the understanding of structures during non-visual interaction with the document. They can do so by using the '::before' and '::after' pseudo-elements that will be inserted between the element's - contents and the 'rest' Note that different stylesheets can be used to define the level of + contents and the 'rest'. Note that different stylesheets can be used to define the level of verbosity for additional information spoken by screen readers. .</p> <p>The following example inserts the string "Start list: " before a list and the string "List item: " before the content of each list item. 
Likewise, the string "List end: " gets inserted @@ -2082,7 +2083,7 @@ li::before { content: "List item: "; } </pre> </div> - <p>Detailed information can be found in the CSS3 Generated and Replaced Content Module + <p>Detailed information can be found in the CSS3 Generated and Replaced Content module [[CSS3GENCON]].</p> <hr title="Separator from footer" /> <h2 class="no-num" id="property-index">Appendix A — Property index</h2> @@ -2191,7 +2192,7 @@ <p>Informative notes begin with the word "Note" and are set apart from the normative text with <code>class="note"</code>, like this:</p> <p class="note">Note, this is an informative note.</p> - <p>Conformance to the CSS3 Speech Module is defined for three classes:</p> + <p>Conformance to the CSS3 Speech module is defined for three classes:</p> <dl> <dt> <dfn title="style sheet!!as conformance class">style sheet</dfn> @@ -2207,16 +2208,16 @@ </dt> <dd>A UA that writes a style sheet.</dd> </dl> - <p>A style sheet is conformant to the CSS3 Speech Module if all of its declarations that use + <p>A style sheet is conformant to the CSS3 Speech module if all of its declarations that use properties defined in this module have values that are valid according to the generic CSS grammar and the individual grammars of each property as given in this module. </p> - <p>A renderer is conformant to the CSS3 Speech Module if, in addition to interpreting the style + <p>A renderer is conformant to the CSS3 Speech module if, in addition to interpreting the style sheet as defined by the appropriate specifications, it supports all the properties defined by - CSS3 Speech Module by parsing them correctly and rendering the document accordingly. However + CSS3 Speech module by parsing them correctly and rendering the document accordingly. However the inability of a UA to correctly render a document due to limitations of the device does not make the UA non-conformant. (For example, a UA is not required to render color on a monochrome monitor.) 
</p> - <p>An authoring tool is conformant to CSS3 Speech Module if it writes syntactically correct + <p>An authoring tool is conformant to CSS3 Speech module if it writes syntactically correct style sheets, according to the generic CSS grammar and the individual grammars of each property in this module. </p> <!-- h3 class="no-num" id="levels">Levels</h3> @@ -2256,7 +2257,7 @@ <h4 class="no-num" id="level-3">CSS Level 3</h4> <ul> - <li>All features described in the CSS3 Speech Module + <li>All features described in the CSS3 Speech module </ul --> <h3 class="no-num" id="exit">CR exit criteria</h3> <p>As described in the W3C process document, a <a @@ -2311,8 +2312,7 @@ <li>Adjusted the [initial] value for shorthand properties, to be consistent with other CSS specifications (i.e. "see individual properties"), and removed the erroneous "inherit" value.</li> - <li>Fixed numerical volume scale (and the associated "named" values) by adding the 'linear' - keyword. Also added the 'silent' value to audio cues.</li> + <li>Fixed 'voice-volume' by conforming to SSML 1.1 (dB scale, etc.).</li> <li>Fixed the [initial] values for 'pause' and 'rest', which should be zero (were "implementation-dependent").</li> <li>Corrected the [initial] values for 'voice-pitch-range' and 'voice-pitch' to "medium".</li> @@ -2325,8 +2325,11 @@ <li>Added the missing "Computed value" line to each property definition.</li> <li>Cleaned-up the list of module dependencies, and removed redundant "module dependencies" section.</li> + <li> Voice age now expressed using integers rather than a keyword enumeration ('child', + 'young' and 'old'). This aligns with SSML. 
</li>
 <li>Improved the pause collapsing prose, removed redundant paragraphs.</li>
 <li>Added the missing 'normal' value for 'voice-stress'.</li>
+ <li>Separated the 'relative' keyword for 'voice-pitch' and 'voice-range'.</li>
 <li>Improved document structure by adding sub-sections.</li>
 <li>Fixed typos and made other minor edits.</li>
 </ul>
@@ -2339,7 +2342,7 @@
 <li>The volume level of audio cues can only be set relatively to the inherited 'voice-volume' property (to avoid cues being spoken when the main element is silent, which contradicts the "aural box model").</li>
 <li>Added "HTML" to "CSS defines aural properties that give control over rendering XML to speech" in the abstract.</li>
- <li>Removed unused normative links to CSS3 Modules (actually moved to informative references), now the only dependency is CSS3 Values and Units.</li>
+ <li>Removed unused normative links to CSS3 modules (actually moved to informative references), now the only dependency is CSS3 Values and Units.</li>
 <li>Removed issue about the 'sub' SSML element given that the CSS "content" replacement functionality addresses the same requirement.</li>
 <li>Added support for semitones in pitch alterations.</li>
 <li>Added reference to "time" values syntax (s, ms) for 'voice-duration'.</li>
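
The revised 'voice-pitch' value grammar in the diff above (<frequency> | <percentage> | <relative-value> && relative | x-low | low | medium | high | x-high | inherit) can be sketched as a small Python classifier. This is only an illustration under stated assumptions, not part of the commit and not how any user agent parses the property: the function name classify_voice_pitch is hypothetical, absolute <frequency> values are assumed to be non-negative, and units and keywords are assumed to match case-insensitively.

```python
import re
from typing import Optional

# Keywords listed in the property's value grammar.
KEYWORDS = {"x-low", "low", "medium", "high", "x-high", "inherit"}

# Unsigned CSS-style number: "250", "2.5" or ".5".
UNSIGNED = r"(?:\d*\.?\d+)"

# Assumption: an absolute <frequency> is non-negative (optional leading "+").
FREQUENCY = re.compile(rf"\+?{UNSIGNED}(?:Hz|kHz)", re.IGNORECASE)

# Only non-negative percentages are allowed; a leading "+" is not an increment.
PERCENTAGE = re.compile(rf"\+?{UNSIGNED}%")

# <relative-value>: signed number immediately followed by Hz, kHz or st.
RELATIVE_VALUE = re.compile(rf"[+-]?{UNSIGNED}(?:Hz|kHz|st)", re.IGNORECASE)


def classify_voice_pitch(value: str) -> Optional[str]:
    """Return the kind of 'voice-pitch' value, or None if it is illegal."""
    tokens = value.split()
    if len(tokens) == 1:
        token = tokens[0]
        if token.lower() in KEYWORDS:
            return "keyword"
        if FREQUENCY.fullmatch(token):
            return "frequency"
        if PERCENTAGE.fullmatch(token):
            return "percentage"
        return None
    if len(tokens) == 2:
        lowered = [t.lower() for t in tokens]
        # "&&" means both components are required, in either order.
        if lowered.count("relative") == 1:
            other = tokens[1 - lowered.index("relative")]
            if RELATIVE_VALUE.fullmatch(other):
                return "relative"
    return None
```

The values exercised by the example in the diff behave as described there: "+250Hz" is an absolute frequency, "+30Hz relative" and "relative -2st" are both accepted as relative changes, and "-2st" on its own is rejected because the "relative" keyword is missing.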
Received on Thursday, 26 May 2011 13:43:08 UTC