csswg/css3-speech Overview.html,1.85,1.86 Overview.src.html,1.86,1.87

Update of /sources/public/csswg/css3-speech
In directory hutz:/tmp/cvs-serv29731

Modified Files:
	Overview.html Overview.src.html 
Log Message:
Fixed the pitch computed value definition (and updated the cascade example), fixed the voice-family age keywords, and added a note about the voice-matching algorithm (processor-dependent precision).
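
The computed-value rule this commit introduces for 'voice-pitch' and 'voice-range' can be modelled roughly as follows. This is only an illustrative Python sketch, not code from the specification: the voice names, the per-voice frequencies for 'medium', and the helper names (KEYWORD_HZ, apply_offset, computed_value) are all assumptions made for the example.

```python
from typing import Optional, Tuple, Union

# Ratio between two frequencies one semitone apart (twelfth root of two).
SEMITONE_RATIO = 2 ** (1 / 12)

# Hypothetical keyword-to-Hertz tables. Real values are implementation and
# voice specific; these numbers are made up for the walkthrough.
KEYWORD_HZ = {
    "default-voice": {"medium": 120.0},
    "another-voice": {"medium": 210.0},
}

Computed = Union[str, float]   # a keyword, or a fixed frequency in Hz
Offset = Tuple[str, float]     # ("percent", 25), ("hz", 10) or ("st", 2)


def apply_offset(base_hz: float, offset: Offset) -> float:
    """Apply a relative offset to a frequency, clamping negatives to 0 Hz."""
    kind, amount = offset
    if kind == "percent":
        result = base_hz * (1 + amount / 100)
    elif kind == "hz":
        result = base_hz + amount
    elif kind == "st":
        result = base_hz * SEMITONE_RATIO ** amount
    else:
        raise ValueError(f"unknown offset kind: {kind}")
    return max(result, 0.0)


def computed_value(inherited: Computed, offset: Optional[Offset],
                   voice: str) -> Computed:
    """Model of the new rule: a bare keyword stays a keyword, whereas any
    relative offset forces conversion to a fixed frequency using the voice
    that is current where the offset is specified."""
    if offset is None:
        return inherited
    if isinstance(inherited, str):
        base_hz = KEYWORD_HZ[voice][inherited]
    else:
        base_hz = inherited
    return apply_offset(base_hz, offset)


# Mirrors the body / e1 / e2 / e3 rules of the cascade example in the diff.
body = "medium"
e1 = computed_value(body, ("percent", 25), "default-voice")
e2 = computed_value(e1, ("hz", 10), "default-voice")
e3 = computed_value(e2, None, "another-voice")
```

With these made-up numbers, e1 is 150.0 Hz, e2 is 160.0 Hz, and e3 keeps 160.0 Hz despite the voice change, whereas a bare 'medium' would remain a keyword and be re-evaluated against the new voice.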


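The keyword-to-age mapping added for 'voice-family' (child = 6, young = 24, old = 75) could feed a voice matcher along the following lines. The nearest-age policy, the preference for gender over age, the function name and the voice list are all assumptions for illustration; the commit explicitly leaves the definition of "best match" to the processor.

```python
from typing import List, Optional, Tuple

# Mapping from the CSS age keywords to SSML ages, as stated in this commit.
SSML_AGE = {"child": 6, "young": 24, "old": 75}

# A hypothetical voice description: (name, gender, age in years).
Voice = Tuple[str, str, int]


def closest_voice(age_keyword: str, gender: str,
                  voices: List[Voice]) -> Optional[str]:
    """Return the name of the voice that best fits the request.

    One possible policy (an assumption, not mandated anywhere): prefer a
    matching gender, then the smallest distance to the mapped SSML age.
    """
    if not voices:
        return None
    target_age = SSML_AGE[age_keyword]

    def score(voice: Voice) -> Tuple[bool, int]:
        _name, voice_gender, voice_age = voice
        # False sorts before True, so matching genders win.
        return (voice_gender != gender, abs(voice_age - target_age))

    return min(voices, key=score)[0]


# Hypothetical set of installed voices.
installed = [("a", "male", 40), ("b", "female", 30), ("c", "male", 70)]
announcer = closest_voice("old", "male", installed)
romeo = closest_voice("young", "male", installed)
```

A real processor is free to weigh these characteristics differently, for instance by choosing a higher-pitched female voice for "young male" when only adult voices are installed.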

Index: Overview.html
===================================================================
RCS file: /sources/public/csswg/css3-speech/Overview.html,v
retrieving revision 1.85
retrieving revision 1.86
diff -u -d -r1.85 -r1.86
--- Overview.html	14 Jul 2011 18:54:37 -0000	1.85
+++ Overview.html	1 Aug 2011 17:45:43 -0000	1.86
@@ -90,13 +90,13 @@
 
    <h1 id=top>CSS Speech Module</h1>
 
-   <h2 class="no-num no-toc" id=longstatus-date>Editor's Draft 14 July 2011</h2>
+   <h2 class="no-num no-toc" id=longstatus-date>Editor's Draft 01 August 2011</h2>
 
    <dl>
     <dt>This version:
 
     <dd>
-     <!--<a href="http://www.w3.org/TR/2011/WD-css3-speech-20110714">http://www.w3.org/TR/2011/ED-css3-speech-20110714/</a>-->
+     <!--<a href="http://www.w3.org/TR/2011/WD-css3-speech-20110801">http://www.w3.org/TR/2011/ED-css3-speech-20110801/</a>-->
      <a
      href="http://dev.w3.org/csswg/css3-speech">http://dev.w3.org/csswg/css3-speech</a>
      
@@ -345,10 +345,12 @@
      <li class=no-num><a href="#exit">CR exit criteria</a>
     </ul>
 
-   <li class=no-num><a href="#changes">Appendix D &mdash; Changes from
+   <li class=no-num><a href="#ack">Appendix D &mdash; Acknowledgements</a>
+
+   <li class=no-num><a href="#changes">Appendix E &mdash; Changes from
     previous draft</a>
 
-   <li class=no-num><a href="#references">Appendix E &mdash; References</a>
+   <li class=no-num><a href="#references">Appendix F &mdash; References</a>
     <ul class=toc>
      <li class=no-num><a href="#normative-references">Normative
       references</a>
@@ -1893,9 +1895,26 @@
    <dt> <strong>&lt;age&gt;</strong>
 
    <dd>
-    <p> An <a href="#integer-def">integer</a> indicating the preferred age in
-     years (since birth) of the voice. Only positive integers (i.e. excluding
-     zero) are allowed.</p>
+    <p> Possible values are &lsquo;<code class=property>child</code>&rsquo;,
+     &lsquo;<code class=property>young</code>&rsquo; and &lsquo;<code
+     class=property>old</code>&rsquo;, indicating the preferred age category
+     to match during voice selection. The mapping with <a href="#SSML"
+     rel=biblioentry>[SSML]<!--{{!SSML}}--></a> ages is defined as follows:
+     &lsquo;<code class=property>child</code>&rsquo; = 6 y/o, &lsquo;<code
+     class=property>young</code>&rsquo; = 24 y/o, &lsquo;<code
+     class=property>old</code>&rsquo; = 75 y/o (note that more flexible age
+     ranges may be used by the processor-dependent voice-matching algorithm).
+     </p>
+
+    <p class=note> The interpretation of the relationship between a person's
+     age and a recognizable type of voice cannot realistically be defined in
+     a universal manner, as it effectively depends on numerous cultural and
+     linguistic variations. The values provided by this specification
+     therefore represent a simplified model that can be reasonably applied to
+     a great variety of speech locales, albeit at the cost of a certain
+     degree of approximation. Future versions of this specification may
+     refine the level of precision of the voice-matching algorithm, as speech
+     processor implementations become more standardized.</p>
 
    <dt> <strong>&lt;gender&gt;</strong>
 
@@ -1961,7 +1980,7 @@
 
   <p> The following list outlines the voice selection algorithm (note that
    the definition of "language" is loose here, in order to cater for
-   dialectic variants):
+   dialectal variations):
 
   <ol>
    <li> If only a single voice instance is available for the language of the
@@ -1971,10 +1990,11 @@
    <li> If several voice instances are available for the language of the
     selected content, then the chosen voice is the one that most closely
     matches the specified name, or gender, age, and preferred voice variant.
-    The actual definition of "best match" is processor-dependent (e.g. a
-    reasonable match for "voice-family: 10 male;" may well be a
-    higher-pitched female voice, as this tone of voice may be close to that
-    of a young boy). If no voice instance matches the characteristics
+    The actual definition of "best match" is processor-dependent. For
+    example, in a system that only has male and female adult voices
+    available, a reasonable match for "voice-family: young male" may well be
+    a higher-pitched female voice, as this tone of voice would be close to
+    that of a young boy. If no voice instance matches the characteristics
     provided by any of the &lsquo;<a href="#voice-family"><code
     class=property>voice-family</code></a>&rsquo; component values, the first
     available voice instance (amongst those suitable for the language of the
@@ -2002,11 +2022,11 @@
    <p>Examples of property values:</p>
 
    <pre>
-h1 { voice-family: announcer, 65 male; }
-p.romeo  { voice-family: romeo, 18 male; }
-p.juliet { voice-family: juliet, 19 female; }
-p.mercutio { voice-family: 26 male; }
-p.tybalt { voice-family: 30 male; }
+h1 { voice-family: announcer, old male; }
+p.romeo  { voice-family: romeo, young male; }
+p.juliet { voice-family: juliet, young female; }
+p.mercutio { voice-family: young male; }
+p.tybalt { voice-family: young male; }
 p.nurse { voice-family: amelie; }
 
 ...
@@ -2198,9 +2218,10 @@
     <tr>
      <td> <em>Computed value:</em>
 
-     <td>an absolute frequency, or a keyword value and potentially also a
-      frequency, semitone, and/or percentage representing any non-zero
-      offsets (relative to the keyword)
+     <td> one of the predefined keywords if only the keyword is specified by
+      itself, otherwise a fixed frequency calculated by converting the
+      keyword value (if any) to an absolute value based on the current
+      voice-family and by applying the specified relative offset (if any)
   </table>
 
   <p>The &lsquo;<a href="#voice-pitch"><code
@@ -2250,10 +2271,11 @@
      be quantified as the difference between two consecutive pitch
      frequencies on such scale. The ratio between two consecutive frequencies
      separated by exactly one semitone is the twelfth root of two
-     (approximately 1.05946). As a result, the value in Hertz corresponding
-     to a semitone offset is relative to the initial frequency the offset is
-     applied to (in other words, a semitone doesn't correspond to a fixed
-     numerical value in Hertz).</p>
+     (approximately 1.0594631, or about 11011/10393). As a
+     result, the value in Hertz corresponding to a semitone offset is
+     relative to the initial frequency the offset is applied to (in other
+     words, a semitone doesn't correspond to a fixed numerical value in
+     Hertz).</p>
 
    <dt> <strong>&lt;percentage&gt;</strong>
 
@@ -2272,7 +2294,16 @@
 
    <dd>
     <p>A sequence of monotonically non-decreasing pitch levels that are
-     implementation and voice specific.</p>
+     implementation and voice specific. When the computed value for a given
+     element is only a keyword (i.e. no relative offset is specified), then
+     the corresponding absolute frequency will be re-evaluated on a voice
+     change. Conversely, the application of a relative offset requires the
+     calculation of the resulting frequency based on the current voice at the
+     point at which the relative offset is specified, so the computed
+     frequency will inherit absolutely regardless of any voice change further
+     down the style cascade. Authors should therefore only use keyword values
+     in cases where they wish that voice changes trigger the re-evaluation of
+     the conversion from a keyword to a concrete, voice-dependent frequency.</p>
   </dl>
 
   <p> Computed absolute frequency values that are negative are clamped to
@@ -2346,9 +2377,10 @@
     <tr>
      <td> <em>Computed value:</em>
 
-     <td>an absolute frequency, or a keyword value and potentially also a
-      frequency, semitone, and/or percentage representing any non-zero
-      offsets (relative to the keyword)
+     <td> one of the predefined keywords if only the keyword is specified by
+      itself, otherwise a fixed frequency calculated by converting the
+      keyword value (if any) to an absolute value based on the current
+      voice-family and by applying the specified relative offset (if any)
   </table>
 
   <p> The &lsquo;<a href="#voice-range"><code
@@ -2398,10 +2430,11 @@
      be quantified as the difference between two consecutive pitch
      frequencies on such scale. The ratio between two consecutive frequencies
      separated by exactly one semitone is the twelfth root of two
-     (approximately 1.05946). As a result, the value in Hertz corresponding
-     to a semitone offset is relative to the initial frequency the offset is
-     applied to (in other words, a semitone doesn't correspond to a fixed
-     numerical value in Hertz).</p>
+     (approximately 1.0594631, or about 11011/10393). As a
+     result, the value in Hertz corresponding to a semitone offset is
+     relative to the initial frequency the offset is applied to (in other
+     words, a semitone doesn't correspond to a fixed numerical value in
+     Hertz).</p>
 
    <dt> <strong>&lt;percentage&gt;</strong>
 
@@ -2420,7 +2453,16 @@
 
    <dd>
     <p>A sequence of monotonically non-decreasing pitch levels that are
-     implementation and voice specific.</p>
+     implementation and voice specific. When the computed value for a given
+     element is only a keyword (i.e. no relative offset is specified), then
+     the corresponding absolute frequency will be re-evaluated on a voice
+     change. Conversely, the application of a relative offset requires the
+     calculation of the resulting frequency based on the current voice at the
+     point at which the relative offset is specified, so the computed
+     frequency will inherit absolutely regardless of any voice change further
+     down the style cascade. Authors should therefore only use keyword values
+     in cases where they wish that voice changes trigger the re-evaluation of
+     the conversion from a keyword to a concrete, voice-dependent frequency.</p>
   </dl>
 
   <p> Computed absolute frequency values that are negative are clamped to
@@ -2457,32 +2499,33 @@
 
 body { voice-range: inherit; } /* the initial value is 'medium'
                                (the actual frequency value
-                               depends on the active voice) */
+                               depends on the current voice) */
 
 e1 { voice-range: +25%; } /* the computed value is
-                          ['medium' + 25%], which will resolve
+                          ['medium' + 25%] which resolves
                           to the frequency corresponding to 'medium'
                           plus 0.25 times the frequency
                           corresponding to 'medium' */
 
 e2 { voice-range: +10Hz; } /* the computed value is
-                          ['medium' + 25% + 10Hz], which will resolve
-                          to the frequency corresponding to 'medium'
-                          plus 0.25 times the frequency
-                          corresponding to 'medium',
-                          plus another 10 Hertz*/
+                          [FREQ + 10Hz] where "FREQ" is the absolute frequency
+                          calculated in the "e1" rule above.
+                          */
                           
 e3 { voice-range: inherit; /* this could be omitted,
                            but we explicitly specify it for clarity purposes */
                            
-     voice-family: "another-voice"; } /* the computed value is the same as
-                              for "e2", but here the voice is different,
-                              so once calculated, the used absolute frequency
-                              may be completely different
-                              due to voice-dependent discrepancies */
+     voice-family: "another-voice"; } /* this voice change would have resulted in
+                              the re-evaluation of the initial 'medium' keyword
+                              inherited by the "body" element
+                              (i.e. conversion from a voice-dependent keyword value
+                              to a concrete, absolute frequency),
+                              but because relative offsets were applied down the style
+                              cascade, the inherited value is actually the frequency
+                              calculated at the "e2" rule above. */
 
 e4 { voice-range: 200Hz absolute; } /* override with an absolute frequency
-                                    which doesn't depend on the active voice */
+                                    which doesn't depend on the current voice */
 
 e5 { voice-range: 2st; } /* the computed value is an absolute frequency,
                          which is the result of the
@@ -2493,9 +2536,10 @@
 e6 { voice-range: inherit; /* this could be omitted,
                            but we explicitly specify it for clarity purposes */
                            
-     voice-family: "yet-another-voice"; } /* the computed value is the same as
+     voice-family: "yet-another-voice"; } /* despite the voice change,
+                              the computed value is the same as
                               for "e5" (i.e. an absolute frequency value,
-                              independent from the active voice) */
+                              independent from the current voice) */
       </pre>
   </div>
 
@@ -2909,7 +2953,7 @@
 
    <tbody>
     <tr>
-     <td><a class=property href="#cue">cue</a>
+     <th><a class=property href="#cue">cue</a>
 
      <td>&lt;&lsquo;cue-before&rsquo;&gt; || &lt;&lsquo;cue-after&rsquo;&gt;
 
@@ -2924,7 +2968,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#cue-after">cue-after</a>
+     <th><a class=property href="#cue-after">cue-after</a>
 
      <td>&lt;uri&gt; &lt;decibel&gt;? | none
 
@@ -2939,7 +2983,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#cue-before">cue-before</a>
+     <th><a class=property href="#cue-before">cue-before</a>
 
      <td>&lt;uri&gt; &lt;decibel&gt;? | none
 
@@ -2954,7 +2998,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#pause">pause</a>
+     <th><a class=property href="#pause">pause</a>
 
      <td>&lt;&lsquo;pause-before&rsquo;&gt; ||
       &lt;&lsquo;pause-after&rsquo;&gt;
@@ -2970,7 +3014,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#pause-after">pause-after</a>
+     <th><a class=property href="#pause-after">pause-after</a>
 
      <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong
 
@@ -2985,7 +3029,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#pause-before">pause-before</a>
+     <th><a class=property href="#pause-before">pause-before</a>
 
      <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong
 
@@ -3000,7 +3044,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#rest">rest</a>
+     <th><a class=property href="#rest">rest</a>
 
      <td>&lt;&lsquo;rest-before&rsquo;&gt; ||
       &lt;&lsquo;rest-after&rsquo;&gt;
@@ -3016,7 +3060,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#rest-after">rest-after</a>
+     <th><a class=property href="#rest-after">rest-after</a>
 
      <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong
 
@@ -3031,7 +3075,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#rest-before">rest-before</a>
+     <th><a class=property href="#rest-before">rest-before</a>
 
      <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong
 
@@ -3046,7 +3090,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#speak">speak</a>
+     <th><a class=property href="#speak">speak</a>
 
      <td>auto | none | normal
 
@@ -3061,7 +3105,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#speak-as">speak-as</a>
+     <th><a class=property href="#speak-as">speak-as</a>
 
      <td>normal | spell-out || digits || [ literal-punctuation |
       no-punctuation ]
@@ -3077,7 +3121,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#voice-balance">voice-balance</a>
+     <th><a class=property href="#voice-balance">voice-balance</a>
 
      <td>&lt;number&gt; | left | center | right | leftwards | rightwards
 
@@ -3092,7 +3136,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#voice-duration">voice-duration</a>
+     <th><a class=property href="#voice-duration">voice-duration</a>
 
      <td>auto | &lt;time&gt;
 
@@ -3107,7 +3151,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#voice-family">voice-family</a>
+     <th><a class=property href="#voice-family">voice-family</a>
 
      <td>[[&lt;name&gt; | &lt;generic-voice&gt;],]* [&lt;name&gt; |
       &lt;generic-voice&gt;] | preserve
@@ -3123,7 +3167,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#voice-pitch">voice-pitch</a>
+     <th><a class=property href="#voice-pitch">voice-pitch</a>
 
      <td>&lt;frequency&gt; &amp;&amp; absolute | [[x-low | low | medium |
       high | x-high] || [&lt;frequency&gt; | &lt;semitones&gt; |
@@ -3140,7 +3184,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#voice-range">voice-range</a>
+     <th><a class=property href="#voice-range">voice-range</a>
 
      <td>&lt;frequency&gt; &amp;&amp; absolute | [[x-low | low | medium |
       high | x-high] || [&lt;frequency&gt; | &lt;semitones&gt; |
@@ -3157,7 +3201,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#voice-rate">voice-rate</a>
+     <th><a class=property href="#voice-rate">voice-rate</a>
 
      <td>[normal | x-slow | slow | medium | fast | x-fast] ||
       &lt;percentage&gt;
@@ -3173,7 +3217,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#voice-stress">voice-stress</a>
+     <th><a class=property href="#voice-stress">voice-stress</a>
 
      <td>normal | strong | moderate | none | reduced
 
@@ -3188,7 +3232,7 @@
      <td>speech
 
     <tr>
-     <td><a class=property href="#voice-volume">voice-volume</a>
+     <th><a class=property href="#voice-volume">voice-volume</a>
 
      <td>silent | [[x-soft | soft | medium | loud | x-loud] ||
       &lt;decibel&gt;]
@@ -3622,7 +3666,15 @@
    CSS WG) tests have not been produced for those feature(s) by the end of
    the CR period.
 
-  <h2 class=no-num id=changes>Appendix D &mdash; Changes from previous draft</h2>
+  <h2 class=no-num id=ack>Appendix D &mdash; Acknowledgements</h2>
+
+  <p> The editors would like to thank the members of the W3C Voice Browser
+   and Cascading Style Sheets working groups for their assistance in
+   preparing this specification. Special thanks to Ellen Eide (IBM) for her
+   detailed comments, and to Elika Etemad (Fantasai) for her thorough
+   reviews.
+
+  <h2 class=no-num id=changes>Appendix E &mdash; Changes from previous draft</h2>
 
   <p> Note that the <a
    href="http://www.w3.org/TR/2011/WD-css3-speech-20110419">previous Working
@@ -3729,10 +3781,7 @@
    <li>Cleaned-up the list of module dependencies, and removed redundant
     "module dependencies" section.
 
-   <li> Voice age now expressed using integers rather than a keyword
-    enumeration (&lsquo;<code class=property>child</code>&rsquo;,
-    &lsquo;<code class=property>young</code>&rsquo; and &lsquo;<code
-    class=property>old</code>&rsquo;). This aligns with SSML.
+   <li> Voice age keywords now mapped to SSML ages.
 
    <li>Improved the pause collapsing prose, removed redundant paragraphs.
 
@@ -3776,7 +3825,7 @@
             <li>Fixed minor typos</li>
             </ul -->
 
-  <h2 class=no-num id=references>Appendix E &mdash; References</h2>
+  <h2 class=no-num id=references>Appendix F &mdash; References</h2>
 
   <h3 class=no-num id=normative-references>Normative references</h3>
   <!--begin-normative-->

Index: Overview.src.html
===================================================================
RCS file: /sources/public/csswg/css3-speech/Overview.src.html,v
retrieving revision 1.86
retrieving revision 1.87
diff -u -d -r1.86 -r1.87
--- Overview.src.html	14 Jul 2011 18:54:37 -0000	1.86
+++ Overview.src.html	1 Aug 2011 17:45:43 -0000	1.87
@@ -420,7 +420,6 @@
       and right extremities, to represent the audio separation along the resulting left-right axis. </p>
     <p class="note"> Note that the functionality provided by this property has no match in the SSML
       markup language [[!SSML]]. </p>
-
     <dl>
       <dt>
         <strong>&lt;number&gt;</strong>
@@ -466,7 +465,6 @@
           clamping the resulting number to '100'.</p>
       </dd>
     </dl>
-
     <p> User agents may be connected to different kinds of sound systems, featuring varying audio
       mixing capabilities. The expected behavior for mono, stereo, and surround sound systems is
       defined as follows: </p>
@@ -483,7 +481,6 @@
         stereo layout. For example, the center channel as well as the left/right speakers may be
         used altogether in order to emulate the behavior of the 'center' value. </li>
     </ul>
-
     <p> Future revisions of the CSS Speech module may include support for three-dimensional audio,
       which would effectively enable authors to specify "azimuth" and "elevation" values. In the
       future, content authored using the current specification may therefore be consumed by
@@ -503,7 +500,6 @@
         degrees in a numerically linearly-proportional manner. For example, '-50' maps to -20
         degrees.</li>
     </ul>
-
     <p class="note"> Note that sound systems may be configured by users in such a way that it would
       interfere with the left-right audio distribution specified by document authors. Typically, the
       various "surround" modes available in modern sound systems (including systems based on basic
@@ -514,12 +510,10 @@
       which case the effect of the 'voice-balance' property would obviously not be perceivable at
       all. The rendering fidelity of authored content is therefore dependent on such user
       customizations, and the 'voice-balance' property merely specifies the desired end-result. </p>
-
     <p class="note"> Note that many speech synthesizers only generate mono sound, and therefore do
       not intrinsically support the 'voice-balance' property. The sound distribution along the
       left-right axis consequently occurs at post-synthesis stage (when the speech-enabled
       user-agent mixes the various audio sources authored within the document) </p>
-
     <h2 id="speaking-props">Speaking properties</h2>
     <h3 id="speaking-props-speak">The 'speak' property</h3>
     <table class="propdef" summary="name: syntax">
@@ -1475,8 +1469,18 @@
         <strong>&lt;age&gt;</strong>
       </dt>
       <dd>
-        <p> An <a href="#integer-def">integer</a> indicating the preferred age in years (since
-          birth) of the voice. Only positive integers (i.e. excluding zero) are allowed. </p>
+        <p> Possible values are 'child', 'young' and 'old', indicating the preferred age category to
+          match during voice selection. The mapping with [[!SSML]] ages is defined as follows:
+          'child' = 6 y/o, 'young' = 24 y/o, 'old' = 75 y/o (note that more flexible age ranges may
+          be used by the processor-dependent voice-matching algorithm). </p>
+        <p class="note"> The interpretation of the relationship between a person's age and a
+          recognizable type of voice cannot realistically be defined in a universal manner, as it
+          effectively depends on numerous cultural and linguistic variations. The values provided by
+          this specification therefore represent a simplified model that can be reasonably applied
+          to a great variety of speech locales, albeit at the cost of a certain degree of
+          approximation. Future versions of this specification may refine the level of precision of
+          the voice-matching algorithm, as speech processor implementations become more
+          standardized. </p>
       </dd>
       <dt>
         <strong>&lt;gender&gt;</strong>
@@ -1525,18 +1529,19 @@
       At any point within the content structure, the language takes precedence (i.e. has a higher
       priority) over the specified CSS voice characteristics. </p>
     <p> The following list outlines the voice selection algorithm (note that the definition of
-      "language" is loose here, in order to cater for dialectic variants):</p>
+      "language" is loose here, in order to cater for dialectal variations):</p>
     <ol>
       <li> If only a single voice instance is available for the language of the selected content,
         then this voice must be used, regardless of the specified CSS voice characteristics. </li>
       <li> If several voice instances are available for the language of the selected content, then
         the chosen voice is the one that most closely matches the specified name, or gender, age,
-        and preferred voice variant. The actual definition of "best match" is processor-dependent
-        (e.g. a reasonable match for "voice-family: 10 male;" may well be a higher-pitched female
-        voice, as this tone of voice may be close to that of a young boy). If no voice instance
-        matches the characteristics provided by any of the 'voice-family' component values, the
-        first available voice instance (amongst those suitable for the language of the selected
-        content) must be used. </li>
+        and preferred voice variant. The actual definition of "best match" is processor-dependent.
+        For example, in a system that only has male and female adult voices available, a reasonable
+        match for "voice-family: young male" may well be a higher-pitched female voice, as this tone
+        of voice would be close to that of a young boy. If no voice instance matches the
+        characteristics provided by any of the 'voice-family' component values, the first available
+        voice instance (amongst those suitable for the language of the selected content) must be
+        used. </li>
       <li> If no voice is available for the language of the selected content, it is recommended that
         user-agents let the user know about the lack of appropriate TTS voice. </li>
     </ol>
@@ -1552,11 +1557,11 @@
     <div class="example">
       <p>Examples of property values:</p>
       <pre>
-h1 { voice-family: announcer, 65 male; }
-p.romeo  { voice-family: romeo, 18 male; }
-p.juliet { voice-family: juliet, 19 female; }
-p.mercutio { voice-family: 26 male; }
-p.tybalt { voice-family: 30 male; }
+h1 { voice-family: announcer, old male; }
+p.romeo  { voice-family: romeo, young male; }
+p.juliet { voice-family: juliet, young female; }
+p.mercutio { voice-family: young male; }
+p.tybalt { voice-family: young male; }
 p.nurse { voice-family: amelie; }
 
 ...
@@ -1658,7 +1663,6 @@
           multiplied by 0.5 (half the value).</p>
       </dd>
     </dl>
-
     <div class="example">
       <p>Examples of inherited values:</p>
       <pre>
@@ -1698,7 +1702,6 @@
                                       'voice-rate' value is the same) */
       </pre>
     </div>
-
     <h3 id="voice-props-voice-pitch">The 'voice-pitch' property</h3>
     <table class="propdef" summary="name: syntax">
       <tbody>
@@ -1749,8 +1752,10 @@
           <td>
             <em>Computed value:</em>
           </td>
-          <td>an absolute frequency, or a keyword value and potentially also a frequency, semitone,
-            and/or percentage representing any non-zero offsets (relative to the keyword)</td>
+          <td> one of the predefined keywords if only the keyword is specified by itself, otherwise
+            a fixed frequency calculated by converting the keyword value (if any) to an absolute
+            value based on the current voice-family and by applying the specified relative offset
+            (if any)</td>
         </tr>
       </tbody>
     </table>
@@ -1759,7 +1764,6 @@
       processors (it approximately corresponds to the average pitch of the output). For example, the
       common pitch for a male voice is around 120Hz, whereas it is around 210Hz for a female
       voice.</p>
-
     <p class="note"> Note that the functionality provided by this property is related to the <a
         href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>pitch</code> attribute of
         the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p>
@@ -1793,9 +1797,10 @@
           equal temperament chromatic scale. A semitone can therefore be quantified as the
           difference between two consecutive pitch frequencies on such scale. The ratio between two
           consecutive frequencies separated by exactly one semitone is the twelfth root of two
-          (approximately 1.05946). As a result, the value in Hertz corresponding to a semitone
-          offset is relative to the initial frequency the offset is applied to (in other words, a
-          semitone doesn't correspond to a fixed numerical value in Hertz). </p>
+          (approximately 1.0594631, or about 11011/10393). As a result, the value in
+          Hertz corresponding to a semitone offset is relative to the initial frequency the offset
+          is applied to (in other words, a semitone doesn't correspond to a fixed numerical value in
+          Hertz). </p>
       </dd>
       <dt>
         <strong>&lt;percentage&gt;</strong>
@@ -1812,17 +1817,22 @@
           <strong>high</strong>, <strong>x-high</strong></dt>
       <dd>
         <p>A sequence of monotonically non-decreasing pitch levels that are implementation and voice
-          specific.</p>
+          specific. When the computed value for a given element is only a keyword (i.e. no relative
+          offset is specified), then the corresponding absolute frequency will be re-evaluated on a
+          voice change. Conversely, the application of a relative offset requires the calculation of
+          the resulting frequency based on the current voice at the point at which the relative
+          offset is specified, so the computed frequency will inherit absolutely regardless of any
+          voice change further down the style cascade. Authors should therefore only use keyword
+          values in cases where they wish that voice changes trigger the re-evaluation of the
+          conversion from a keyword to a concrete, voice-dependent frequency.</p>
       </dd>
     </dl>
-
     <p> Computed absolute frequency values that are negative are clamped to zero Hertz.
       Speech-capable user agents are likely to support a specific range of values rather than the
       full range of possible calculated numerical values for frequencies. The actual values in user
       agents may therefore be clamped to implementation-dependent minimum and maximum boundaries.
       For example: although the 0Hz frequency can be legitimately calculated, it may be clamped to a
       more meaningful value in the context of the speech synthesizer. </p>
-
     <div class="example">
       <p>Examples of property values:</p>
       <pre>
@@ -1887,8 +1897,10 @@
           <td>
             <em>Computed value:</em>
           </td>
-          <td>an absolute frequency, or a keyword value and potentially also a frequency, semitone,
-            and/or percentage representing any non-zero offsets (relative to the keyword) </td>
+          <td> one of the predefined keywords if only the keyword is specified by itself, otherwise
+            a fixed frequency calculated by converting the keyword value (if any) to an absolute
+            value based on the current voice-family and by applying the specified relative offset
+            (if any)</td>
         </tr>
       </tbody>
     </table>
@@ -1898,11 +1910,9 @@
       example when variations in inflection are used to convey meaning and emphasis in speech.
       Typically, a low range produces a flat, monotonic voice, whereas a high range produces an
       animated voice. </p>
-
     <p class="note"> Note that the functionality provided by this property is related to the <a
         href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>range</code> attribute of
         the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p>
-
     <dl>
       <dt>
         <strong>&lt;frequency&gt;</strong>
@@ -1933,9 +1943,10 @@
           equal temperament chromatic scale. A semitone can therefore be quantified as the
           difference between two consecutive pitch frequencies on such scale. The ratio between two
           consecutive frequencies separated by exactly one semitone is the twelfth root of two
-          (approximately 1.05946). As a result, the value in Hertz corresponding to a semitone
-          offset is relative to the initial frequency the offset is applied to (in other words, a
-          semitone doesn't correspond to a fixed numerical value in Hertz).</p>
+          (approximately 11011/10393, or about 1.0594631). As a result, the value in
+          Hertz corresponding to a semitone offset is relative to the initial frequency the offset
+          is applied to (in other words, a semitone doesn't correspond to a fixed numerical value in
+          Hertz).</p>
       </dd>
       <dt>
         <strong>&lt;percentage&gt;</strong>
@@ -1952,17 +1963,22 @@
           <strong>high</strong>, <strong>x-high</strong></dt>
       <dd>
         <p>A sequence of monotonically non-decreasing pitch levels that are implementation and voice
-          specific.</p>
+          specific. When the computed value for a given element is only a keyword (i.e. no relative
+          offset is specified), then the corresponding absolute frequency will be re-evaluated on a
+          voice change. Conversely, the application of a relative offset requires the calculation of
+          the resulting frequency based on the current voice at the point at which the relative
+          offset is specified, so the computed frequency is inherited as an absolute value regardless of any
+          voice change further down the style cascade. Authors should therefore use keyword
+          values only when they want a voice change to trigger re-evaluation of the
+          conversion from a keyword to a concrete, voice-dependent frequency.</p>
       </dd>
     </dl>
-
     <p> Computed absolute frequency values that are negative are clamped to zero Hertz.
       Speech-capable user agents are likely to support a specific range of values rather than the
       full range of possible calculated numerical values for frequencies. The actual values in user
       agents may therefore be clamped to implementation-dependent minimum and maximum boundaries.
       For example: although the 0Hz frequency can be legitimately calculated, it may be clamped to a
       more meaningful value in the context of the speech synthesizer. </p>
-
     <div class="example">
       <p>Examples of inherited values:</p>
       <pre>
@@ -1987,32 +2003,33 @@
 
 body { voice-range: inherit; } /* the initial value is 'medium'
                                (the actual frequency value
-                               depends on the active voice) */
+                               depends on the current voice) */
 
 e1 { voice-range: +25%; } /* the computed value is
-                          ['medium' + 25%], which will resolve
+                          ['medium' + 25%] which resolves
                           to the frequency corresponding to 'medium'
                           plus 0.25 times the frequency
                           corresponding to 'medium' */
 
 e2 { voice-range: +10Hz; } /* the computed value is
-                          ['medium' + 25% + 10Hz], which will resolve
-                          to the frequency corresponding to 'medium'
-                          plus 0.25 times the frequency
-                          corresponding to 'medium',
-                          plus another 10 Hertz*/
+                          [FREQ + 10Hz] where "FREQ" is the absolute frequency
+                          calculated in the "e1" rule above.
+                          */
                           
 e3 { voice-range: inherit; /* this could be omitted,
                            but we explicitly specify it for clarity purposes */
                            
-     voice-family: "another-voice"; } /* the computed value is the same as
-                              for "e2", but here the voice is different,
-                              so once calculated, the used absolute frequency
-                              may be completely different
-                              due to voice-dependent discrepancies */
+     voice-family: "another-voice"; } /* this voice change would have resulted in
+                              the re-evaluation of the initial 'medium' keyword
+                              inherited by the "body" element
+                              (i.e. conversion from a voice-dependent keyword value
+                              to a concrete, absolute frequency),
+                              but because relative offsets were applied down the style
+                              cascade, the inherited value is actually the frequency
+                              calculated at the "e2" rule above. */
 
 e4 { voice-range: 200Hz absolute; } /* override with an absolute frequency
-                                    which doesn't depend on the active voice */
+                                    which doesn't depend on the current voice */
 
 e5 { voice-range: 2st; } /* the computed value is an absolute frequency,
                          which is the result of the
@@ -2023,9 +2040,10 @@
 e6 { voice-range: inherit; /* this could be omitted,
                            but we explicitly specify it for clarity purposes */
                            
-     voice-family: "yet-another-voice"; } /* the computed value is the same as
+     voice-family: "yet-another-voice"; } /* despite the voice change,
+                              the computed value is the same as
                               for "e5" (i.e. an absolute frequency value,
-                              independent from the active voice) */
+                              independent from the current voice) */
       </pre>
     </div>
     <table class="propdef" summary="name: syntax">
@@ -2590,7 +2608,11 @@
       end of the CR period. </p>
     <p>Features may/will also be dropped if adequate/sufficient (by judgment of CSS WG) tests have
       not been produced for those feature(s) by the end of the CR period. </p>
-    <h2 class="no-num" id="changes">Appendix D &mdash; Changes from previous draft</h2>
+    <h2 class="no-num" id="ack">Appendix D &mdash; Acknowledgements</h2> The editors would like to
+    thank the members of the W3C Voice Browser and Cascading Style Sheets working groups for their
+    assistance in preparing this specification. Special thanks to Ellen Eide (IBM) for her detailed
+    comments, and to Elika Etemad (Fantasai) for her thorough reviews. <h2 class="no-num"
+      id="changes">Appendix E &mdash; Changes from previous draft</h2>
     <p> Note that the <a href="http://www.w3.org/TR/2011/WD-css3-speech-20110419">previous Working
         Draft</a> includes <a href="http://www.w3.org/TR/2011/WD-css3-speech-20110419#changes">its
         own list of changes</a>, which - for succinctness - is not repeated here. </p>
@@ -2635,8 +2657,7 @@
       <li>Added the missing "Computed value" line to each property definition.</li>
       <li>Cleaned-up the list of module dependencies, and removed redundant "module dependencies"
         section.</li>
-      <li> Voice age now expressed using integers rather than a keyword enumeration ('child',
-        'young' and 'old'). This aligns with SSML. </li>
+      <li> Voice age keywords now mapped to SSML ages. </li>
       <li>Improved the pause collapsing prose, removed redundant paragraphs.</li>
       <li>Added the missing 'normal' value for 'voice-stress'.</li>
       <li>Separated the 'absolute' keyword for 'voice-pitch' and 'voice-range'.</li>
@@ -2667,7 +2688,7 @@
             <li>Reorganized appendixes</li>
             <li>Fixed minor typos</li>
             </ul -->
-    <h2 class="no-num" id="references">Appendix E &mdash; References</h2>
+    <h2 class="no-num" id="references">Appendix F &mdash; References</h2>
     <h3 class="no-num">Normative references</h3>
     <!--normative-->
     <h3 class="no-num">Other references</h3>

Received on Monday, 1 August 2011 17:45:49 UTC