csswg/css3-speech Overview.html,1.59,1.60 Overview.src.html,1.60,1.61 from Daniel Weck via cvs-syncmail on 2011-07-06 (public-css-commits@w3.org from July 2011)

From: Daniel Weck via cvs-syncmail <cvsmail@w3.org>
Date: Wed, 06 Jul 2011 16:12:51 +0000
To: public-css-commits@w3.org
Message-Id: <E1QeUia-0005Zp-0B@lionel-hutz.w3.org>
Update of /sources/public/csswg/css3-speech
In directory hutz:/tmp/cvs-serv21052

Modified Files:
	Overview.html Overview.src.html 
Log Message:
batch of changes (as per Fantasai thorough review).



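The `<decibel>` definitions changed in the Overview.src.html diff below express a volume change as volume(dB) = 20 log10(a1 / a0). The accompanying notes say that -6.0dB is roughly half the amplitude and +6.0dB roughly twice the amplitude. The following minimal Python sketch checks that arithmetic. The function names are hypothetical helpers for illustration only, not part of the specification or of any speech API. The sketch also maps an amplitude ratio of 0 to negative infinity, mirroring the "-infinity decibels" wording used for silenced audio cues.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Return a1 / a0 for a change of `db` decibels.

    Inverts volume(dB) = 20 * log10(a1 / a0).
    A value of -inf dB yields 0.0 (complete silence).
    """
    return 10 ** (db / 20)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Return the decibel change corresponding to an amplitude ratio a1 / a0."""
    if ratio < 0:
        raise ValueError("amplitude ratio must be non-negative")
    if ratio == 0:
        # Zero amplitude: the "-infinity decibels" case, i.e. silence.
        return -math.inf
    return 20 * math.log10(ratio)


if __name__ == "__main__":
    for db in (-6.0, 0.0, 6.0):
        print(f"{db:+.1f}dB -> amplitude x {db_to_amplitude_ratio(db):.3f}")
```

Running it prints amplitude factors of about 0.501, 1.000 and 1.995. This is why a 6dB change is described only as approximately halving or doubling the amplitude.
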
Index: Overview.html
===================================================================
RCS file: /sources/public/csswg/css3-speech/Overview.html,v
retrieving revision 1.59
retrieving revision 1.60
diff -u -d -r1.59 -r1.60
--- Overview.html	14 Jun 2011 10:15:40 -0000	1.59
+++ Overview.html	6 Jul 2011 16:12:49 -0000	1.60
@@ -90,13 +90,13 @@
 
    <h1 id=top>CSS Speech Module</h1>
 
-   <h2 class="no-num no-toc" id=longstatus-date>Editor's Draft 14 June 2011</h2>
+   <h2 class="no-num no-toc" id=longstatus-date>Editor's Draft 06 July 2011</h2>
 
    <dl>
     <dt>This version:
 
     <dd>
-     <!--<a href="http://www.w3.org/TR/2011/WD-css3-speech-20110614">http://www.w3.org/TR/2011/ED-css3-speech-20110614/</a>-->
[...1761 lines suppressed...]
 
+   <li>Separated definition of semitones, as they are relative values already
+    (unlike Hz frequencies).
+
+   <li>More consistent behavior when audio cue URI fails (for whatever
+    reason).
+
    <li>Enabled voice-family names to contain spaces, matching &lsquo;<code
     class=property>font-family</code>&rsquo; syntax which is based on quoted
     strings and concatenated identifiers.
@@ -3439,6 +3465,9 @@
 
    <li>Improved document structure by adding sub-sections.
 
+   <li>Removed the implicit &lsquo;<code class=property>inherit</code>&rsquo;
+    value for all properties.
+
    <li>Fixed typos and made other minor edits.
   </ul>
   <!-- For reference only, changes in previous draft: -->

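The "Collapsing pauses" paragraph in the Overview.src.html diff that follows merges adjoining pauses. It keeps the strongest named break and the longest absolute time, and a named break and a time take effect additively. Below is a minimal Python sketch of that rule, assuming a pause can be modelled as a (named strength, milliseconds) pair. The `Pause` type, `parse_pause` and `collapse` are hypothetical names. Treating 'none' as the weakest strength is an assumption, consistent with its definition as 0ms. How a merged pair becomes actual silence is left to the speech processor and is not modelled here.

```python
from dataclasses import dataclass

# Named breaks in increasing strength, following the value grammar.
# Assumption: 'none' (equivalent to 0ms) is treated as the weakest break.
STRENGTHS = ["none", "x-weak", "weak", "medium", "strong", "x-strong"]


@dataclass(frozen=True)
class Pause:
    """Hypothetical model of a pause: a named break plus a time in ms."""
    strength: str = "none"
    time_ms: float = 0.0


def parse_pause(value: str) -> Pause:
    """Parse a single pause value such as 'strong', '250ms' or '+3s'."""
    value = value.strip()
    if value in STRENGTHS:
        return Pause(strength=value)
    if value.endswith("ms"):  # check 'ms' before 's'
        ms = float(value[:-2])
    elif value.endswith("s"):
        ms = float(value[:-1]) * 1000
    else:
        raise ValueError(f"not a pause value: {value!r}")
    if ms < 0:
        raise ValueError("only non-negative time values are allowed")
    return Pause(time_ms=ms)


def collapse(a: Pause, b: Pause) -> Pause:
    """Merge two adjoining pauses: strongest named break, longest time."""
    strength = max(a.strength, b.strength, key=STRENGTHS.index)
    return Pause(strength, max(a.time_ms, b.time_ms))


if __name__ == "__main__":
    for x, y in (("strong", "weak"), ("1s", "250ms"), ("strong", "250ms")):
        print(x, "+", y, "->", collapse(parse_pause(x), parse_pause(y)))
```

For the three comparisons given in the spec text, `collapse` keeps "strong", keeps 1000ms, and for "strong" versus "250ms" keeps both components. That last result is the additive case.
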
Index: Overview.src.html
===================================================================
RCS file: /sources/public/csswg/css3-speech/Overview.src.html,v
retrieving revision 1.60
retrieving revision 1.61
diff -u -d -r1.60 -r1.61
--- Overview.src.html	14 Jun 2011 10:15:40 -0000	1.60
+++ Overview.src.html	6 Jul 2011 16:12:49 -0000	1.61
@@ -158,8 +158,13 @@
       enable authors to declaratively control presentational aspects of the aural dimension (e.g.
       TTS voice, pitch, rate, and volume levels). These style sheet properties can be used together
       with visual properties (mixed media), or as a complete aural alternative to a visual
-      presentation. The aural "canvas" consists of a two-channel (stereo) space and of a temporal
-      dimension, within which synthetic speech and audio cues coexist.</p>
+      presentation.</p>
+    <p> Content creators can conditionally include CSS properties dedicated to user-agents with text
+      to speech synthesis capabilities, by specifying the "speech" media type via the
+        <code>media</code> attribute of the <code>link</code> element, or with the
+        <code>@media</code> at-rule, or within an <code>@import</code> statement. When doing so, the
+      styles authored within the scope of such conditional statements are ignored by user-agents
+      that do not support this module. </p>
     <h3 id="css21-rel">Relationship with CSS2.1</h3>
     <p> The CSS Speech module is a re-work of the informative CSS2.1 Aural appendix, within which
       the "aural" media type was described, but also deprecated (in favor of the "speech" media
@@ -167,23 +172,25 @@
       actually define the corresponding properties. This module describes the CSS properties that
       apply to the "speech" media type, and defines a new "box" model specifically for the aural
       dimension. </p>
-    <p> Content creators can conditionally include CSS properties dedicated to user-agents with text
-      to speech synthesis capabilities, by specifying the "speech" media type via the
-        <code>media</code> attribute of the <code>link</code> element, or with the
-        <code>@media</code> at-rule, or within an <code>@import</code> statement. When doing so, the
-      styles authored within the scope of such conditional statements are ignored by user-agents
-      that do not support this module. </p>
-    <h3 id="example">CSS Speech Example</h3>
-    <p>The following example shows how authors can tell the speech synthesizer to speak HTML
-      headings with a voice called "paul", using "moderate" emphasis (which is more than normal) and
-      how to insert an audio cue (prerecorded audio clip located at the given URL) before the start
-      of TTS rendering for each heading. In a stereo-capable sound system, paragraphs marked with
-      the CSS class "heidi" are rendered on the left audio channel (and with a female voice, etc.),
-      whilst the class "peter" corresponds to the right channel (and to a male voice, etc.). The
-      volume level of text spans marked with the class "special" is lower than normal, and a
-      prosodic boundary is created by introducing a strong pause after it is spoken (note how the
-        <code>span</code> inherits the voice-family from its parent paragraph).</p>
+    <h2 id="values">CSS values</h2>
+    <p>This specification follows the <a href="http://www.w3.org/TR/CSS21/about.html#property-defs"
+        >CSS property definition conventions</a> from [[!CSS21]]. Value types not defined in this
+      specification are defined in CSS Values and Units Level 3 [[!CSS3VAL]]. </p>
+    <p>In addition to the property-specific values listed in their definitions, all properties
+      defined in this specification also accept the <a
+        href="http://www.w3.org/TR/CSS21/cascade.html#value-def-inherit">inherit</a> keyword as
+      their property value. For readability it has not been repeated explicitly. </p>
+    <h2 id="example">Example</h2>
     <div class="example">
+      <p>This example shows how authors can tell the speech synthesizer to speak HTML headings with
+        a voice called "paul", using "moderate" emphasis (which is more than normal) and how to
+        insert an audio cue (prerecorded audio clip located at the given URL) before the start of
+        TTS rendering for each heading. In a stereo-capable sound system, paragraphs marked with the
+        CSS class "heidi" are rendered on the left audio channel (and with a female voice, etc.),
+        whilst the class "peter" corresponds to the right channel (and to a male voice, etc.). The
+        volume level of text spans marked with the class "special" is lower than normal, and a
+        prosodic boundary is created by introducing a strong pause after it is spoken (note how the
+          <code>span</code> inherits the voice-family from its parent paragraph).</p>
       <pre>
 h1, h2, h3, h4, h5, h6
 {
@@ -219,14 +226,18 @@
   I am Peter.
 &lt;/p&gt;</pre>
     </div>
-    <h2 id="aural-model">The <dfn id="aural-box-model">aural "box" model</dfn></h2>
+    <h2 id="aural-model">The aural formatting model</h2>
     <p>The CSS formatting model for aural media is based on a sequence of sounds and silences that
       occur within a nested context similar to the <a href="#box-model-def">visual box model</a>,
-      which we name the aural "box" model. The aural canvas is one-dimensional, or "monolinear". The
-      element is surrounded by 'rest', 'cue' and 'pause' properties (from the innermost to the
-      outermost position). These can be seen as aural equivalents to 'padding', 'border' and
-      'margin', respectively. The following diagram illustrates the equivalence between properties
-      of the visual and aural box models, applied to the selected &lt;element&gt;:</p>
+      which we name the <dfn id="aural-box-model">aural "box" model</dfn>. The aural "canvas"
+      consists of a two-channel (stereo) space and of a temporal dimension, within which synthetic
+      speech and audio cues coexist. The selected element is surrounded by 'rest', 'cue' and 'pause'
+      properties (from the innermost to the outermost position). These can be seen as aural
+      equivalents to 'padding', 'border' and 'margin', respectively. When used, the ':before' and
+      ':after' pseudo-elements [[!CSS21]] get inserted between the element's contents and the
+      'rest'. </p>
+    <p> The following diagram illustrates the equivalence between properties of the visual and aural
+      box models, applied to the selected &lt;element&gt;:</p>
     <p>
       <img alt="A graph depicting the aural 'box' model." id="aural-box" src="aural-box.png" />
     </p>
@@ -244,8 +255,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>normal | silent | x-soft | soft | medium | loud | x-loud | &lt;decibel&gt; |
-            inherit</td>
+          <td>normal | silent | x-soft | soft | medium | loud | x-loud | &lt;decibel&gt; </td>
         </tr>
         <tr>
           <td>
@@ -269,7 +279,7 @@
           <td>
             <em>Percentages:</em>
           </td>
-          <td>refer to inherited value</td>
+          <td>N/A</td>
         </tr>
         <tr>
           <td>
@@ -328,11 +338,11 @@
         <strong>&lt;decibel&gt;</strong>
       </dt>
       <dd>
-        <p>An integer or floating point <a href="#number-def">number</a> immediately followed by
-          "dB" (decibel unit). This represents a change (positive or negative) relative to the
-          default or inherited volume level. This is expressed as the ratio of the squares of the
-          new signal amplitude (a1) and the current amplitude (a0), as per the following logarithmic
-          equation: volume(dB) = 20 log10 (a1 / a0) </p>
+        <p>A <a href="#number-def">number</a> immediately followed by "dB" (decibel unit). This
+          represents a change (positive or negative) relative to the default or inherited volume
+          level. This is expressed as the ratio of the squares of the new signal amplitude (a1) and
+          the current amplitude (a0), as per the following logarithmic equation: volume(dB) = 20
+          log10 (a1 / a0) </p>
         <p class="note"> Note that -6.0dB is approximately half the amplitude of the audio signal,
           and +6.0dB is approximately twice the amplitude.</p>
       </dd>
@@ -358,7 +368,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>&lt;number&gt; | left | center | right | leftwards | rightwards | inherit</td>
+          <td>&lt;number&gt; | left | center | right | leftwards | rightwards </td>
         </tr>
         <tr>
           <td>
@@ -407,10 +417,11 @@
         <strong>&lt;number&gt;</strong>
       </dt>
       <dd>
-        <p>An integer or floating point <a href="#number-def">number</a> between '-100' and '100'.
-          For '-100' only the left channel is audible. Simarly for '100' only the right channel is
-          audible. For '0' both channels have the same level, so that the speech appears to be
-          coming from the center.</p>
+        <p>A <a href="#number-def">number</a> between '-100' and '100' (inclusive). Values smaller
+          than '-100' are clamped to '-100'. Values greater than '100' are clamped to '100'. When
+          the value is '-100', only the left channel is audible. Conversely, when the value is '100'
+          only the right channel is audible. When the value is '0', left and right channels both
+          have the same sound level, so that the speech appears to be coming from the center.</p>
       </dd>
       <dt>
         <strong>left</strong>
@@ -466,7 +477,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>auto | none | normal | inherit</td>
+          <td>auto | none | normal </td>
         </tr>
         <tr>
           <td>
@@ -514,13 +525,12 @@
         <strong>auto</strong>
       </dt>
       <dd>
-        <p>Resolves to a computed value of 'none' when 'display' is 'none', otherwise resolves to a
-          computed value of 'auto' which yields a used value of 'normal'. </p>
-        <p class="note"> Note that 'display' is the only property defined externally to this CSS3
-          module that affects behavior within the aural "box" model. Also note that the 'none' value
-          of the 'display' property cannot be overridden by descendants of the selected element, but
-          the 'auto' value of 'speak' can however be overridden using either of 'none' or 'normal'.
-        </p>
+        <p>Resolves to a computed value of 'none' when <a href="#display-def">'display'</a> is
+          'none', otherwise resolves to a computed value of 'auto' which yields a used value of
+          'normal'. </p>
+        <p class="note"> Note that the 'none' value of the <a href="#display-def">'display'</a>
+          property cannot be overridden by descendants of the selected element, but the 'auto' value
+          of 'speak' can however be overridden using either of 'none' or 'normal'. </p>
       </dd>
       <dt>
         <strong>none</strong>
@@ -539,8 +549,9 @@
         <strong>normal</strong>
       </dt>
       <dd>
-        <p> The element is rendered aurally (regardless of its 'display' value and the 'display' and
-          'speak' values of its ancestors).</p>
+        <p> The element is rendered aurally (regardless of its <a href="#display-def">'display'</a>
+          value and the <a href="#display-def">'display'</a> and 'speak' values of its
+          ancestors).</p>
         <p class="note"> Note that using this value can result in the element being rendered in the
           aural dimension even though it would not be rendered on the visual canvas. </p>
       </dd>
@@ -558,8 +569,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>normal | spell-out || digits || [ literal-punctuation | no-punctuation ] |
-            inherit</td>
+          <td>normal | spell-out || digits || [ literal-punctuation | no-punctuation ] </td>
         </tr>
         <tr>
           <td>
@@ -603,17 +613,16 @@
       basic predefined list of possible values.</p>
     <p class="note"> Note that the functionality provided by this property is related to the <a
         href="http://www.w3.org/TR/speech-synthesis11/#edef_say-as"><code>say-as</code> element</a>
-      from the SSML markup language [[!SSML]]. Also note that possible values are described in a W3C
-      Note ([[SSML-SAYAS]]) separate from the SSML specification, whereas the CSS Speech module
-      explicitly defines a list of possible values. </p>
+      from the SSML markup language [[!SSML]], whose values are described in the [[SSML-SAYAS]] W3C
+      Note. </p>
     <dl>
       <dt>
         <strong>normal</strong>
       </dt>
       <dd>
-        <p>Uses language-dependent pronunciation rules for rendering an element and its children.
-          For example, punctuation is not spoken as-is, but instead rendered naturally as
-          appropriate pauses.</p>
+        <p>Uses language-dependent pronunciation rules for rendering the element's content. For
+          example, punctuation is not spoken as-is, but instead rendered naturally as appropriate
+          pauses.</p>
       </dd>
       <dt>
         <strong>spell-out</strong>
@@ -640,15 +649,14 @@
         <strong>literal-punctuation</strong>
       </dt>
       <dd>
-        <p>Similar to 'normal' value, but punctuation such as semicolons, braces, and so on are to
-          be spoken literally.</p>
+        <p> Punctuation such as semicolons, braces, and so on is named aloud (i.e. spoken literally)
+          rather than rendered naturally as appropriate pauses.</p>
       </dd>
       <dt>
         <strong>no-punctuation</strong>
       </dt>
       <dd>
-        <p>Similar to 'normal' value but punctuation is not to be spoken nor rendered as various
-          pauses.</p>
+        <p>Punctuation is not rendered: neither spoken nor rendered as pauses.</p>
       </dd>
     </dl>
     <h2 id="pause-props">Pause properties </h2>
@@ -665,7 +673,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong | inherit</td>
+          <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong </td>
         </tr>
         <tr>
           <td>
@@ -718,7 +726,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong | inherit</td>
+          <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong </td>
         </tr>
         <tr>
           <td>
@@ -770,16 +778,14 @@
         <strong>&lt;time&gt;</strong>
       </dt>
       <dd>
-        <p>Expresses the pause in absolute time units (seconds and milliseconds, e.g. "+3s",
-          "250ms") as per the syntax of <a href="#time-def">time</a> values defined in [[!CSS3VAL]].
-          Only non-negative values are allowed.</p>
+        <p>Expresses the pause in absolute <a href="#time-def">time</a> units (seconds and
+          milliseconds, e.g. "+3s", "250ms"). Only non-negative values are allowed.</p>
       </dd>
       <dt>
         <strong>none</strong>
       </dt>
       <dd>
-        <p> Equivalent to 0ms (no prosodic break in the speech output). This value can be used to
-          inhibit a prosodic break which the processor would otherwise produce. </p>
+        <p> Equivalent to 0ms (no prosodic break is produced by the speech processor). </p>
       </dd>
       <dt>
         <strong>x-weak</strong>, <strong>weak</strong>, <strong>medium</strong>,
@@ -794,10 +800,10 @@
     <p class="note"> Note that stronger content boundaries are typically accompanied by pauses. For
       example, the breaks between paragraphs are typically much more substantial than the breaks
       between words within a sentence. </p>
-    <p> The following example illustrates how the default strengths of prosodic breaks for specific
-      elements (which are defined by the user-agent stylesheet) can be overridden by authored
-      styles: </p>
     <div class="example">
+      <p> This example illustrates how the default strengths of prosodic breaks for specific
+        elements (which are defined by the user-agent stylesheet) can be overridden by authored
+        styles. </p>
       <pre>
 p { pause: none } /* pause-before: none; pause-after: none */</pre>
     </div>
@@ -858,6 +864,7 @@
       values are given, the first value is 'pause-before' and the second is 'pause-after'. If only
       one value is given, it applies to both properties.</p>
     <div class="example">
+      <p> Examples of property values:</p>
       <pre>
 h1 { pause: 20ms; } /* pause-before: 20ms; pause-after: 20ms */
 h2 { pause: 30ms 40ms; } /* pause-before: 30ms; pause-after: 40ms */
@@ -866,10 +873,9 @@
     <h3 id="collapsing">Collapsing pauses</h3>
     <p> The pause defines the minimum distance of the aural "box" to the aural "boxes" before and
       after it. Adjoining pauses are merged by selecting the strongest named break and the longest
-      absolute time interval. </p>
-    <p class="note"> For example, "strong" is selected when comparing "strong" and "weak", "1s" is
-      selected when comparing "1s" and "250ms", and "strong" and "250ms" take effect additively when
-      comparing "strong" and "250ms". </p>
+      absolute time interval. For example, "strong" is selected when comparing "strong" and "weak",
+      "1s" is selected when comparing "1s" and "250ms", and "strong" and "250ms" take effect
+      additively when comparing "strong" and "250ms". </p>
     <p>The following pauses are adjoining:</p>
     <ol>
       <li>The 'pause-after' of an aural "box" and the 'pause-after' of its last child, provided the
@@ -900,7 +906,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong | inherit</td>
+          <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong </td>
         </tr>
         <tr>
           <td>
@@ -953,7 +959,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong | inherit</td>
+          <td>&lt;time&gt; | none | x-weak | weak | medium | strong | x-strong </td>
         </tr>
         <tr>
           <td>
@@ -1004,9 +1010,8 @@
         <strong>&lt;time&gt;</strong>
       </dt>
       <dd>
-        <p>Expresses the rest in absolute time units (seconds and milliseconds, e.g. "+3s", "250ms")
-          as per the syntax of <a href="#time-def">time</a> values defined in [[!CSS3VAL]]. Only
-          non-negative values are allowed.</p>
+        <p>Expresses the rest in absolute <a href="#time-def">time</a> units (seconds and
+          milliseconds, e.g. "+3s", "250ms"). Only non-negative values are allowed.</p>
       </dd>
       <dt>
         <strong>none</strong>
@@ -1098,7 +1103,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>&lt;uri&gt; &lt;decibel&gt;? | none | inherit</td>
+          <td>&lt;uri&gt; &lt;decibel&gt;? | none </td>
         </tr>
         <tr>
           <td>
@@ -1122,7 +1127,7 @@
           <td>
             <em>Percentages:</em>
           </td>
-          <td>apply to inherited value for 'voice-volume'</td>
+          <td>N/A</td>
         </tr>
         <tr>
           <td>
@@ -1151,7 +1156,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>&lt;uri&gt; &lt;decibel&gt;? | none | inherit</td>
+          <td>&lt;uri&gt; &lt;decibel&gt;? | none </td>
         </tr>
         <tr>
           <td>
@@ -1175,7 +1180,7 @@
           <td>
             <em>Percentages:</em>
           </td>
-          <td>apply to inherited value for 'voice-volume'</td>
+          <td>N/A</td>
         </tr>
         <tr>
           <td>
@@ -1193,9 +1198,7 @@
     </table>
     <p>The 'cue-before' and 'cue-after' properties specify auditory icons (i.e. prerecorded audio
       clips) to be played before (or after) the selected element within the <a href="#aural-model"
-        >audio "box" model</a>. When a user agent is not able to render the specified auditory icon,
-      it is recommended to produce an alternative cue. (e.g. popping up a warning, emitting a
-      warning sound)</p>
+        >audio "box" model</a>.</p>
     <p class="note"> Note that the functionality provided by this property is related to the <a
         href="http://www.w3.org/TR/speech-synthesis11/#edef_audio"><code>audio</code> element</a>
       from the SSML markup language [[!SSML]]. </p>
@@ -1204,9 +1207,9 @@
         <strong>&lt;uri&gt;</strong>
       </dt>
       <dd>
-        <p>The URI designates an auditory icon resource. If the URI resolves to something other than
-          an audio file, such as an image, the resource is ignored and the property treated as if it
-          had the value 'none'.</p>
+        <p>The URI designates an auditory icon resource. When a user agent is not able to render the
+          specified auditory icon (e.g. missing file resource, or unsupported audio codec), it is
+          recommended to produce an alternative cue, such as a bell sound.</p>
       </dd>
       <dt>
         <strong>none</strong>
@@ -1218,14 +1221,17 @@
         <strong>&lt;decibel&gt;</strong>
       </dt>
       <dd>
-        <p> The loudness of prerecorded audio cues can be adjusted relative to the volume level of
-          synthetic speech, inherited value from the 'voice-volume' property. This value is an
-          integer or floating point <a href="#number-def">number</a> immediately followed by "dB"
-          (decibel unit). The default value is '+0.0dB' (no change). If the inherited value of the
-          'voice-volume' property is 'silent', the provided value has no effect and the volume level
-          for the audio cue is resolved to 'silent'. Decibels are an expression of the ratio of the
-          squares of the new signal amplitude (a1) and the current amplitude (a0), as per the
-          following logarithmic equation: volume(dB) = 20 log10 (a1 / a0) </p>
+        <p>A <a href="#number-def">number</a> immediately followed by "dB" (decibel unit). This
+          represents a change (positive or negative) relative to the default sound level of audio
+          clip. This is expressed as the ratio of the squares of the new signal amplitude (a1) and
+          the current amplitude (a0), as per the following logarithmic equation: volume(dB) = 20
+          log10 (a1 / a0)</p>
+        <p>Audio cues apply to the selected element within the <a href="#aural-model">audio "box"
+            model</a>, so when the inherited value from the 'voice-volume' property is 'silent', the
+          volume level for the audio cue is resolved to -infinity decibels (which effectively
+          silences the audio cue), regardless of the value provided for this &lt;decibel&gt;. In
+          other words, a selected element can be entirely silenced (i.e. including its associated
+          audio cues) by setting the 'voice-volume' property to 'silent'. </p>
         <p class="note"> Note that -6.0dB is approximately half the amplitude of the audio signal,
           and +6.0dB is approximately twice the amplitude.</p>
         <p class="note"> Note that there is a difference between an audio cue whose volume is set to
@@ -1236,6 +1242,7 @@
       </dd>
     </dl>
     <div class="example">
+      <p> Examples of property values:</p>
       <pre>
 a
 {
@@ -1288,7 +1295,7 @@
           <td>
             <em>Percentages:</em>
           </td>
-          <td>apply to inherited value for 'voice-volume'</td>
+          <td>N/A</td>
         </tr>
         <tr>
           <td>
@@ -1308,6 +1315,7 @@
       the first value is 'cue-before' and the second is 'cue-after'. If only one value is given, it
       applies to both properties.</p>
     <div class="example">
+      <p> Example of shorthand notation:</p>
       <pre>
 h1
 {
@@ -1335,8 +1343,8 @@
             <em>Value:</em>
           </td>
           <td> [[&lt;name&gt; | [&lt;age&gt;? &lt;gender&gt; &lt;non-negative number&gt;?]],]*
-            [&lt;name&gt; | [&lt;age&gt;? &lt;gender&gt; &lt;non-negative number&gt;?]] | preserve |
-            inherit </td>
+            [&lt;name&gt; | [&lt;age&gt;? &lt;gender&gt; &lt;non-negative number&gt;?]] | preserve
+          </td>
         </tr>
         <tr>
           <td>
@@ -1391,30 +1399,13 @@
           Voice names must either be given quoted as <a href="#strings-def">strings</a>, or unquoted
           as a sequence of one or more <a href="#identifier-def">identifiers</a>. </p>
         <p class="note">Note that as a result, most punctuation characters, or digits at the start
-          of each token, must be escaped in unquoted voice names. For example, the following
-          declarations are invalid: </p>
-        <div class="example">
-          <pre>
-voice-family: john/doe; /* forward slash character should be escaped */
-voice-family: john "doe"; /* identifier sequence cannot contain strings */
-voice-family: john!; /* exclamation mark should be escaped */
-voice-family: john@doe; /* "at" character should be escaped */
-voice-family: #john; /* identifier cannot start with hash character */
-voice-family: john 1st; /* identifier cannot start with digit */</pre>
-        </div>
+          of each token, must be escaped in unquoted voice names. </p>
         <p> If a sequence of identifiers is given as a voice name, the computed value is the name
           converted to a string by joining all the identifiers in the sequence by single spaces. </p>
         <p> Voice names that happen to be the same as the gender keywords ('male', 'female' and
           'neutral') or that happen to match the keywords 'inherit' or 'preserve' must be quoted to
           disambiguate with these keywords. The keywords 'initial' and 'default' are reserved for
           future use and must also be quoted when used as voice names. </p>
-        <p class="note"> Note that to avoid mistakes in escaping, it is recommended to quote voice
-          names that contain white space, digits, or punctuation characters other than hyphens. For
-          example: </p>
-        <div class="example">
-          <pre>
-voice-family: "john doe", "Henry the-8th";</pre>
-        </div>
         <p class="note"> Note that in [[!SSML]], voice names are space-separated and cannot contain
           whitespace characters.</p>
       </dd>
@@ -1454,7 +1445,23 @@
           name, gender, age). </p>
       </dd>
     </dl>
-    <h4 class="no-toc" id="voice-props-lang-handling">Voice selection, content language</h4>
+    <div class="example">
+      <p> Examples of invalid declarations: </p>
+      <pre>
+voice-family: john/doe; /* forward slash character should be escaped */
+voice-family: john "doe"; /* identifier sequence cannot contain strings */
+voice-family: john!; /* exclamation mark should be escaped */
+voice-family: john@doe; /* "at" character should be escaped */
+voice-family: #john; /* identifier cannot start with hash character */
+voice-family: john 1st; /* identifier cannot start with digit */</pre>
+    </div>
+    <div class="example">
+      <p> This is an example of valid voice names that contain white space, digits, or punctuation
+        characters other than hyphens, but which are quoted nonetheless, for reading clarity. </p>
+      <pre>
+voice-family: "john doe", "Henry the-8th";</pre>
+    </div>
+    <h4 class="no-toc" id="voice-selection">Voice selection, content language</h4>
     <p>The 'voice-family' property is used to guide the selection of the speech synthesis voice. As
       part of this selection process, speech-capable user agents must also take into account the
       language of the selected element within the markup content. The "name", "gender", "age", and
@@ -1472,18 +1479,18 @@
         variant. The actual definition of "best match" is processor-dependent.</li>
       <li> If no voice is available for the language of the selected content, it is recommended that
         user-agents let the user know about the lack of appropriate TTS voice. </li>
-      <li>The speech synthesizer voice must be re-evaluated (i.e. the selection process must take
-        place once again) whenever any of the CSS voice characteristics change within the content
-        flow. The voice must also be re-calculated whenever the content language changes, unless the
-        'preserve' keyword is used (this may be useful in cases where embedded foreign language text
-        can be spoken using a voice not designed for this language, as demonstrated by the example
-        below). <p class="note">Note that dynamically computing a voice may lead to unexpected lag,
-          so user-agents should try to resolve concrete voice instances in the document tree before
-          the playback starts. </p>
-      </li>
     </ol>
-    <p>Here are a few examples:</p>
+    <p>The speech synthesizer voice must be re-evaluated (i.e. the selection process must take place
+      once again) whenever any of the CSS voice characteristics change within the content flow. The
+      voice must also be re-calculated whenever the content language changes, unless the 'preserve'
+      keyword is used (this may be useful in cases where embedded foreign language text can be
+      spoken using a voice not designed for this language, as demonstrated by the example
+      below).</p>
+    <p class="note">Note that dynamically computing a voice may lead to unexpected lag, so
+      user-agents should try to resolve concrete voice instances in the document tree before the
+      playback starts. </p>
     <div class="example">
+      <p>Examples of property values:</p>
       <pre>
 h1 { voice-family: announcer, 65 male; }
 p.romeo  { voice-family: romeo, 18 male; }
@@ -1517,7 +1524,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>normal | &lt;percentage&gt; | x-slow | slow | medium | fast | x-fast | inherit</td>
+          <td>normal | &lt;percentage&gt; | x-slow | slow | medium | fast | x-fast </td>
         </tr>
         <tr>
           <td>
@@ -1577,18 +1584,15 @@
       <dd>
         <p>Only non-negative <a href="#percentage-def">percentage</a> values are allowed. Computed
           values are calculated relative to the default speaking rate for the voice (the "normal"
-          computed value).</p>
-        <p class="note"> Note that a leading "+" sign does not denote an increment, for example +50%
-          is equivalent to 50% (i.e. the computed value equals the inherited value times 0.5, which
-          is half the normal rate of the voice). </p>
+          computed value). For example, 50% means that the default value gets multiplied by 0.5,
+          which results in half the default rate of the voice.</p>
       </dd>
       <dt><strong>x-slow</strong>, <strong>slow</strong>, <strong>medium</strong>,
           <strong>fast</strong> and <strong>x-fast</strong></dt>
       <dd>
         <p>A sequence of monotonically non-decreasing speaking rates that are implementation and
-          voice specific. </p>
-        <p class="note">Note that typical values are (in words per minute) x-slow = 80, slow = 120,
-          medium = between 180 and 200, fast = 500. </p>
+          voice specific. For example, typical values for the English language are (in words per
+          minute) x-slow = 80, slow = 120, medium = between 180 and 200, fast = 500. </p>
       </dd>
     </dl>
     <h3 id="voice-props-voice-pitch">The 'voice-pitch' property</h3>
@@ -1604,8 +1608,8 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>&lt;frequency&gt; | &lt;percentage&gt; | &lt;relative-value&gt; &amp;&amp; relative |
-            x-low | low | medium | high | x-high | inherit</td>
+          <td>&lt;frequency&gt; &amp;&amp; relative? | &lt;semitones&gt; | &lt;percentage&gt; |
+            x-low | low | medium | high | x-high </td>
         </tr>
         <tr>
           <td>
@@ -1656,39 +1660,41 @@
         <strong>&lt;frequency&gt;</strong>
       </dt>
       <dd>
-        <p> Specifies the average pitch of the speaking voice using an absolute value in frequency
-          units (Hertz and kiloHertz, e.g. "100Hz", "+2kHz") as per the syntax of <a
-            href="#frequency-def">frequency</a> values defined in [[!CSS3VAL]]. Only positive values
-          are allowed. </p>
+        <p> A value in <a href="#frequency-def">frequency</a> units (Hertz or kiloHertz, e.g.
+          "100Hz", "+2kHz"). Unless the 'relative' keyword is used, values are restricted to
+          positive numbers (using negative numbers results in the property value being ignored).
+          When the 'relative' keyword is used, the provided value specifies a relative change
+          (decrement or increment) to the inherited value. When the 'relative' keyword is not used,
+          the provided value specifies the average pitch of the speaking voice, expressed as an
+          absolute frequency. </p>
       </dd>
       <dt>
-        <strong>&lt;percentage&gt;</strong>
+        <strong>relative</strong>
       </dt>
       <dd>
-        <p> Only non-negative <a href="#percentage-def">percentage</a> values are allowed. Computed
-          values are calculated relative to the inherited value. </p>
-        <p class="note"> Note that a leading "+" sign does not denote an increment. For example,
-          +50% is equivalent to 50%, so the computed value equals the inherited value times 0.5
-          (i.e. divided by 2), which is half the inherited average pitch of the voice. </p>
+        <p> This keyword specifies that the provided frequency value is expressed relatively to
+          another base value. This disambiguates absolute positive &lt;frequency&gt; values from
+          increments (e.g. "+2kHz" can either be an increment or an absolute value). </p>
       </dd>
       <dt>
-        <strong>&lt;relative-value&gt;</strong>
+        <strong>&lt;semitones&gt;</strong>
       </dt>
       <dd>
         <p> Specifies a relative change (decrement or increment) to the inherited value. The syntax
-          of allowed values is a &lt;<a href="#number-def">number</a>&gt;, followed immediately by
-          either of "Hz" (for Hertz) or "kHz" (for kiloHertz) or "st" (for semitones).</p>
-        <p class="note"> Note that unlike with the syntax of <a href="#frequency-def">frequency</a>
-          values defined in [[!CSS3VAL]], here the provided number can be positive or negative. The
-          'relative' keyword must be used to disambiguate absolute frequency values (e.g. "+10Hz"
-          versus "+10Hz relative") </p>
+          of allowed values is a &lt;<a href="#number-def">number</a>&gt; followed immediately by
+          "st" (semitones). A semitone is half of a tone (a half step) on the standard diatonic
+          scale. As such, a semitone doesn't correspond to a fixed frequency: the ratio between two
+          consecutive frequencies separated by exactly one semitone is the twelfth root of two
+          (approximately 1.05946). </p>
       </dd>
       <dt>
-        <strong>relative</strong>
+        <strong>&lt;percentage&gt;</strong>
       </dt>
       <dd>
-        <p> This keyword specifies that the provided value is expressed relatively to another base
-          value. This is in order to disambiguate from absolute &lt;frequency&gt; values. </p>
+        <p> Only non-negative <a href="#percentage-def">percentage</a> values are allowed. Computed
+          values are calculated relative to the inherited value. For example, 50% means that the
+          inherited value gets multiplied by 0.5, which results in half the inherited average pitch
+          of the voice. </p>
       </dd>
       <dt><strong>x-low</strong>, <strong>low</strong>, <strong>medium</strong>,
           <strong>high</strong>, <strong>x-high</strong></dt>
@@ -1698,6 +1704,7 @@
       </dd>
     </dl>
     <div class="example">
+      <p>Examples of property values:</p>
       <pre>
 h1 { voice-pitch: 250Hz; }
 h1 { voice-pitch: +250Hz; } /* identical to the line above */
@@ -1719,8 +1726,8 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>&lt;frequency&gt; | &lt;percentage&gt; | &lt;relative-value&gt; &amp;&amp; relative |
-            x-low | low | medium | high | x-high | inherit</td>
+          <td>&lt;frequency&gt; &amp;&amp; relative? | &lt;semitones&gt; | &lt;percentage&gt; |
+            x-low | low | medium | high | x-high </td>
         </tr>
         <tr>
           <td>
@@ -1772,41 +1779,43 @@
         <strong>&lt;frequency&gt;</strong>
       </dt>
       <dd>
-        <p> Specifies the average pitch range of the speaking voice using an absolute value in
-          frequency units (Hertz and kiloHertz, e.g. "100Hz", "+2kHz") as per the syntax of <a
-            href="#frequency-def">frequency</a> values defined in [[!CSS3VAL]]. Only positive values
-          are allowed. </p>
+        <p> A value in <a href="#frequency-def">frequency</a> units (Hertz or kiloHertz, e.g.
+          "100Hz", "+2kHz"). Unless the 'relative' keyword is used, values are restricted to
+          positive numbers (using negative numbers results in the property value being ignored).
+          When the 'relative' keyword is used, the provided value specifies a relative change
+          (decrement or increment) to the inherited value. When the 'relative' keyword is not used,
+          the provided value specifies the average pitch of the speaking voice, expressed as an
+          absolute frequency. </p>
         <p class="note"> Low ranges produce a flat, monotonic voice. A high range produces animated
           voices. </p>
       </dd>
       <dt>
-        <strong>&lt;percentage&gt;</strong>
+        <strong>relative</strong>
       </dt>
       <dd>
-        <p> Only non-negative <a href="#percentage-def">percentage</a> values are allowed. Computed
-          values are calculated relative to the inherited value.</p>
-        <p class="note"> Note that a leading "+" sign does not denote an increment. For example,
-          +50% is equivalent to 50%, so the computed value equals the inherited value times 0.5
-          (i.e. divided by 2), which is half the inherited average pitch range of the voice. </p>
+        <p> This keyword specifies that the provided frequency value is expressed relatively to
+          another base value. This disambiguates absolute positive &lt;frequency&gt; values from
+          increments (e.g. "+2kHz" can either be an increment or an absolute value). </p>
       </dd>
       <dt>
-        <strong>&lt;relative-value&gt;</strong>
+        <strong>&lt;semitones&gt;</strong>
       </dt>
       <dd>
-        <p> Specifies a change (decrement or increment) relative to the inherited value. The syntax
-          of allowed values is a &lt;<a href="#number-def">number</a>&gt;, immediately followed by
-          either of "Hz" (for Hertz) or "st" (for semitones).</p>
-        <p class="note"> Note that unlike with the syntax of <a href="#frequency-def">frequency</a>
-          values defined in [[!CSS3VAL]], here the provided number can be positive or negative. The
-          'relative' keyword must be used to disambiguate absolute frequency values (e.g. "+10Hz"
-          versus "+10Hz relative") </p>
+        <p> Specifies a relative change (decrement or increment) to the inherited value. The syntax
+          of allowed values is a &lt;<a href="#number-def">number</a>&gt; followed immediately by
+          "st" (semitones). A semitone is half of a tone (a half step) on the standard diatonic
+          scale. As such, a semitone doesn't correspond to a fixed frequency: the ratio between two
+          consecutive frequencies separated by exactly one semitone is the twelfth root of two
+          (approximately 1.05946).</p>
       </dd>
       <dt>
-        <strong>relative</strong>
+        <strong>&lt;percentage&gt;</strong>
       </dt>
       <dd>
-        <p> This keyword specifies that the provided value is expressed relatively to another base
-          value. This is in order to disambiguate from absolute &lt;frequency&gt; values. </p>
+        <p> Only non-negative <a href="#percentage-def">percentage</a> values are allowed. Computed
+          values are calculated relative to the inherited value. For example, 50% means that the
+          inherited value gets multiplied by 0.5, which results in half the inherited average pitch
+          range of the voice. </p>
       </dd>
       <dt><strong>x-low</strong>, <strong>low</strong>, <strong>medium</strong>,
           <strong>high</strong> and <strong>x-high</strong></dt>
@@ -1815,10 +1824,6 @@
           language-dependent.</p>
       </dd>
     </dl>
-    <p class="note"> Note that a semitone is half of a tone (a half step) on the standard diatonic
-      scale. A semitone doesn't correspond to a fixed value in Hertz: instead, the ratio between two
-      consecutive frequencies separated by exactly one semitone is approximately 1.05946 (the
-      twelfth root of two). </p>
     <table class="propdef" summary="name: syntax">
       <tbody>
         <tr>
@@ -1831,7 +1836,7 @@
           <td>
             <em>Value:</em>
           </td>
-          <td>normal | strong | moderate | none | reduced | inherit</td>
+          <td>normal | strong | moderate | none | reduced </td>
         </tr>
         <tr>
           <td>
@@ -1904,6 +1909,7 @@
       </dd>
     </dl>
     <div class="example">
+      <p>Examples of property values, with HTML sample:</p>
       <pre>
 span.default-emphasis { voice-stress: normal; }
 span.lowered-emphasis { voice-stress: reduced; }
@@ -1982,9 +1988,14 @@
       </tbody>
     </table>
     <p> The 'voice-duration' property specifies how long it should take to render the selected
-      element's content (excluding <a href="#cue-props">audio cues</a> ). Unless the value 'auto' is
-      specified, this property takes precedence over the 'voice-rate' property and should be used to
-      determine the speaking rate of the voice. </p>
+      element's content (not including <a href="#cue-props">audio cues</a>, <a href="#pause-props">
+        pauses</a> and <a href="#rest-props">rests</a> ). Unless the value 'auto' is specified, this
+      property takes precedence over the 'voice-rate' property, and should be used to determine a
+      suitable speaking rate for the voice. An element for which the 'voice-duration' property value
+      is not 'auto' may have descendants for which the 'voice-duration' and 'voice-rate' properties
+      are specified, but these must be ignored. In other words, when a 'time' is specified for the
+      'voice-duration' of a selected element, it applies to the entire element subtree (children
+      cannot override the property). </p>
     <p class="note"> Note that the functionality provided by this property is related to the <a
         href="http://www.w3.org/TR/speech-synthesis11/#edef_prosody"><code>duration</code> attribute
         of the <code>prosody</code> element</a> from the SSML markup language [[!SSML]]. </p>
@@ -1993,16 +2004,15 @@
         <strong>auto</strong>
       </dt>
       <dd>
-        <p>Resolves to a computed value corresponding to the duration of the speech synthesis when
-          using the inherited 'voice-rate'. </p>
+        <p>Resolves to a used value corresponding to the duration of the speech synthesis when using
+          the inherited 'voice-rate'. </p>
       </dd>
       <dt>
         <strong>&lt;time&gt;</strong>
       </dt>
       <dd>
-        <p> Specifies a value in absolute time units (seconds and milliseconds, e.g. "+3s", "250ms")
-          as per the syntax of <a href="#time-def">time</a> values defined in [[!CSS3VAL]]. Only
-          non-negative values are allowed. </p>
+        <p> Specifies a value in absolute <a href="#time-def">time</a> units (seconds and
+          milliseconds, e.g. "+3s", "250ms"). Only non-negative values are allowed. </p>
       </dd>
     </dl>
     <h2 id="lists">List items and counters styles</h2>
@@ -2013,11 +2023,10 @@
       The CSS Speech module defines how to render these styles in the aural dimension, using speech
       synthesis. The '<a href="#list-style-image-def"><code class="property"
         >list-style-image</code></a>' property of [[!CSS21]] is ignored, and instead the '<a
-        href="#list-style-type-def"><code class="property">list-style-type</code></a>' is used (if
-      present). </p>
-    <p class="note"> Note that the working draft of the CSS Lists module [[CSS3LIST]] contains new
-      features which are not yet supported in this version of the CSS Speech module. Support for
-      these features will be added later, when the CSS Lists draft stabilizes.</p>
+        href="#list-style-type-def"><code class="property">list-style-type</code></a>' is used. </p>
+    <p class="note"> Note that the speech rendering of new features from the CSS Lists and Counters
+      Module Level 3 [[CSS3LIST]] is not covered in this level of CSS Speech, but may be defined in
+      a future specification.</p>
     <dl>
       <dt>
         <strong>disc, circle, square</strong>
@@ -2057,54 +2066,18 @@
       user-agents supporting the CSS Speech module ensure that these additional audio cues and
       speech output don't generate redundancies or create inconsistencies (for example: duplicated
       or different list item numbering scheme). </p>
-    <h2 id="pronunciation"> Pronunciation, phonemes </h2>
-    <p class="note">Note that this entire section is non-normative.</p>
-    <p> CSS does not specify how to define the pronunciation (expressed using a well-defined
-      phonetic alphabet) of a particular piece of text within the markup document. A "phonemes"
-      property was described in earlier drafts of this specification, but objections were raised due
-      to breaking the principle of separation between content and presentation (the "phonemes"
-      authored within aural CSS stylesheets would have needed to be updated each time text changed
-      within the markup document). The "phonemes" functionality is therefore considered out-of-scope
-      in CSS (the presentation layer) and should be addressed in the markup / content layer.</p>
-    <p> The W3C PLS (Pronunciation Lexicon Specification) recommendation ([[PRONUNCIATION-LEXICON]])
-      is one potential format to use with the <a
-        href="http://microformats.org/wiki/rel-pronunciation"></a>"pronunciation" <code>rel</code>
-      value, which allows importing pronunciation lexicons in HTML documents using the
-        <code>link</code> element (similarly to how CSS stylesheets can be included). </p>
-    <p> Additionally, an attribute-based mechanism can be used within the markup to author
-      text-pronunciation associations. At the time of writing, such mechanism isn't formally defined
-      in the W3C HTML standard(s). However, the <a href="http://idpf.org/epub/30">EPUB 3.0 draft
-        specification</a> allows (x)HTML5 documents to contain attributes derived from the [[!SSML]]
-      specification, that describe how to pronounce text based on a particular phonetic
-      alphabet.</p>
-    <!-- p> 
-    One avenue to explore is the use CSS to "bind" HTML text with a   
-    phoneme (also declared in the HTML document). This would maintain a   
-    clear separation between content and presentation, and it would allow   
-    authors to define different pronunciations for one given text token   
-    (Media Queries could drive the switch of stylesheet to import). This   
-    possibility has been mentioned several times by Working Group members   
-    as well as people from the public mailing-list, so it cannot be   
-    ignored. However, there are architectural considerations (e.g.   
-    collision between CSS versus HTML -defined phonemes) which make this a   
-    lot trickier to standardize than it sounds. The   
-    whole "speech synthesis" issue should be tackled globally at the level   
-    of the W3C ecosystem. For example, there are many cross-cutting   
-    concerns with the work done by the HTML-Audio and HTML-Speech   
-    Incubator Groups.
-      </p -->
     <h2 id="content">Inserted and replaced content</h2>
-    <!-- p class="note">Note that this entire section is non-normative.</p -->
+    <p class="note">Note that this entire section is non-normative.</p>
     <p>Sometimes, authors will want to specify a mapping from the source text into another string
       prior to the application of the regular pronunciation rules. This may be used for uncommon
       abbreviations or acronyms which are unlikely to be recognized by the synthesizer. The
-      'content' property can be used to replace one string by another. </p>
-    <p class="note"> Note that the functionality provided by this property is related to the <a
-        href="http://www.w3.org/TR/speech-synthesis11/#edef_sub"><code>alias</code> attribute of the
-          <code>sub</code> element</a> from the SSML markup language [[!SSML]]. </p>
-    <p> In the following example, the abbreviation is rendered using the content of the title
-      attribute instead of the element's content:</p>
+      'content' property can be used to replace one string by another. The functionality provided by
+      this property is related to the <a href="http://www.w3.org/TR/speech-synthesis11/#edef_sub"
+          ><code>alias</code> attribute of the <code>sub</code> element</a> from the SSML markup
+      language [[!SSML]]. </p>
     <div class="example">
+      <p> In this example, the abbreviation is rendered using the content of the title attribute
+        instead of the element's content.</p>
       <pre>
 /* This replaces the content of the selected element
 by the string "World Wide Web Consortium". */
@@ -2114,11 +2087,11 @@
 &lt;abbr title="World Wide Web Consortium"&gt;W3C&lt;/abbr&gt;</pre>
     </div>
     <p>In a similar way, text strings in a document can be replaced by a previously recorded
-      version. In the following example - assuming the format is supported, the file is available
-      and the UA is configured to do so - a recording of Sir John Gielgud's declamation of the
-      famous monologue is played. Otherwise the UA falls back to render the text using synthesized
-      speech: </p>
+      version.</p>
     <div class="example">
+      <p>In this example - assuming the format is supported, the file is available and the UA is
+        configured to do so - a recording of Sir John Gielgud's declamation of the famous monologue
+        is played. Otherwise the UA falls back to render the text using synthesized speech. </p>
       <pre>
 .hamlet { content: url(./audio/gielgud.wav); }
 ...
@@ -2129,13 +2102,12 @@
     </div>
     <p>Furthermore, authors (or users via a user stylesheet) may add some information to ease the
       understanding of structures during non-visual interaction with the document. They can do so by
-      using the '::before' and '::after' pseudo-elements that will be inserted between the element's
-      contents and the 'rest'. Note that different stylesheets can be used to define the level of
-      verbosity for additional information spoken by screen readers. .</p>
-    <p>The following example inserts the string "Start list: " before a list and the string "List
-      item: " before the content of each list item. Likewise, the string "List end: " gets inserted
-      after the list to inform the user that the list speech output is over.</p>
+      using the '::before' and '::after' pseudo-elements. Note that different stylesheets can be
+      used to define the level of verbosity for additional information spoken by screen readers.</p>
     <div class="example">
+      <p>This example inserts the string "Start list: " before a list and the string "List item: "
+        before the content of each list item. Likewise, the string "List end: " gets inserted after
+        the list to inform the user that the list speech output is over.</p>
       <pre>
 ul::before { content: "Start list: "; }
 ul::after  { content: "List end. "; }
@@ -2143,6 +2115,42 @@
     </div>
     <p>Detailed information can be found in the CSS3 Generated and Replaced Content module
       [[CSS3GENCON]].</p>
+    <h2 id="pronunciation"> Pronunciation, phonemes </h2>
+    <p class="note">Note that this entire section is non-normative.</p>
+    <p> CSS does not specify how to define the pronunciation (expressed using a well-defined
+      phonetic alphabet) of a particular piece of text within the markup document. A "phonemes"
+      property was described in earlier drafts of this specification, but objections were raised due
+      to breaking the principle of separation between content and presentation (the "phonemes"
+      authored within aural CSS stylesheets would have needed to be updated each time text changed
+      within the markup document). The "phonemes" functionality is therefore considered out-of-scope
+      in CSS (the presentation layer) and should be addressed in the markup / content layer.</p>
+    <p> The <a href="http://microformats.org/wiki/rel-pronunciation">"pronunciation"</a>
+      <code>rel</code> value allows importing pronunciation lexicons in HTML documents using the
+        <code>link</code> element (similar to how CSS stylesheets can be included). The W3C PLS
+      (Pronunciation Lexicon Specification) [[PRONUNCIATION-LEXICON]] is one format that can be used
+      to describe such a lexicon.</p>
+    <p> Additionally, an attribute-based mechanism can be used within the markup to author
+      text-pronunciation associations. At the time of writing, such mechanism isn't formally defined
+      in the W3C HTML standard(s). However, the <a href="http://idpf.org/epub/30">EPUB 3.0 draft
+        specification</a> allows (x)HTML5 documents to contain attributes derived from the [[!SSML]]
+      specification, that describe how to pronounce text based on a particular phonetic
+      alphabet.</p>
+    <!-- p> 
+      One avenue to explore is the use CSS to "bind" HTML text with a   
+      phoneme (also declared in the HTML document). This would maintain a   
+      clear separation between content and presentation, and it would allow   
+      authors to define different pronunciations for one given text token   
+      (Media Queries could drive the switch of stylesheet to import). This   
+      possibility has been mentioned several times by Working Group members   
+      as well as people from the public mailing-list, so it cannot be   
+      ignored. However, there are architectural considerations (e.g.   
+      collision between CSS versus HTML -defined phonemes) which make this a   
+      lot trickier to standardize than it sounds. The   
+      whole "speech synthesis" issue should be tackled globally at the level   
+      of the W3C ecosystem. For example, there are many cross-cutting   
+      concerns with the work done by the HTML-Audio and HTML-Speech   
+      Incubator Groups.
+      </p -->
     <hr title="Separator from footer" />
     <h2 class="no-num" id="property-index">Appendix A &mdash; Property index</h2>
     <!-- properties -->
@@ -2392,9 +2400,13 @@
       <li>Corrected the [initial] values for 'voice-pitch-range' and 'voice-pitch' to "medium".</li>
       <li>Added an "auto" value to 'voice-duration', which is the [initial] property value as
         well.</li>
+      <li>Handling of 'voice-balance' values outside of the allowed range (clamping).</li>
       <li>Added the 'normal' value for voice-rate ("default" in SSML 1.1).</li>
       <li>Renamed voice-family fields to be consistent with SSML.</li>
       <li>Improved the 'voice-family' selection algorithm to cater for language changes.</li>
+      <li>Separated definition of semitones, as they are relative values already (unlike Hz
+        frequencies).</li>
+      <li>More consistent behavior when audio cue URI fails (for whatever reason).</li>
       <li>Enabled voice-family names to contain spaces, matching 'font-family' syntax which is based
         on quoted strings and concatenated identifiers.</li>
       <li>Added a new section to define the relationship of this specification with CSS2.1.</li>
@@ -2407,6 +2419,7 @@
       <li>Added the missing 'normal' value for 'voice-stress'.</li>
       <li>Separated the 'relative' keyword for 'voice-pitch' and 'voice-range'.</li>
       <li>Improved document structure by adding sub-sections.</li>
+      <li>Removed the implicit 'inherit' value for all properties.</li>
       <li>Fixed typos and made other minor edits.</li>
     </ul>
     <!-- For reference only, changes in previous draft: -->
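The revised 'voice-pitch' value grammar in this commit has three numeric forms. The first is a <frequency>, which is absolute unless paired with the 'relative' keyword. The second is <semitones>, which scale the inherited pitch by powers of the twelfth root of two. The third is a non-negative <percentage> of the inherited value. A small resolver can illustrate how these forms combine. It is offered as a minimal sketch under stated assumptions, not as spec text or as any user agent's behavior.

The assumptions are:
- The function name `resolve_pitch` and its simplified whitespace tokenizer are hypothetical.
- Treating 0Hz as invalid follows one reading of "restricted to positive numbers".
- The voice-dependent keywords (x-low through x-high) are deliberately left out.
- A relative decrement that falls below 0Hz is not clamped, because the text does not say what should happen.

```python
import re
from typing import Optional

# Ratio between two frequencies exactly one semitone apart: 2^(1/12), about 1.05946.
SEMITONE_RATIO = 2 ** (1 / 12)

# Hypothetical, simplified token grammar: a signed number immediately followed by a unit.
_TOKEN = re.compile(r"([+-]?\d*\.?\d+)(st|%|khz|hz)", re.IGNORECASE)


def resolve_pitch(value: str, inherited_hz: float) -> Optional[float]:
    """Resolve a 'voice-pitch'-style value against the inherited average pitch (Hz).

    Returns None when the declaration would be ignored (invalid negative value).
    Keyword values such as 'medium' are voice-dependent and not handled here.
    """
    tokens = value.lower().split()
    # '&&' in the value grammar allows 'relative' before or after the frequency.
    relative = "relative" in tokens
    rest = [t for t in tokens if t != "relative"]
    if len(rest) != 1 or tokens.count("relative") > 1:
        raise ValueError(f"unsupported value: {value!r}")
    match = _TOKEN.fullmatch(rest[0])
    if match is None:
        raise ValueError(f"unsupported value: {value!r}")
    number, unit = float(match.group(1)), match.group(2)

    if unit in ("hz", "khz"):
        hz = number * 1000 if unit == "khz" else number
        if relative:
            # Signed increment or decrement applied to the inherited value.
            return inherited_hz + hz
        # Absolute average pitch: a leading "+" is just a sign, and
        # non-positive values cause the declaration to be ignored.
        return hz if hz > 0 else None
    if relative:
        raise ValueError("'relative' only combines with <frequency> values")
    if unit == "st":
        # Semitones are relative by definition: scale by 2^(n/12).
        return inherited_hz * SEMITONE_RATIO ** number
    # Percentage: only non-negative values, multiplied into the inherited value.
    return inherited_hz * number / 100 if number >= 0 else None
```

The sketch reproduces the spec's own example. `resolve_pitch("250Hz", 120)` and `resolve_pitch("+250Hz", 120)` both yield 250.0, so the two declarations are identical. `resolve_pitch("+250Hz relative", 120)` yields 370.0, because the 'relative' keyword turns the same token into an increment. The pitch range property in this commit uses the same value grammar, so the same arithmetic would apply to it, with the inherited range in place of the inherited pitch.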
Received on Wednesday, 6 July 2011 16:12:55 UTC