[css3-speech] LC comment: please do NOT drop the "at-risk" voice-* properties

From: Gregory Rosmaita <gregory.rosmaita@gmail.com>
Date: Fri, 30 Sep 2011 23:11:57 -0400
Message-ID: <CAEPrdkDP=48g0=7sYQTaorojWNTKE3T7Na8MG791w41s=2f7YA@mail.gmail.com>
To: www-style@w3.org, wai-xtech <wai-xtech@w3.org>

as both a content consumer and creator, i STRONGLY urge the editors
of css3-speech to retain the "at-risk" features which the "Status of
This Document" states:

QUOTE cite="http://www.w3.org/TR/css3-speech/#status"
may be dropped at the end of the CR period if there has not been
enough interest from implementers: 'voice-balance', 'voice-duration',
'voice-pitch', 'voice-range', and 'voice-stress'.
UNQUOTE

these "at-risk" features are part of the basic speech characteristics
toolbox with which almost ALL speech output users of all proficiency
levels are familiar.  changes in pitch, stress, range and/or duration
in response to specific types of markup and/or textual characteristics
are conventions which are already widely used.  moreover, control over
these voice characteristics is almost universally available to users of
dedicated speech output technology, and -- equally important --
provides an instantly comprehensible means of customization of the aural
palette by the user.

css3-speech's primary beneficiaries are those who benefit from speech
modifications applied in accordance with a discrete set of rules, NOT
those whose tools currently limit the ability of a speech-output user
to tailor her experience to her preferences. implementers MUST NOT
be allowed to limit the aural palette available to the user and the
author.  there is a time for standards to lead implementers towards
practical solutions for actual users and user communities. and with the
voice- properties, the time to lead is now.  to do otherwise would be
to leave actual users and authors at the mercy of what implementers are
willing to implement in a limited time period (in this case, CR).  there
is absolutely no compelling reason why such properties should not, or
cannot, be made available to the speech output user via css3-speech.

as for:

   * voice-balance:

its utility is predicated upon the assumption that more than one audio
channel will be available to the end user, but stereo perception of
the speech-output is neither a universal nor a necessary condition for
successful use of speech synthesis, whereas control over pitch, stress,
range and duration is universally applicable to voice output/speech


1. voice-stress can be used to signify textual emphasis, such as
EM/I and STRONG/B -- changes in voice-pitch and/or voice-stress
are essential components of communicating and differentiating between
such markup to speech-output users;

   em { voice-stress: moderate; }
   strong { voice-stress: strong; voice-volume: loud; }
   blockquote { voice-stress: reduced; }

2. changes in voice stress, pitch and the like are familiar concepts
to speech-output users, and are -- by far -- the personalization tools
most often used by speech-output users to identify emphasized, bolded,
underlined, and other semantically marked text.  they give the
speech-output user an experience equivalent to that which the sighted
user gains by discerning differences in font weight and marks of greater
(STRONG/B) and lesser (EM/I) stress.  shifts in vocal characteristics
also make the speech-output user aware that a string of text is actually
a quote or a blockquote, and there are those who wish to differentiate
between the vocal characteristic changes used for inline quotes and
those used for blockquotes; none of this would be possible
if the voice characteristics that are in danger of being dropped are
dropped;

3. use of pitch, stress and richness changes is -- by far -- the least
intrusive method of communicating information about the formatting
and semantics of text when converting marked-up text to aural output;
moreover, these properties act upon the speech-engine's output itself,
and do not rely on the availability of, the loading of and the playing
of an audio file as an aural icon to indicate the beginning and the
end of the marked-up text; additionally, if the end user is using a
hardware speech synthesizer or the speech capacities of an auxiliary
device, as opposed to a software TTS engine, that speech-generating
device may not be capable of rendering audio files referred to by URL,
leaving that user solely dependent upon changes to the voice
characteristics of the speech engine to obtain aurally equivalent
information about the marked-up text;

4. the voice- properties which are in danger of being removed act on
the voice the user is currently using to convey information about
the document's markup and structure aurally -- such choices are often
limited by the capacities of the speech engine being used, and control
over such capacities may not be available to the user for a variety of
reasons in a variety of settings; therefore, it is wisest for a content
creator to primarily use pitch, stress and richness values to
communicate markup's meaning, rather than forcing a switch in
voice-family, which may not be available.

5. implementers MUST NOT be allowed to limit the aural palette
available to the user and the author. there is a time for standards
to lead implementers towards practical solutions for actual users and
user communities. and with the voice- properties, the time to lead is
now.  for example, assistive technology developers (in particular,
commercial AT developers) claimed when WCAG 1.0 was being drafted that
natural language switching when the @lang attribute is encountered was
not only not needed, but a practical impossibility, and, yet, such a
natural-language-switch on the fly has become a standard feature in
most screen readers, and is essential for a speech-output user to
operate in multi-lingual environments.
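
as a minimal sketch of points 2 and 4 above, an author style sheet could
separate inline quotes from blockquotes, and mark emphasis, using only the
voice- properties and without switching voice-family.  the selectors and
the particular keyword choices below are illustrative assumptions, not
examples taken from the draft; the keywords themselves are ones the
css3-speech draft defines for 'voice-pitch', 'voice-range' and
'voice-stress':

```css
/* illustrative values only: tune them to the speech engine in use */
q          { voice-pitch: low; }
blockquote { voice-pitch: low; voice-range: low; voice-stress: reduced; }

/* emphasis carried by stress and pitch; the voice itself never changes */
em         { voice-stress: moderate; voice-pitch: high; }
```

because no rule sets voice-family, the same voice reads the whole
document and only its characteristics shift.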

it is never safe to make assumptions on behalf of the end user, for
while some users may desire an aurally rich environment which mixes
voice changes with aural indicators, many others will vastly prefer
to obtain such information "inline", as it were, through the modulation
and manipulation of the voice characteristics of the voice being
used to read the content of a document.

thus, providing the content creator and end user with a variety of
means of communicating semantic indicators and textual characteristics
is essential to the success of CSS to tailor speech characteristics.

thank you VERY much again for moving forward this incredibly important
and long-overdue recommendation, gregory.

ACCOUNTABILITY, n. The mother of caution.
                 -- Ambrose Bierce, The Devil's Dictionary
     Gregory J. Rosmaita, gregory.rosmaita@gmail.com
      Camera Obscura: http://www.hicom.net/~oedipus/
  Oedipus' Online Complex: http://my.opera.com/oedipus/
Received on Saturday, 1 October 2011 03:12:26 UTC