- From: Gregory Rosmaita <gregory.rosmaita@gmail.com>
- Date: Fri, 30 Sep 2011 23:11:57 -0400
- To: www-style@w3.org, wai-xtech <wai-xtech@w3.org>
aloha!

as both a content consumer and creator, i STRONGLY urge the editors of css3-speech to retain the "at-risk" features which the "Status of This Document" states:

QUOTE cite="http://www.w3.org/TR/css3-speech/#status"
may be dropped at the end of the CR period if there has not been enough interest from implementers: 'voice-balance', 'voice-duration', 'voice-pitch', 'voice-range', and 'voice-stress'.
UNQUOTE

these "at-risk" features are part of the basic speech characteristics toolbox with which almost ALL speech output users of all proficiency levels are familiar. changes in pitch, stress, range and/or duration in response to specific types of markup and/or textual characteristics are conventions which are already widely used. moreover, control over these voice characteristics is almost universally available to users of dedicated speech output technology, and -- equally important -- provides an instantly comprehensible means of customization of the aural palette by the user.

css3-speech's primary beneficiaries are those who benefit from speech modifications applied in accordance with a discrete set of rules, NOT those whose tools currently limit the ability of a speech-output user to tailor her experience to her preferences.

implementers MUST NOT be allowed to limit the aural palette available to the user and the author. there is a time for standards to lead implementers towards practical solutions for actual users and user communities. and with the voice- properties, the time to lead is now. to do otherwise would be to leave actual users and authors at the mercy of what implementers are willing to implement in a limited time period (in this case, CR). there is absolutely no compelling reason why such properties should not AND cannot be made available to the speech output user via css3-speech.
as for:

* voice-balance: its utility is predicated upon the assumption that more than one audio channel will be available to the end user, but stereo perception of the speech-output is neither a universal nor a necessary condition for successful use of speech synthesis, whereas control over pitch, stress, range and duration is universally applicable to voice output/speech synthesis.

REASONS FOR RETAINING THE ENDANGERED VOICE- FEATURES

1. voice-stress can be used to signify textual emphasis, such as EM/I and STRONG/B -- changes in voice-pitch and/or voice-stress are essential components of communicating and differentiating between such markup to speech-output users;

   em { voice-stress: moderate; }
   strong { voice-stress: strong; voice-volume: loud; }
   blockquote { voice-stress: reduced; }

2. changes in voice stress, pitch and the like are familiar concepts to speech-output users, and are -- by far -- the personalization tools most used by speech-output users to identify emphasized, bolded, underlined, and other semantic markers. they provide the speech-output user with an experience equivalent to the one the sighted user gains by discerning differences in font weight and marks of greater (STRONG/B) and lesser (EM/I) stress. shifts in vocal characteristics also make the speech-output user aware that a string of text with voice-characteristics set for it is actually a quote or a blockquote, and there are those who wish to differentiate between vocal characteristic changes for inline quotes and blockquotes; none of this would be possible if the voice characteristics that are in danger of being dropped are dropped.

3. use of pitch, stress and richness changes is -- by far -- the least intrusive method of communicating information about the formatting and semantics of text when converting marked-up text to aural output; moreover, these properties act upon the speech-engine's output itself, and do not rely on the availability of, the loading of and the playing of an audio file as an aural icon to indicate the beginning and the end of the marked-up text; additionally, if the end user is using a hardware speech synthesizer or the speech capacities of an auxiliary device, as opposed to a software TTS engine, that speech-generating device may not be capable of rendering audio files referred to by URL, leaving that user solely dependent upon changes to the voice characteristics of the speech engine to obtain aurally equivalent information about the marked-up text.

4. the voice- properties which are in danger of being removed act on the voice the user is currently using to convey information about the document's markup and structure aurally -- such choices are often limited by the capacities of the speech engine being used, and control over such capacities may not be available to the user for a variety of reasons in a variety of settings; therefore, it is wisest for a content creator to rely primarily on pitch, stress and richness values to communicate markup's meaning, rather than forcing a switch in voice-family, which may not be available.

5. implementers MUST NOT be allowed to limit the aural palette available to the user and the author. there is a time for standards to lead implementers towards practical solutions for actual users and user communities. and with the voice- properties, the time to lead is now.
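
to illustrate the distinction between inline quotes and blockquotes raised in point 2, here is a minimal sketch. it assumes a user agent that implements the at-risk 'voice-pitch' and 'voice-range' properties as drafted in css3-speech; the selectors and keyword values are illustrative assumptions and are not taken from any existing style sheet:

```css
/* inline quote: drop the pitch slightly, keep the same voice */
q {
  voice-pitch: low;
}

/* block quotation: lower pitch, flatter intonation, reduced stress,
   so it is audibly distinct from an inline quote */
blockquote {
  voice-pitch: x-low;
  voice-range: low;
  voice-stress: reduced;
}
```

because these rules only modulate the voice already in use, they do not depend on a second voice-family or on audio cue files being available to the speech engine.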
as a precedent for standards leading implementers: assistive technology developers (in particular, commercial AT developers) claimed when WCAG 1.0 was being drafted that natural language switching when the @lang attribute is encountered was not only not needed, but a practical impossibility, and, yet, such a natural-language-switch on the fly has become a standard feature in most screen readers, and is essential for a speech-output user to operate in multi-lingual environments.

it is never safe to make assumptions on behalf of the end user, for while some users may desire an aurally rich environment which mixes voice changes with aural indicators, many others will vastly prefer to obtain such information "inline", as it were, through the modulation and manipulation of the voice characteristics of the voice being used to read the content of a document. thus, providing the content creator and end user with a variety of means of communicating semantic indicators and textual characteristics is essential to the success of CSS to tailor speech characteristics.

thank you VERY much again for moving forward this incredibly important and long-overdue recommendation,

gregory.
----------------------------------------------------------
ACCOUNTABILITY, n. The mother of caution.
-- Ambrose Bierce, The Devil's Dictionary
----------------------------------------------------------
Gregory J. Rosmaita, gregory.rosmaita@gmail.com
Camera Obscura: http://www.hicom.net/~oedipus/
Oedipus' Online Complex: http://my.opera.com/oedipus/
----------------------------------------------------------
Received on Saturday, 1 October 2011 03:12:27 UTC