- From: Raman T. V. <raman@mv.us.adobe.com>
- Date: Mon, 12 Feb 1996 16:50:37 -0800
- To: www-style@w3.org
- Cc: raman@mv.us.adobe.com, szilles@mv.us.adobe.com, jking@mv.us.adobe.com, wmperry@spry.com
Here is a first cut at a draft specification for speech stylesheets.
--Raman

<html>
<head>
<!--$Id: speech-spec.html,v 1.4 1996/02/13 00:30:19 raman Exp $
Description: Cascading Style Sheets For Aural Presentations
Author: T. V. Raman <raman@adobe.com>
Keywords: Speech, Audio, Rendering Styles
-->
<title>Style Sheets For Producing Aural Renderings</title>
<Author> T. V. Raman <br> <A mailto="raman@adobe.com">Raman@adobe.com</a> </author>
</head>
<body>
<h1>Style Sheets For Producing Spoken Renderings</h1>
This document defines style-sheet extensions that add property-value definitions specific to aural renderings. This initial specification attempts to define properties that will be general while at the same time allowing browser implementors maximal flexibility in exploiting the features provided by different auditory displays. As the functionality provided by such displays becomes standardized, this specification will evolve to encompass the features they provide.
<P>
Note that <em>speech</em> style-sheets play a dual role: they specify how a document should be rendered aurally to a user who is functionally blind, i.e. is not currently looking at a visual display, and they may also specify how a visual rendering should be augmented with sound cues to provide a truly multimodal rendering.
<H2>Design Philosophy</H2>
A simple-minded approach would dictate that an aural browser use the information present in the standard stylesheet to convey the same information aurally. This would only fit the scenario of producing a faithful aural presentation of a WWW document to someone who cannot see the visual display.
<P>
We adopt the more sophisticated solution of defining a separate (possibly cascaded) speech style-sheet so as to:
<UL>
<LI> Realize that the aural rendering is essentially independent of the visual rendering.
<LI> Allow orthogonal aural and visual views.
<LI> Allow future browsers to optionally implement both aural and visual views to produce truly <em>multimodal</em> documents.
</UL>
<P>
This said, an auditory browser is free to use the information provided by the standard visual stylesheet to augment the aural rendering where necessary. Thus, when rendering a well-written document that uses the emphasis tag to mark emphasized phrases, such an aural browser would use the speech properties specified for emphasis in the speech stylesheet. However, if a document uses layout-specific tags such as <IT>, an aural browser can fall back on a default rendering that maps specific speech properties to the visual layout tags. In general, the speech stylesheet will not attempt to specify the mapping between visual layout tags and speech properties, instead leaving it to specific browser implementations to decide how such tags are rendered.
<H2>Aural Properties</H2>
In the following, we enumerate each property along with its possible values. Explanatory paragraphs describe how a browser might use such properties and their possible effect. The syntax used in the speech style-sheet will be the same as defined in CSS1 --hence, this document will not explicitly define the syntax. For all purposes, this document should be considered as an appendix to (or part of) the CSS1 specification.
<H3>Speech Properties</H3>
Speech properties specify the voice characteristics to be used when rendering specific document elements.
<DL>
<DT> :volume
<DD> Number (decibels)<P>
The volume of the speaker. Specified in decibels.
<DT> :left-volume
<DD> number 1--100 (percentage) <P>
Specifies the speaker volume for the left channel. Devices not supporting stereo output may ignore this setting.
<DT> :right-volume
<DD> number 1--100 (percentage) <P>
Specifies the speaker volume for the right channel. Devices not supporting stereo output may ignore this setting.
<DT> :voice-family
<DD> string<P>
Analogous to the :font-family property.
This specifies the kind of voice to be used, and can be something generic such as <em>male</em> or something more specific such as <em>comedian</em> or something very specific such as <em>paul</em>. We recommend the same approach as used in the case of :font-family --the style sheet provides a list of possible values ranging from most to least specific and allows the browser to pick the most specific voice that it can find on the output device in use.
<DT> :speech-rate
<DD> Number (words per minute)<P>
Specifies the speaking rate.
<DT> :average-pitch
<DD> number (hertz) <P>
Specifies the average pitch of the speaking voice in hertz (Hz).
<DT> :pitch-range
<DD> number (percentage variation 0--200) <P>
Specifies variation in average pitch. A pitch range of 0 produces a flat, monotone voice. A pitch range of 100 produces normal inflection. Pitch ranges greater than 100 produce animated voices.
<DT> :stress
<DD> number (0--100)<P>
Specifies the level of stress (assertiveness or emphasis) of the speaking voice. English is a <strong>stressed</strong> language, and different parts of a sentence are assigned primary, secondary or tertiary stress. The value of property :stress controls the amount of inflection that results from these stress markers. Different speech devices may require the setting of one or more device-specific parameters to achieve this effect.
<P>
Increasing the value of this property results in the speech being more strongly inflected. It is in a sense dual to property <em>:pitch-range</em> and is provided to allow developers to exploit higher-end auditory displays.
<DT> :richness
<DD> number (0--100)<P>
Specifies the richness (brightness) of the speaking voice. Different speech devices may require the setting of one or more device-specific parameters to achieve this effect.
<P>
The effect of increasing richness is to produce a voice that <em>carries</em> --reducing richness produces a soft, mellifluous voice.
<DT> :speech-other
<DD> List of name-value pairs.
<P>
Allows implementors to experiment with features available on specific speech devices. The use of this property is device-specific, but is provided as an <em>escape mechanism</em> since auditory displays are not yet as standardized as their visual counterparts. Implementors are encouraged to use this property only where absolutely necessary. In many cases, the desired effect can be abstracted using the properties defined earlier and having the device-specific component of the browser map a single abstract property to a collection of device-specific properties.
</DL>
<H3>Miscellaneous Speech Settings</H3>
In addition to specifying voice properties, a speech style-sheet also specifies auxiliary information such as the amount of pause to insert before or after rendering document elements.
<DL>
<DT> :pause-before-pause
<DD> number (milliseconds) Amount of pause. (analogous to white space.)<P>
Specifies the number of milliseconds of silence to insert <em>before</em> rendering a document element. In situations where the <em>:pause-before-pause</em> <strong>intersects</strong> the <em>:pause-after-pause</em> of the preceding document element, we compute the amount of pause to insert in a manner similar to that used to compute the amount of intervening whitespace in producing visual renderings.
<DT> :pause-after-pause
<DD> number (milliseconds) Amount of pause. (analogous to white space.)<P>
Specifies the number of milliseconds of silence to insert <em>after</em> rendering a document element.
<DT> :pause-around-pause
<DD> number (milliseconds) Amount of pause. (analogous to white space.)<P>
Specifies the number of milliseconds of silence to insert <em>before</em> and <em>after</em> rendering a document element. Though this effect can be achieved by using <em>:pause-before</em> and <em>:pause-after</em> in conjunction, style-sheet designers are encouraged to use <em>:pause-around</em> where appropriate since it makes the intent clearer.
<strong> Perhaps :before, :after and :around should be modifiers so they can be generally applied to other property settings?</strong>
<DT> :pronunciation-mode
<DD> string<P>
Specifies the pronunciation mode to be used when speaking a document element. Pronunciation modes can include:
<UL>
<LI> Speak all punctuation marks.
<LI> Speak only some punctuation marks. In this case, the rule for handling punctuation marks is specified by providing a value for property :punctuation-marks-to-skip or :punctuation-marks-to-speak.
<LI> Speak contents as a date.
<LI> Speak contents as a time string.
</UL>
The set of values for this property is left open so that designers can exploit all features available in a specific device. Style-sheet designers can specify a list of values for a particular option in a manner analogous to that described for :voice-family. Browsers are expected to choose the most specific setting available on the current output device. Thus, for property :speak-time, a style sheet could specify <q>:speak-military-time</q> and <q>:speak-am-pm</q> etc.
<P>
The device-specific component of a browser is expected to map those values that it does not understand to a suitable default. Alternatively, the device-specific component of the browser may choose to transform the contents of the document element to a form that is suitable to be rendered by the specific device. To give an example:
<P>
Consider the value <em>date-string</em>. Given a content string of the form <em>Jan 1, 1996</em> an aural browser could:
<UL>
<LI> Ignore property :pronunciation-mode.
<LI> Send the content string directly to a smart speech device capable of switching to a <q>speak date</q> mode.
<LI> Apply an appropriate transform --in this example, change Jan to January-- when communicating with a less sophisticated output device.
</UL>
<DT> :language
<DD> string<P>
Language to use when rendering the contents of the document element.
Specified by using the appropriate ISO encoding for international languages.
<DT> :country
<DD> string<P>
Specified using ISO encoding for specifying country codes. Can be used in conjunction with :language to specify British or American English. (See property :dialect below for variations in speaking style within a country.) This property will be useful for multilingual speech devices capable of switching between languages.
<DT> :dialect
<DD> string<P>
Specifies the dialect to be used, e.g.: american-mid-western-english.
</DL>
<H3>Non-Speech Auditory Cues</H3>
Non-speech sounds can be used to produce <em>auditory icons</em>. Such auditory icons serve to augment the aural rendering and provide succinct cues.
<P>
<DL>
<DT> :before-sound
<DD> Filename or URL. <P>
Specifies a file containing sound data. The sound is played <em>before</em> rendering the document element to produce an auditory icon.
<DT> :after-sound
<DD> Filename or URL. <P>
Specifies a file containing sound data. The sound is played <em>after</em> rendering the document element to produce an auditory icon.
<DT> :around-sound
<DD> Filename or URL. <P>
Specifies a file containing sound data. The sound is played <em>around</em> rendering the document element to produce an auditory icon.
<DT> :during-sound
<DD> Filename or URL. <P>
Specifies a file containing sound data. The sound is played repeatedly <em>during</em> rendering the document element to produce an auditory icon that provides an aural backdrop.
</DL>
<H3>Advanced Settings </H3>
In the future, auditory displays may want to exploit spatial audio for producing rich aural layout. Spatial audio --a digital signal processing technique that involves convolving sound data with appropriate filters to produce spatially located sounds-- can be used to make sounds <em>appear</em> to originate from different points in the listener's auditory space.
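As a rough illustration of how a pair of angles picks out a point in auditory space, here is a minimal Python sketch that turns an azimuth and elevation (in degrees, as in the :spatial-audio property below) into a unit direction vector. The function name and the angle conventions are assumptions made for this sketch only; this draft does not fix where azimuth 0 lies or which way the angles increase.

```python
import math


def direction(azimuth_deg, elevation_deg):
    """Return a unit vector (x, y, z) pointing toward the sound source.

    Assumed convention (not defined by the draft): azimuth 0 is straight
    ahead and increases toward the listener's right; elevation 0 is ear
    level and increases upward. x points right, y ahead, z up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = math.cos(el) * math.sin(az)
    y = math.cos(el) * math.cos(az)
    z = math.sin(el)
    return (x, y, z)


if __name__ == "__main__":
    # A source 45 degrees to the right and 30 degrees above ear level.
    print(direction(45, 30))
```

A renderer would still have to choose filters for the resulting direction; that step is device-specific and is not sketched here.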
<DL>
<DT> :spatial-audio
<DD> :azimuth number :elevation number
Azimuth and elevation are specified in degrees and together specify the point in auditory space from which the sound appears to originate.
<P>
</DL>
<H3>Open Questions</H3>
It would be generally useful to allow style-sheets to specify relative changes i.e. increment or decrement the value of a particular property. How should the specification handle this?
<hr>
<address><A href="mailto:raman@adobe.com">Email: raman@adobe.com</a></address>
<!-- hhmts start -->
Last modified: Mon Feb 12 16:29:56 1996
<!-- hhmts end -->
</body>
</html>

-- 
Best Regards,
____________________________________________________________________________
--raman

Adobe Systems                  Tel: 1 (415) 962 3945 (B-1 115)
Advanced Technology Group      Fax: 1 (415) 962 6063
1585 Charleston Road           Email: raman@adobe.com
Mountain View, CA 94039 -7900  raman@cs.cornell.edu
http://www-atg/People/Raman.html (Internal To Adobe)
http://www.cs.cornell.edu/Info/People/raman/raman.html (Cornell)
Disclaimer: The opinions expressed are my own and in no way should be taken
as representative of my employer, Adobe Systems Inc.
____________________________________________________________________________
Received on Monday, 12 February 1996 19:51:31 UTC