Re: [css3-speech] cue volume from Daniel Weck on 2011-07-07 (www-style@w3.org from July 2011)

From: Daniel Weck <daniel.weck@gmail.com>
Date: Thu, 7 Jul 2011 10:31:01 +0100
To: "www-style@w3.org style" <www-style@w3.org>, fantasai <fantasai.lists@inkedblade.net>
Message-Id: <AD0CF7E9-9EA7-42E7-9573-CF513526A442@gmail.com>

Well, just to put things into perspective, let's say you have 2
pre-recorded audio clips, one for cue-before, one for cue-after. The first
one was recorded "normally" (whatever the convention is), whereas the  
second one is really loud on average (for example, compressed  
waveform, narrow dynamic range). Unless the audio implementation is  
"clever" (e.g. automatic normalization/equalization/filtering ... note  
that I am not an audio engineer), the user can't reduce the large  
variations of perceived volume level. So authors obviously have a  
responsibility to prevent eardrum damage and to limit listening
inconvenience. User agent implementations are likely to "naively"
expose controls that apply a relative volume offset to all audio  
resources, indiscriminately. In the same way that there is a  
"normality" for recorded audio clips, there is a standard volume level  
in TTS engines, regardless of the selected voice instance. This means  
that the cornerstone of CSS3 Speech volume control is the 'medium'  
keyword (well, and its neighbors on the audible scale too). The
user-agent is able to map user preferences to concrete volume levels in
the underlying speech synthesis processor, and I expect the same to be
possible at the level of the audio engine.
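
To make the point about "naive" controls concrete, here is a minimal
sketch. It is not taken from the draft: the loudness figures, the
function names and the idea of a single average loudness value per
clip are all assumptions made for illustration. It shows that a uniform
offset leaves the gap between a quiet and a loud clip unchanged, whereas
a per-clip normalization gain removes it.

```python
# Hypothetical average loudness of two pre-recorded cues, in dB relative
# to full scale. These numbers are invented for illustration only.
CUE_BEFORE_DB = -20.0  # recorded "normally"
CUE_AFTER_DB = -6.0    # compressed waveform, loud on average


def apply_user_offset(clip_db: float, user_offset_db: float) -> float:
    """Naive control: shift every audio resource by the same offset."""
    return clip_db + user_offset_db


def normalization_gain(clip_db: float, target_db: float) -> float:
    """Gain (in dB) that brings a clip's average loudness to target_db."""
    return target_db - clip_db


def loudness_gap(a_db: float, b_db: float) -> float:
    """Absolute difference in loudness between two rendered clips."""
    return abs(a_db - b_db)


# The user turns everything down by 10 dB: the 14 dB gap is still there.
gap_after_offset = loudness_gap(
    apply_user_offset(CUE_BEFORE_DB, -10.0),
    apply_user_offset(CUE_AFTER_DB, -10.0),
)

# A "clever" engine computes one gain per clip so both land on one target.
target_db = -20.0
rendered_after = CUE_AFTER_DB + normalization_gain(CUE_AFTER_DB, target_db)
rendered_before = CUE_BEFORE_DB + normalization_gain(CUE_BEFORE_DB, target_db)
```

The second half presumes the engine can measure a clip's loudness in the
first place, which is exactly the capability in question.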

The current draft allows authors to define audio cues that are, for  
example, softer than the speech synthesis (i.e. secondary content,
more "discreet"). In this case, the author assumes that the user-agent
is able to render *perceived* sound levels correctly for audio cues,  
just like for synthesized speech.

The flip side of the coin is that authors cannot directly control a  
volume offset relative to the "intrinsic" loudness of a given audio  
clip. The two methods appear to be mutually exclusive.
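
The difference between the two methods can be sketched as follows. This
is only an illustration under stated assumptions, not the draft's
normative algorithm: the Cue class, both function names and all the dB
figures are hypothetical, and the first method assumes the user-agent
can somehow measure the clip's intrinsic loudness.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    intrinsic_db: float  # measured average loudness of the clip (assumed known)
    author_db: float     # decibel offset the author wrote in the style sheet


def gain_relative_to_voice(cue: Cue, voice_db: float) -> float:
    """Method 1: author_db is an offset from the speech level.

    The target level is voice_db + author_db, so the gain applied to
    the clip has to compensate for the clip's own loudness.
    """
    target_db = voice_db + cue.author_db
    return target_db - cue.intrinsic_db


def gain_relative_to_clip(cue: Cue) -> float:
    """Method 2: author_db is an offset from the clip's own loudness."""
    return cue.author_db


# Hypothetical numbers: the user's 'medium' maps to -18 dB for speech,
# and the author asks for -6dB on a loud clip that averages -6 dB.
voice_db = -18.0
cue = Cue(intrinsic_db=-6.0, author_db=-6.0)

level_method_1 = cue.intrinsic_db + gain_relative_to_voice(cue, voice_db)
level_method_2 = cue.intrinsic_db + gain_relative_to_clip(cue)
```

With these made-up numbers, the same "-6dB" gives a cue 6 dB below the
voice under the first method and 6 dB above it under the second, which
is why a single value cannot serve both purposes.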

Also note that as an implementor, I have a preference for the latter
approach (audio cue volume disconnected from speech synthesis
levels), but as an author I'd rather align audio cue loudness
relative to TTS rendering.

Any thoughts?
Dan

On 7 Jul 2011, at 02:34, fantasai wrote:

> On 07/06/2011 05:50 PM, Daniel Weck wrote:
>> Please verify that the updated prose makes sense, and looks  
>> implementable:
>>
>> http://dev.w3.org/csswg/css3-speech/#mixing-props-voice-volume
>>
>> http://dev.w3.org/csswg/css3-speech/#cue-props
>
> Um, I can't make sense of it.
>
> I have a cue sound clip prerecorded by the author.
> I have a computed voice-volume relative to medium.
> And (as the UA), I have the actual value of medium, which cannot be
> calculated from the document but is set by a user preference.
>
> 1. How do I, as the UA, set the volume of the cue?
>
> 2. How does the author, who doesn't know what 'medium' computes to,
>    figure out how loud the cue will be relative to the voice volume?

Received on Thursday, 7 July 2011 09:31:31 UTC