Re: [css3-speech] voice-volume from Daniel Weck on 2011-05-11 (www-style@w3.org from May 2011)

From: Daniel Weck <daniel.weck@gmail.com>
Date: Wed, 11 May 2011 17:27:00 +0100
To: Mikko Rantalainen <mikko.rantalainen@peda.net>, "www-style@w3.org mailing list" <www-style@w3.org>
Message-Id: <A937A98C-821F-46FE-B66D-58BA2F46CB23@gmail.com>

Wait, my statement applies to SSML 1.0, not SSML 1.1 (where volume
control was improved, with full support for dB values). The SSML 1.0
reference is auto-generated in the CSS Speech editor's draft, so I must
figure out how to update it :)

http://www.w3.org/TR/speech-synthesis11/#edef_prosody



On 11 May 2011, at 15:57, Daniel Weck wrote:

>
> On 11 May 2011, at 13:30, Mikko Rantalainen wrote:
>> Volume is usually expressed in dB, and the dB scale is not linear
>> but logarithmic. I'd expect "linear" to represent the power, and as
>> such I'd need to double the number to get a few dB increase in
>> volume level.
>
> Sure, wave amplitude is not linearly proportional to the perceived  
> loudness of a sound, but we're trying to maintain some compatibility  
> with SSML 1.0 where "The volume scale is linear amplitude".
>
> I agree that this is not ideal, because the low amplitude volume  
> levels are difficult to adjust based on a linear scale (sudden  
> "jump" in perceived loudness between 1 and 2, actually comparable  
> with the gap between 50 and 100 => low dynamic range).
>
> A logarithmic scale based on [0,100] would not make sense anyway; we
> would need a new scale (e.g. [-90, +10], with audible 3 dB "steps").
> Perhaps we could "fake" the logarithmic curve by describing how
> [0,100] is mapped to a range of decibel values (i.e. 50 would
> effectively mean 50% down the dB scale, half the perceived
> loudness), but I am not sure this best serves the interest of
> authors (it probably adds more confusion, actually). For the sake of
> argument: in order to maintain compatibility with SSML, we would
> also need to introduce yet another keyword in the CSS property
> definition. So eventually we would have:
>
> - no keyword (discontinuous, monotonically non-decreasing mapping  
> with user-configured values <minimum audible>, <preferred>, <maximum  
> tolerable>, and 2 arbitrary values in between)
>
> - linear (raw wave amplitude, no mapping to perceivable sound =>  
> works fine, but not terribly useful in practice, and the accuracy of  
> low volume adjustments is compromised)
>
> - logarithmic (based on decibels => maps to perceived loudness,  
> "slider" control from <minimum> to <maximum> provides gradual and  
> accurate control)
>
> Knowing that simple arithmetic (e.g. dB-value =
> 20*log10(linear-amplitude)) can be used to switch between the scales,
> I wonder if all this is worth the hassle. Most authors won't know
> much about numerical values anyway (let alone decibels); they are
> more likely to use the user-configured levels (enumerated keywords
> from x-soft to x-loud).
>
> Thoughts?
>
>> I'd prefer one of the following over "linear":
>>
>> - absolute
>> - direct
>> - override
>> - uncorrected
>> - raw
>> - accurate (?)
>> - through (?)
>> - force (?)
>> - manual (?)
>
>
> Thanks :)
> Dan
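
For reference, the conversion quoted above (dB-value =
20*log10(linear-amplitude)) and the "fake" logarithmic mapping of
[0,100] onto a decibel range can be sketched in a few lines of Python.
This is only an illustration: the function names, the choice of 1.0 as
the reference amplitude, and the [-90, +10] dB range are assumptions
made for the example, not definitions taken from SSML or the CSS Speech
draft.

```python
import math


def amplitude_to_db(amplitude: float) -> float:
    """Convert a linear amplitude ratio to decibels (assumed reference: 1.0)."""
    if amplitude <= 0.0:
        return float("-inf")  # silence has no finite dB value
    return 20.0 * math.log10(amplitude)


def db_to_amplitude(db: float) -> float:
    """Inverse of amplitude_to_db: convert decibels to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


def percent_to_db(percent: float, db_min: float = -90.0, db_max: float = 10.0) -> float:
    """Map a [0,100] value linearly onto a dB range.

    The default range [-90, +10] is the hypothetical scale from the
    discussion, used here purely as an example.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be within [0, 100]")
    return db_min + (db_max - db_min) * percent / 100.0


if __name__ == "__main__":
    # On a linear amplitude scale, 1 -> 2 and 50 -> 100 are the same step in dB.
    print(amplitude_to_db(2.0) - amplitude_to_db(1.0))
    print(amplitude_to_db(100.0) - amplitude_to_db(50.0))
    # Halfway along the assumed dB range.
    print(percent_to_db(50.0))
```

Doubling the amplitude adds roughly 6 dB wherever it happens on the
scale, which is why the step from 1 to 2 is comparable to the step from
50 to 100 on a linear [0,100] scale.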

Daniel Weck
daniel.weck@gmail.com

Received on Wednesday, 11 May 2011 16:27:30 UTC