Re: [css3-speech] voice-volume from Daniel Weck on 2011-05-11 (www-style@w3.org from May 2011)

From: Daniel Weck <daniel.weck@gmail.com>
Date: Wed, 11 May 2011 15:57:49 +0100
To: Mikko Rantalainen <mikko.rantalainen@peda.net>
Cc: "www-style@w3.org" <www-style@w3.org>
Message-Id: <A695578A-0CA7-4ACF-9D94-859BD4FC7CF5@gmail.com>

On 11 May 2011, at 13:30, Mikko Rantalainen wrote:
> Volume is usually referred by dB and the dB scale is not
> linear but logarithmic. I'd expect "linear" to represent the power and
> as such, I'd need to double the number to get a few dB increase in
> volume level.

Sure, wave amplitude is not linearly proportional to the perceived  
loudness of a sound, but we're trying to maintain some compatibility  
with SSML 1.0 where "The volume scale is linear amplitude".

I agree that this is not ideal, because the low amplitude volume  
levels are difficult to adjust based on a linear scale (sudden "jump"  
in perceived loudness between 1 and 2, actually comparable with the  
gap between 50 and 100 => low dynamic range).
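
To put rough numbers on that "jump", here is a minimal sketch. It assumes the usual 20*log10 amplitude-ratio convention, and the helper name step_in_db is made up for illustration:

```python
import math

def step_in_db(lower, upper):
    """Level difference in dB between two linear amplitude values."""
    return 20 * math.log10(upper / lower)

# Doubling the amplitude costs the same number of dB wherever it happens,
# so the single step 1 -> 2 spans as much as the whole upper half 50 -> 100.
print(round(step_in_db(1, 2), 2))     # 6.02
print(round(step_in_db(50, 100), 2))  # 6.02
print(round(step_in_db(99, 100), 2))  # 0.09
```

In other words, the bottom of a linear scale is very coarse and the top is very fine.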

A logarithmic scale based on [0,100] would not make sense anyway; we
would need a new scale (e.g. [-90, +10], with audible 3 dB "steps").
Perhaps we could "fake" the logarithmic curve by describing how
[0,100] is mapped to a range of decibel values (i.e. 50 would
effectively mean 50% down the dB scale, half the perceived loudness),  
but I am not sure this best serves the interest of authors (it  
probably adds more confusion, actually). For the sake of argument: in  
order to maintain compatibility with SSML, we would also need to  
introduce yet another keyword in the CSS property definition. So  
eventually we would have:

- no keyword (discontinuous, monotonically non-decreasing mapping with  
user-configured values <minimum audible>, <preferred>, <maximum  
tolerable>, and 2 arbitrary values in between)

- linear (raw wave amplitude, no mapping to perceivable sound => works  
fine, but not terribly useful in practice, and the accuracy of low  
volume adjustments is compromised)

- logarithmic (based on decibels => maps to perceived loudness,  
"slider" control from <minimum> to <maximum> provides gradual and  
accurate control)
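
For the sake of illustration only, the "fake" logarithmic mapping could look something like the sketch below. The [-90, +10] dB bounds are just the example range mentioned earlier (nothing here is specified anywhere), and level_to_db is a hypothetical name:

```python
MIN_DB = -90.0  # assumed floor of the dB range (illustrative only)
MAX_DB = 10.0   # assumed ceiling of the dB range (illustrative only)

def level_to_db(level):
    """Map a [0, 100] level linearly onto the [MIN_DB, MAX_DB] decibel range."""
    if not 0 <= level <= 100:
        raise ValueError("level must be within [0, 100]")
    return MIN_DB + (MAX_DB - MIN_DB) * level / 100

print(level_to_db(0))    # -90.0
print(level_to_db(50))   # -40.0, i.e. halfway along the dB scale
print(level_to_db(100))  # 10.0
```

With this kind of mapping, equal steps of the [0,100] value correspond to equal steps in dB rather than equal steps in amplitude.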

Knowing that simple arithmetic (e.g. dB-value =
20*log10(linear-amplitude)) can be used to switch between the scales,
I wonder if all
this is worth the hassle. Most authors won't know much about numerical
values anyway (let alone decibels); they are more likely to use the
user-configured levels (enumerated keywords from x-soft to x-loud).
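
A minimal sketch of that arithmetic, in both directions. It assumes an amplitude of 1.0 is the 0 dB reference; the function names are mine and not taken from any spec:

```python
import math

def amplitude_to_db(amplitude):
    """Convert a linear amplitude ratio to decibels (1.0 -> 0 dB)."""
    if amplitude < 0:
        raise ValueError("amplitude must be non-negative")
    if amplitude == 0:
        return -math.inf  # silence has no finite dB value
    return 20 * math.log10(amplitude)

def db_to_amplitude(db):
    """Convert decibels back to a linear amplitude ratio."""
    return 10 ** (db / 20)

print(round(amplitude_to_db(0.5), 2))  # -6.02
print(round(db_to_amplitude(-20), 3))  # 0.1
```

Note that a linear amplitude of zero has to be special-cased, which is one more detail authors would need to be aware of.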

Thoughts?

> I'd prefer one of the following over "linear":
>
> - absolute
> - direct
> - override
> - uncorrected
> - raw
> - accurate (?)
> - through (?)
> - force (?)
> - manual (?)

Thanks :)
Dan

Received on Wednesday, 11 May 2011 14:58:20 UTC