Re: [css3-speech] voice-pitch from Daniel Weck on 2011-07-07 (www-style@w3.org from July 2011)

From: Daniel Weck <daniel.weck@gmail.com>
Date: Thu, 7 Jul 2011 09:24:12 +0100
To: "www-style@w3.org style" <www-style@w3.org>, Alan Gresley <alan@css-class.com>
Cc: fantasai <fantasai.lists@inkedblade.net>
Message-Id: <58493B7E-28FB-485C-AD4A-314DCECB1201@gmail.com>

On 7 Jul 2011, at 05:01, Alan Gresley wrote:

> On 7/07/2011 11:26 AM, fantasai wrote:
>> On 07/06/2011 05:47 PM, Daniel Weck wrote:
>>>
>>> On 7 Jul 2011, at 01:37, fantasai wrote:
>>>> ... but when is multiplying the pitch itself by a
>>>> percentage useful?
>>>
>>> I want the speech output for a given element/text to sound
>>> "half as squeaky" as its siblings/text. :)
>>
>> Would that really be a percentage of the Hz, though?
>>
> No. One octave higher is a doubling of Hz. One octave lower is a  
> halving of Hz. It means that 50% is scaled closer to 100% than it is  
> to 0%

My "half as squeaky" remark was tongue-in-cheek just before going to  
bed (thus the smiley face). ;)

In SSML1.0, the volume control was flawed, so SSML1.1 fixed it by
introducing decibels (the audio wave amplitude is not linearly
proportional to the perceived loudness of sound, so a logarithmic
scale is more useful as it reflects the "reality"). SSML1.1 also
removed percentages for volume control, which makes sense because
changing the volume level on a linear scale is widely regarded as
useless.
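
To make the decibel point concrete, here is a rough sketch of the
standard amplitude/decibel relationship (ratio = 10^(dB/20)). The
function names are made up for illustration and are not taken from
SSML or from any speech engine:

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a relative change in decibels to an amplitude ratio.

    Hypothetical helper: uses the usual 20*log10 convention for
    amplitude (not power).
    """
    return 10 ** (db / 20)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Inverse conversion: amplitude ratio to a decibel change."""
    return 20 * math.log10(ratio)


if __name__ == "__main__":
    # Roughly +6dB doubles the amplitude, roughly -6dB halves it.
    for change in (-6.0, 0.0, 6.0, 20.0):
        print(f"{change:+.1f}dB -> x{db_to_amplitude_ratio(change):.3f}")
```

Equal steps in dB multiply the amplitude by a constant ratio, which is
why a decibel scale tracks perceived loudness better than a linear
percentage of amplitude does.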

Now, semitones are significant on the diatonic scale, but why should
we suppress the linear arithmetic control provided by percentage-based
relative changes? Both SSML1.0 and SSML1.1 include this feature, and I
can see how fine-grained control of the average voice pitch (e.g. using
sliders) could be useful for people with hearing impairments
(hearing aid devices often amplify audio *and* shift frequencies to
favor a particular area of the spectrum).
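
For comparison, a small sketch of how semitone changes relate to
linear frequency ratios, assuming equal temperament (12 semitones per
octave, one octave = a factor of 2 in Hz). The helper names are
illustrative only:

```python
import math


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2 ** (semitones / 12)


def ratio_to_semitones(ratio: float) -> float:
    """Pitch shift in semitones for a given frequency ratio."""
    return 12 * math.log2(ratio)


if __name__ == "__main__":
    # Halving the frequency (a 50% factor) is one octave down,
    # doubling it (a 200% factor) is one octave up.
    for ratio in (0.5, 0.75, 1.0, 1.5, 2.0):
        print(f"x{ratio:.2f} -> {ratio_to_semitones(ratio):+.2f}st")
```

A percentage gives even steps in Hz and a semitone gives even steps in
perceived pitch, so the two notations serve different purposes rather
than one replacing the other.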

Now, I must admit that if we wanted to strictly conform to
SSML1.0/1.1, the percentage should be an offset (signed relative
increment/decrement), not a factor. This would avoid unnecessary
numerical gymnastics when converting from one notation to the other.
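
The conversion in question is trivial but easy to get wrong by hand.
A minimal sketch, assuming "factor" means a percentage that multiplies
the base pitch (100% = unchanged) and "offset" means a signed
percentage added to it (+0% = unchanged); the function names and the
200Hz base pitch are invented for the example:

```python
def offset_to_factor(offset_percent: float) -> float:
    """Signed offset (e.g. -50 for "-50%") to a factor percentage."""
    return 100.0 + offset_percent


def factor_to_offset(factor_percent: float) -> float:
    """Factor percentage (e.g. 50 for "50%") to a signed offset."""
    return factor_percent - 100.0


def apply_offset(base_hz: float, offset_percent: float) -> float:
    """Apply a signed relative change to a base pitch in Hz."""
    return base_hz * (1 + offset_percent / 100)


def apply_factor(base_hz: float, factor_percent: float) -> float:
    """Apply a multiplicative percentage to a base pitch in Hz."""
    return base_hz * factor_percent / 100


if __name__ == "__main__":
    base = 200.0  # hypothetical average voice pitch in Hz
    # A "50%" factor and a "-50%" offset describe the same pitch.
    print(apply_factor(base, 50.0), apply_offset(base, -50.0))
```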

Any thoughts?
Dan

http://www.w3.org/TR/speech-synthesis11/#edef_prosody

http://www.w3.org/TR/speech-synthesis/#edef_prosody

http://dev.w3.org/csswg/css3-speech/#voice-pitch

Received on Thursday, 7 July 2011 08:24:41 UTC