Re: [css3-speech] voice-pitch

From: Alan Gresley <alan@css-class.com>
Date: Fri, 08 Jul 2011 00:55:32 +1000
Message-ID: <4E15C8E4.7000204@css-class.com>
To: Daniel Weck <daniel.weck@gmail.com>
CC: "www-style@w3.org style" <www-style@w3.org>
On 7/07/2011 6:24 PM, Daniel Weck wrote:
> On 7 Jul 2011, at 05:01, Alan Gresley wrote:
>> On 7/07/2011 11:26 AM, fantasai wrote:
>>> On 07/06/2011 05:47 PM, Daniel Weck wrote:
>>>> On 7 Jul 2011, at 01:37, fantasai wrote:
>>>>> ... but when is multiplying the pitch itself by a
>>>>> percentage useful?
>>>> I want the speech output for a given element/text to sound
>>>> "half as squeaky" as its siblings/text. :)
>>> Would that really be a percentage of the Hz, though?
>> No. One octave higher is a doubling of Hz. One octave lower is a
>> halving of Hz. It means that 50% is scaled closer to 100% than it is
>> to 0%
> My "half as squeaky" remark was tongue-in-cheek just before going to bed
> (thus the smiley face). ;)
> In SSML1.0, the volume control was flawed, so SSML1.1 fixed it by
> introducing decibels (the audio wave amplitude is not linearly
> proportional to the perceived loudness of sound, so a logarithmic scale
> is more useful as it reflects the "reality"). SSML1.1 also removed
> percentages for volume control, which makes sense because changing the
> volume level on a linear scale is widely regarded as useless.
> Now, semitones are significant on the diatonic scale, but why should we
> suppress the linear arithmetic control provided by percentage-based
> relative changes?

Not quite sure what is being suppressed. An equal-tempered chromatic
scale increases by a factor of approximately 1.0594631 (the twelfth root
of 2) per semitone, arriving at double the hertz in 12 steps. It would be
good if we could use a base-12 scale notation, since 01~12 for tones
along with 1~2 for octaves would be simpler than mapping to hertz with a
relative change from a baseline pitch.

  scale(0-00) -> scale(3-00) -> scale(4-03) middle C.
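
As a rough sketch of the arithmetic above, assuming twelve-tone equal
temperament (each semitone multiplies the frequency by the twelfth root
of 2). The function name and the 440 Hz baseline are illustrative
assumptions only, not part of any spec.

```python
def shift_semitones(base_hz, semitones):
    """Shift a frequency by a number of equal-tempered semitones.

    Assumes 12 semitones per octave, so 12 steps double the frequency.
    """
    return base_hz * 2 ** (semitones / 12)


# Ratio for a single semitone step (the twelfth root of 2).
semitone_ratio = 2 ** (1 / 12)
print(round(semitone_ratio, 7))    # 1.0594631

base = 440.0                       # hypothetical baseline pitch in Hz
print(shift_semitones(base, 12))   # 880.0, one octave up
print(shift_semitones(base, -12))  # 220.0, one octave down
```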

> Both SSML1.0 and SSML1.1 include this feature, and I
> can see how fine-grain control of the average voice pitch (e.g. using
> sliders) could be useful for persons who have hearing problems (hearing
> aid devices often amplify audio *and* shift frequencies to favor a
> particular area of the spectrum).

Is this shifting done by arriving at the same relative loudness
(considering that C3 to C5 are loud for the average person without a
hearing impairment)? I have seen that newer hearing aid devices can use
a wireless connection, so this would also have to be queried somehow.
Also, some hearing aid devices (like my father's) just amplify the
sound. There is no shifting of pitch.

> Now, I must admit that if we wanted to strictly conform to SSML1.0/1.1,
> the percentage should be an offset (signed relative
> increment/decrement), not a factor. This would avoid unnecessary
> numerical gymnastic when converting from one notation to the other.
> Any thoughts?
> Dan

Why have semitone steps when a baseline pitch may just land in the wrong
place? What do 'gets multiplied by 0.5' and 'half' mean in the following?

   | Only non-negative percentage values are allowed.
   | Computed values are calculated relative to the
   | inherited value. For example, 50% means that the
   | inherited value gets multiplied by 0.5, which
   | results in half the inherited pitch of the voice.

Is this half an octave lower or a full octave lower? I presume you mean
the latter, considering hertz is being used. If this is so, then the spec
needs to indicate that percentage values are relative to hertz.
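
To make the two readings concrete, here is a small sketch. It assumes
the percentage is a multiplier applied to a frequency in hertz and that
an octave is 12 equal-tempered semitones. The function names are
hypothetical and are not taken from the css3-speech draft.

```python
import math


def percent_to_semitones(percent):
    """Semitone shift equivalent to scaling a frequency in Hz by percent."""
    return 12 * math.log2(percent / 100)


def semitones_to_percent(semitones):
    """Percentage of the original Hz value after shifting by semitones."""
    return 100 * 2 ** (semitones / 12)


# 50% of the hertz value is a full octave (12 semitones) lower.
print(percent_to_semitones(50))            # -12.0

# Half an octave lower (6 semitones) is roughly 70.71% of the hertz value.
print(round(semitones_to_percent(-6), 2))  # 70.71
```

Under these assumptions, "50%" read as a factor on hertz is a full
octave lower, while half an octave lower would need about 70.71%.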

This thought belongs in another thread, but can emphasis be done by
harmonics?

> http://www.w3.org/TR/speech-synthesis11/#edef_prosody
> http://www.w3.org/TR/speech-synthesis/#edef_prosody
> http://dev.w3.org/csswg/css3-speech/#voice-pitch

Alan Gresley
Received on Thursday, 7 July 2011 14:55:58 UTC