
Re: [css3-speech] voice-family

From: Alan Gresley <alan@css-class.com>
Date: Tue, 02 Aug 2011 19:16:19 +1000
Message-ID: <4E37C063.4070204@css-class.com>
To: fantasai <fantasai.lists@inkedblade.net>
CC: Daniel Weck <daniel.weck@gmail.com>, www style <www-style@w3.org>
On 2/08/2011 3:20 AM, fantasai wrote:
> On 08/01/2011 09:40 AM, Daniel Weck wrote:
>> On 20 Jul 2011, at 23:00, fantasai wrote:
>>> On 07/06/2011 12:54 PM, Daniel Weck wrote:
>>>> Please have a look at the updated prose:
>>>> http://dev.w3.org/csswg/css3-speech/#voice-props-voice-family
>>> I think my concern here is that using numerical ages gives a level of
>>> precision in specifying that is nowhere near the level of precision
>>> in voice matching. For example, at what numerical age does a male voice
>>> break?
>>> I think for this level it might make sense to revert back to keywords
>>> (which we can define as a specific numeric age for mapping to SSML),
>>> and introduce more fine-grained control later when the voice-matching
>>> algorithm is precise enough to support that.
>> I agree that we should avoid using prose that appears to claim a level
>> of precision that we are effectively unable to provide. I propose the
>> following prose instead:
>> ---
>> Possible values are 'child', 'young' and 'old', indicating the preferred
>> age category to match during voice selection. The mapping with SSML ages
>> is defined as follows: 'child' = up to 15 y/o, 'young' = between 16 and
>> 45 y/o, 'old' = 46 y/o onwards.
>> NOTE: The interpretation of the relationship between a person's age and
>> a recognizable type of voice cannot realistically be defined in a
>> universal manner, as it effectively depends on numerous cultural and
>> linguistic variations. The values provided by this specification
>> therefore represent a simplified model that can be reasonably applied
>> to a great variety of speech locales, albeit at the cost of a certain
>> degree of approximation.
>> Future versions of this specification may refine the level of precision
>> of the voice-matching algorithm, as speech processor implementations
>> become more standardized.
>> ---
> How about just mapping the keywords to specific numbers, and letting the
> voice-matching algorithm figure out the slack?
> 'child' = 6 years old
> 'young' = 24 years old
> 'old' = 75 years old
> or somesuch
> ~fantasai

It's not that easy. A child's voice generally has a higher frequency, and
someone with a hearing impairment may not hear high frequencies as well
as they hear lower frequencies.

This is something I mentioned just today in another list message [1],
where I wrote:

   | What is needed is something that plays sound at ever
   | increasing levels until a level is reached that is
   | desirable. This would have to be done over different
   | octaves.

Please take a look at this article.


You would begin at A0 (27.5Hz) and end with A7 (3520Hz) and test the 
full range by octaves.
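
As a rough illustration of the test points, here is a minimal Python
sketch that lists the A notes from A0 to A7 by doubling the frequency
once per octave. The function name and its parameters are my own
assumptions, not anything from the css3-speech draft.

```python
A0_HZ = 27.5  # frequency of A0, as given above


def octave_test_frequencies(start_hz=A0_HZ, octaves=7):
    """Return (note, frequency) pairs from A0 upward, one per octave.

    Each octave doubles the frequency, so A(n) = start_hz * 2**n.
    """
    return [(f"A{n}", start_hz * 2 ** n) for n in range(octaves + 1)]


if __name__ == "__main__":
    for note, hz in octave_test_frequencies():
        print(f"{note}: {hz} Hz")
```

The last entry comes out at 3520 Hz, which matches the A7 figure above.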

By user feedback, a dynamic range of hearing levels can be established 
for each octave.
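
One way the feedback loop might look, as a sketch only: play each
frequency at increasing levels and stop at the first level the user
reports as audible. The can_hear callback, the level steps and the
simulated listener are all hypothetical stand-ins for real audio
playback and real user input.

```python
from typing import Callable, Optional, Sequence


def find_threshold(
    frequency_hz: float,
    levels_db: Sequence[float],
    can_hear: Callable[[float, float], bool],
) -> Optional[float]:
    """Return the first level (in ascending order) the user can hear.

    Returns None if the user hears nothing at any tested level.
    """
    for level in sorted(levels_db):
        if can_hear(frequency_hz, level):
            return level
    return None


def simulated_listener(frequency_hz: float, level_db: float) -> bool:
    # Hypothetical listener: hears up to 440 Hz at 40 dB or louder,
    # and hears nothing above 440 Hz at any level.
    return frequency_hz <= 440.0 and level_db >= 40.0


if __name__ == "__main__":
    levels = range(0, 90, 10)
    for hz in (110.0, 440.0, 1760.0):
        print(hz, find_threshold(hz, levels, simulated_listener))
```

A result of None corresponds to the notes that drop off the chart in
the poor hearing diagram.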

A person with good hearing may have a dynamic range like so:

High                 A3       A4
               A2                     A5
          A1                               A6
Low  A0                                       A7

A person with poor hearing may have a dynamic range like so:


          A1   A2     A3
Low  A0                      A4

Note that in the latter example, A5, A6 and A7 fall below the low level.
In effect, these frequencies are not audible, but a simple hearing aid
may extend this range. My father cannot hear D5, but with a hearing aid
his range is extended by more than an octave. Even with a hearing aid, a
note like D7 played on a piano is harder to hear, since the sound of the
key going down is louder than the note produced.

After a user does a sound check (I'm very serious here) over a range of
frequencies from A0 (27.5Hz) to A7 (3520Hz), equalization can be applied
to the various voice pitches so that the user perceives a dynamic range
similar to that of someone with good hearing.
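
To make the equalization idea concrete, here is a small Python sketch,
assuming the sound check yields one threshold per octave in dB. The
boost for each octave is simply the difference between the user's
threshold and a reference threshold, capped at a maximum. The function
name, the cap and all of the numbers are invented for illustration;
they are not measured data and not part of any specification.

```python
from typing import Dict, Optional


def equalization_gains(
    user_thresholds: Dict[str, Optional[float]],
    reference_thresholds: Dict[str, float],
    max_boost_db: float = 30.0,
) -> Dict[str, Optional[float]]:
    """Return a per-octave boost in dB, or None where nothing was heard."""
    gains: Dict[str, Optional[float]] = {}
    for note, reference in reference_thresholds.items():
        user = user_thresholds.get(note)
        if user is None:
            # The user heard nothing at this octave; boosting alone
            # will not help, so flag it instead of guessing a gain.
            gains[note] = None
        else:
            # Never attenuate, and never boost beyond the cap.
            gains[note] = min(max(user - reference, 0.0), max_boost_db)
    return gains


if __name__ == "__main__":
    user = {"A3": 40.0, "A4": 55.0, "A5": None}       # made-up values
    reference = {"A3": 20.0, "A4": 20.0, "A5": 25.0}  # made-up values
    print(equalization_gains(user, reference))
```

Octaves flagged as None are the ones where a speech processor would
probably do better to pick a lower-pitched voice than to amplify.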

[1] http://lists.w3.org/Archives/Public/www-style/2011Aug/0034.html

Alan Gresley
Received on Tuesday, 2 August 2011 09:16:47 UTC
