- From: Alan Gresley <alan@css-class.com>
- Date: Tue, 02 Aug 2011 19:16:19 +1000
- To: fantasai <fantasai.lists@inkedblade.net>
- CC: Daniel Weck <daniel.weck@gmail.com>, www style <www-style@w3.org>
On 2/08/2011 3:20 AM, fantasai wrote:
> On 08/01/2011 09:40 AM, Daniel Weck wrote:
>>
>> On 20 Jul 2011, at 23:00, fantasai wrote:
>>
>>> On 07/06/2011 12:54 PM, Daniel Weck wrote:
>>>> Please have a look at the updated prose:
>>>>
>>>> http://dev.w3.org/csswg/css3-speech/#voice-props-voice-family
>>>
>>> I think my concern here is that using numerical ages gives a level of
>>> precision in specifying that is nowhere near the level of precision
>>> in voice matching. For example, at what numerical age does a male voice
>>> break?
>>>
>>> I think for this level it might make sense to revert back to keywords
>>> (which we can define as a specific numeric age for mapping to SSML),
>>> and introduce more fine-grained control later when the voice-matching
>>> algorithm is precise enough to support that.
>>
>> I agree that we should avoid using prose that appears to claim a level
>> of precision that we are effectively unable to provide. I propose the
>> following prose instead:
>>
>> ---
>> Possible values are 'child', 'young' and 'old', indicating the preferred
>> age category to match during voice selection. The mapping with SSML ages
>> is defined as follows: 'child' = up to 15 y/o, 'young' = between 16 and
>> 45 y/o, 'old' = 46 y/o onwards.
>> NOTE: The interpretation of the relationship between a person's age and
>> a recognizable type of voice cannot realistically be defined in a
>> universal manner, as it effectively depends on numerous cultural and linguistic
>> variations. The values provided by this specification therefore represent
>> a simplified model that can be reasonably applied to a great variety of
>> speech locales, albeit at the cost of a certain degree of approximation.
>> Future versions of this specification may refine the level of precision
>> of the voice-matching algorithm, as speech processor implementations
>> become more standardized.
>> ---
>
> How about just mapping the keywords to specific numbers, and letting the
> voice-matching algorithm figure out the slack?
> 'child' = 6 years old
> 'young' = 24 years old
> 'old' = 75 years old
> or somesuch
>
> ~fantasai
It's not that easy. A child's voice generally has a higher frequency, and
someone with a hearing impairment may not be able to hear high
frequencies as well as they hear lower frequencies.
This is something I mentioned just today in another list message [1],
where I wrote:
| What is needed is something that plays sound at ever
| increasing levels until a level is reach that is
| desirable. This would have to be done over different
| octaves.
Please take a look at this article.
http://en.wikipedia.org/wiki/Piano_key_frequencies
You would begin at A0 (27.5Hz) and end with A7 (3520Hz) and test the
full range by octaves.
From user feedback, a dynamic range of hearing levels can be established
for each octave.
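As a rough sketch of that octave sweep (Python; the `can_hear` callback and the list of levels are hypothetical stand-ins for real audio playback and user feedback, and nothing here is part of any spec):

```python
A0_HZ = 27.5  # frequency of A0, as given above


def octave_test_tones(lowest_hz=A0_HZ, octaves=7):
    """Return (note name, frequency in Hz) pairs from A0 up to A7, one per octave."""
    # Each octave doubles the frequency: A1 = 55 Hz, A2 = 110 Hz, and so on.
    return [(f"A{n}", lowest_hz * 2 ** n) for n in range(octaves + 1)]


def find_threshold(can_hear, freq_hz, levels):
    """Play a tone at ever increasing levels and return the first level
    the listener accepts, or None if no level in the list is accepted.

    can_hear(freq_hz, level) is an assumed callback that plays the tone
    and reports the user's answer as True or False.
    """
    for level in levels:
        if can_hear(freq_hz, level):
            return level
    return None


def hearing_profile(can_hear, levels):
    """Map each test note (A0..A7) to the listener's threshold level."""
    return {
        note: find_threshold(can_hear, freq, levels)
        for note, freq in octave_test_tones()
    }
```

A note that maps to None in the resulting profile is one the listener did not hear at any tested level.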
A person with good hearing may have a dynamic range like this:

High           A3  A4
           A2          A5
       A1                  A6
Low  A0                        A7

A person with poor hearing may have a dynamic range like this:

High
       A1  A2  A3
Low  A0            A4
                       A5
                           A6
                               A7
Note that in the latter example, A5, A6 and A7 are below low. In effect,
these frequencies are not audible, but a simple hearing aid may extend
this range. My father cannot hear D5, but with a hearing aid his range is
extended by more than an octave. Even with a hearing aid, a note
like D7 played on a piano is harder to hear, since the sound of the
key going down is louder than the note produced.
After a user does a sound check (I'm very serious here) over a range of
frequencies from A0 (27.5Hz) to A7 (3520Hz), equalization can be done
for various voice pitches in a dynamic range similar to that of someone
with good hearing.
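One way this equalization step might look, as a sketch only: I am assuming here that thresholds are recorded in dB per octave, that a note the listener cannot hear at all is recorded as None, and that boosts are capped at some maximum. The function name, the cap and the units are my own assumptions for illustration.

```python
def equalization_gains(listener_db, reference_db, max_boost_db=30.0):
    """Return a per-note boost (in dB) that moves the listener's profile
    toward the reference (good hearing) profile.

    listener_db / reference_db: dicts mapping note names such as "A4" to a
    threshold level in dB. A listener value of None (or a missing note)
    means the note was not heard at any tested level.
    """
    gains = {}
    for note, ref_level in reference_db.items():
        heard_level = listener_db.get(note)
        if heard_level is None:
            # Not audible at any tested level: no finite boost is suggested.
            gains[note] = None
        else:
            # Boost by the shortfall, never attenuate, never exceed the cap.
            shortfall = heard_level - ref_level
            gains[note] = min(max(shortfall, 0.0), max_boost_db)
    return gains
```

A speech processor could then apply the gain for whichever octave the selected voice's pitch falls in, or pick a lower-pitched voice when the gain comes back as None.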
[1] http://lists.w3.org/Archives/Public/www-style/2011Aug/0034.html
--
Alan Gresley
http://css-3d.org/
http://css-class.com/
Received on Tuesday, 2 August 2011 09:16:47 UTC