- From: Alan Gresley <alan@css-class.com>
- Date: Tue, 02 Aug 2011 19:16:19 +1000
- To: fantasai <fantasai.lists@inkedblade.net>
- CC: Daniel Weck <daniel.weck@gmail.com>, www style <www-style@w3.org>
On 2/08/2011 3:20 AM, fantasai wrote:
> On 08/01/2011 09:40 AM, Daniel Weck wrote:
>>
>> On 20 Jul 2011, at 23:00, fantasai wrote:
>>
>>> On 07/06/2011 12:54 PM, Daniel Weck wrote:
>>>> Please have a look at the updated prose:
>>>>
>>>> http://dev.w3.org/csswg/css3-speech/#voice-props-voice-family
>>>
>>> I think my concern here is that using numerical ages gives a level of
>>> precision in specifying that is nowhere near the level of precision
>>> in voice matching. For example, at what numerical age does a male voice
>>> break?
>>>
>>> I think for this level it might make sense to revert back to keywords
>>> (which we can define as a specific numeric age for mapping to SSML),
>>> and introduce more fine-grained control later when the voice-matching
>>> algorithm is precise enough to support that.
>>
>> I agree that we should avoid using prose that appears to claim a level
>> of precision that we are effectively unable to provide. I propose the
>> following prose instead:
>>
>> ---
>> Possible values are 'child', 'young' and 'old', indicating the preferred
>> age category to match during voice selection. The mapping with SSML ages
>> is defined as follows: 'child' = up to 15 y/o, 'young' = between 16 and
>> 45 y/o, 'old' = 46 y/o onwards.
>> NOTE: The interpretation of the relationship between a person's age and
>> a recognizable type of voice cannot realistically be defined in a
>> universal manner, as it effectively depends on numerous cultural and
>> linguistic variations. The values provided by this specification
>> therefore represent a simplified model that can be reasonably applied
>> to a great variety of speech locales, albeit at the cost of a certain
>> degree of approximation. Future versions of this specification may
>> refine the level of precision of the voice-matching algorithm, as
>> speech processor implementations become more standardized.
>> ---
>
> How about just mapping the keywords to specific numbers, and letting the
> voice-matching algorithm figure out the slack?
> 'child' = 6 years old
> 'young' = 24 years old
> 'old' = 75 years old
> or somesuch
>
> ~fantasai

It's not that easy. A child's voice generally has a higher frequency, and
someone with a hearing impairment may not be able to hear high frequencies
as well as they hear lower frequencies. This is something I mentioned just
today in another list message [1], where I wrote:

| What is needed is something that plays sound at ever
| increasing levels until a level is reach that is
| desirable. This would have to be done over different
| octaves.

Please take a look at this article.

http://en.wikipedia.org/wiki/Piano_key_frequencies

You would begin at A0 (27.5Hz) and end with A7 (3520Hz) and test the full
range by octaves. From user feedback, a dynamic range of hearing levels can
be established for each octave. A person of good hearing may have this
dynamic range like so:

High            A3  A4
            A2          A5
        A1                  A6
Low A0                          A7

A person of poor hearing may have this dynamic range like so:

High    A1  A2  A3
Low A0              A4
                        A5  A6  A7

Note that in the latter example, A5, A6 and A7 are below low. In effect,
these frequencies are not audible, but a simple hearing aid may extend this
range. My father cannot hear D5, but with a hearing aid his range is
extended by more than an octave higher. Even with a hearing aid, a note
like D7 played on a piano is harder to hear, since the sound of the key
going down is louder than the note produced.

After a user does a sound check (I'm very serious here) over a range of
frequencies from A0 (27.5Hz) to A7 (3520Hz), equalization can be done for
various voice pitches in a dynamic range similar to that of someone with
good hearing.

[1] http://lists.w3.org/Archives/Public/www-style/2011Aug/0034.html

-- 
Alan Gresley
http://css-3d.org/
http://css-class.com/
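
A rough sketch of the per-octave sound check and equalization described in the message, for illustration only. The function names, the idea of a numeric "audibility score" per octave, and the sample scores are all assumptions made for this example. They are not measured data and not part of any specification. Only the A0 (27.5Hz) to A7 (3520Hz) range comes from the message.

```python
# Hypothetical sketch: all names and sample scores are illustrative assumptions.

A0_HZ = 27.5  # frequency of A0, as given in the message


def octave_frequencies(count=8):
    """Return {note: Hz} for A0..A7; each octave doubles the frequency."""
    return {f"A{n}": A0_HZ * 2 ** n for n in range(count)}


def equalization_gains(user_levels, reference_levels):
    """Per-octave boost needed to lift a user's level up to a reference level.

    Levels are arbitrary audibility scores (higher means heard more easily).
    The gain is never negative, so octaves the user already hears well
    are left unchanged.
    """
    return {
        note: max(0.0, reference_levels[note] - user_levels.get(note, 0.0))
        for note in reference_levels
    }


freqs = octave_frequencies()

# Made-up scores standing in for the results of a user sound check.
good = {"A0": 0, "A1": 1, "A2": 2, "A3": 3, "A4": 3, "A5": 2, "A6": 1, "A7": 0}
poor = {"A0": 0, "A1": 1, "A2": 1, "A3": 1, "A4": 0, "A5": -1, "A6": -2, "A7": -3}

gains = equalization_gains(poor, good)
for note, hz in freqs.items():
    print(f"{note} {hz:7.1f} Hz  boost {gains[note]}")
```

In this sketch the boost grows toward the higher octaves, which matches the case where high frequencies are the ones that become inaudible. A real implementation would need calibrated loudness units and an actual listening test.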
Received on Tuesday, 2 August 2011 09:16:47 UTC