- From: Doug Schepers <doug@schepers.cc>
- Date: Wed, 7 Apr 2004 21:26:49 -0400
- To: <turner@atr.net>, "RJ Auburn" <rj@voxeo.com>
- Cc: <www-voice@w3.org>
Hi, Turner-

I happen to be a student of Linguistics at the University of North Carolina-Chapel Hill (note, however, that I'm more interested in syntax and cognitive linguistics than phonetics or phonemics). Recently a "quiet room" was built there, and I may have access to it over the summer. There may or may not be recording equipment suitable for the task there. I do know people with some prosumer recording equipment which may also be suitable, or we could just use a nice PC and Praat. ;) I would have to confirm access to these resources, but I will look into it.

I have a fairly pleasant speaking voice, and I'd be willing to devote my time to a project such as this. I may also be able to recruit voice talent among some singer or actor friends of mine, but certainly no guarantees there. It would be nice to have a female voice, too.

I suppose that an advantage to my doing it would be that I would definitely not charge for my time, and I'd be available again for resampling as needed. Caveat: I'm not a professional announcer.

As you said, the real question is: what would we need to record? I'm fairly clueless here, being a complete newbie to the subject. All the consonants and vowels in various environments, like stop-initial/stop-final, glide-initial, etc.? Maybe there's a comprehensive list of the sounds needed... that would be a great help.

I suspect that I am not qualified to head up a project like this, but I would be willing to follow someone else's lead. Hopefully it could leverage existing Open Source software, such as Festival (if that's compatible with non-formant TTS) or other packages. While my own desires are for a C/C++ solution, there's nothing to stop someone from doing a Java port as well, since that is a good cross-platform language.

Another consideration is mobile devices.
I think that any solution should consider how it might be used on smaller devices, whether the code is client-side or served wirelessly, or some clever combination of the two that optimizes the bandwidth/local processor/local storage triangle.

Obviously, such a plugin should be as standards-driven as possible. SSML 1.0 and VoiceXML 2.0 are new recommendations, right? Who knows, maybe we could even use those, somehow. ;)

Regards-
-Doug the Dreamer

Turner Rentz, III wrote:
|
| The TTS problem roughly divides up into formant vs. non formant.
| Formant speech is generally free.
|
| The barrier to Synthesized non-formant TTS is basically the studio
| time of the voice talent required to produce it, plus software
| development.
|
| Hosting is definitely out of the question. Licenses are
| sold on a per port basis, so you'd have to license
| a big port number just to be free for everyone else
| then quality of service would drop because we're public
| internet UDP.
|
|
| We should go for our own TTS development here
| if we're going for open source, and just get one of us
| with a nice voice to do the studio time.
|
|
| Question: What is the minimum time in the studio we need to be able
| to accurately reproduce the major elements of prosody?
|
| (my guess is about 40 hours. how do you arrive at numbers like this?)
|
Received on Wednesday, 7 April 2004 21:27:00 UTC