Re: Speech synthesis events

Hey Dominic,

That makes sense, and I’m not terribly surprised that speech events are so inconsistent. In my own tests I had to disable the remote voices Chrome uses entirely, because onboundary doesn’t fire for them at all (and there doesn’t appear to be a good way to feature-test for this support yet). At the moment you can only guess when a boundary event will arrive or how long a spoken word lasts, and that’s pretty unfortunate.
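For what it’s worth, here is a rough sketch of the workaround I ended up with. It assumes the standard Web Speech API (speechSynthesis, SpeechSynthesisUtterance) and leans on the voice object’s localService flag to weed out the remote voices that never fire onboundary; the helper function is mine, not part of the spec:

```javascript
// Sketch: avoid remote ("network") voices, since in my tests
// onboundary never fires for them in Chrome.

// Pure helper: keep only voices that are synthesized locally.
function localVoicesOnly(voices) {
  return voices.filter(function (v) { return v.localService; });
}

// Browser wiring (guarded so this file can also run outside a browser):
if (typeof speechSynthesis !== "undefined") {
  var voices = localVoicesOnly(speechSynthesis.getVoices());
  var utterance = new SpeechSynthesisUtterance("Hello boundary events");
  if (voices.length > 0) utterance.voice = voices[0];
  utterance.onboundary = function (event) {
    // event.charIndex says where in the text the boundary fell;
    // how long the spoken word actually lasts is still a guess.
    console.log(event.name, event.charIndex);
  };
  speechSynthesis.speak(utterance);
}
```

Even with local voices you still only get character offsets, not durations, so lip-sync timing remains guesswork.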

-Aaron

> On Oct 1, 2015, at 3:05 PM, Dominic Mazzoni <dmazzoni@google.com> wrote:
> 
> Aaron,
> 
> The Web Speech API is implementation-independent. Each speech engine has its own concept of a phoneme, and many of them don't expose any information about phoneme boundaries or phoneme events at all. The same goes for "word": each engine breaks text into words differently. For example, some engines consider a hyphenated word like "implementation-independent" to be two words, while others consider it one. Also, some fire events at the start of a word, some at the end, and some at both. To abstract over those differences, the Web Speech API gives you a single boundary event.
> 
> It makes sense to support boundary events for phonemes when the speech engine exposes them.
> 
> - Dominic
> 
> 
> 
> On Thu, Oct 1, 2015 at 12:22 AM Aaron Brewer <spaceribs@gmail.com> wrote:
> Hi Everyone,
> 
> I played around with the speech synthesis API implementation in Chrome last night; it's pretty great stuff. The lack of SSML support is troubling, and I hope that gets resolved soon. getVoices() also seems to be handled differently in every browser I've tested, but it's getting there.
> 
> I wanted to put in a request for a new event called "onphoneme" and/or "onword". It would be excellent to have this feature for lip-syncing, as hacking "onboundary" can only get you so far. The data structure passed for these events could be modeled on phonemenon (https://github.com/jimkang/phonemenon), which gives the phoneme or array of phonemes along with their stress level. I know we're still in the early days of this API, but having an "onboundary" event without an event for when words themselves are spoken feels counterintuitive.
> 
> - Aaron

Received on Thursday, 1 October 2015 20:10:31 UTC