- From: Olli Pettay <Olli.Pettay@helsinki.fi>
- Date: Thu, 03 Nov 2011 16:11:33 +0200
- To: Bjorn Bringert <bringert@google.com>
- CC: Dominic Mazzoni <dmazzoni@google.com>, public-xg-htmlspeech@w3.org
On 11/03/2011 04:03 PM, Bjorn Bringert wrote:
> On Thu, Nov 3, 2011 at 6:22 AM, Dominic Mazzoni <dmazzoni@google.com> wrote:
>> Hello,
>>
>> My apologies for not joining this conversation sooner. I'm a Google
>> Chrome developer working on accessibility, and I recently helped to
>> author the Chrome TTS extension API that we launched a couple of months
>> ago. If you haven't already seen it, check out the docs here -
>> http://code.google.com/chrome/extensions/tts.html - this has been live
>> in Chrome since version 14, and there are a number of talking extensions
>> and voices in the Chrome web store now. I'd very much like for this
>> extension API to be compatible with the proposed HTML TTS API, and in
>> fact I'm hoping to help implement it in Chrome and for the two APIs to
>> share a lot of code. Here are some comments and questions on the draft.
>>
>> For TTS, I don't understand where the content to be spoken is supposed
>> to go if it's not specified in the inner HTML. Are the only options
>> <tts>Hello, world</tts>, which inserts undesired text in older browsers,
>> or <tts src="text.ssml"/>, which forces me to put the text in a separate
>> document or use a cumbersome data URL? The previous draft from a year
>> ago had a value attribute, so I could write <tts value="Hello, world"/>
>> - why was that deprecated?
>
> I sent the same comment to this list yesterday. We should have value
> and lang attributes to allow simple synthesis use cases.

(Not a very surprising comment from me :) lang should obviously be just a
hint for the UA/speech services. A web page should not be able to query
the supported languages without user permission.
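To make the simple case concrete, here is a minimal sketch of the markup
under discussion, assuming the value attribute from the earlier draft were
restored alongside a lang attribute (hypothetical syntax, not what the
current draft specifies):

  <!-- Hypothetical: value carries the text to speak, and lang is only a
       hint to the UA/speech service. The element body is left empty so
       older browsers render no stray text. -->
  <tts value="Hello, world" lang="en-US"></tts>

  <!-- What the current draft does offer: external SSML via src. -->
  <tts src="hello.ssml"></tts>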
>> The spec for both reco and TTS now allows the user to specify a service
>> URL. Could you clarify what the value would be if the developer wishes
>> to use a local (client-side) engine, if available? Some of the spec
>> seems to assume a network speech implementation, but client-side reco
>> and TTS are very much possible, and quite desirable for applications
>> that require extremely low latency - accessibility in particular. Is
>> there any possibility the spec could specify how a user agent could
>> choose to offer local TTS and reco, or give the user or developer a
>> choice of multiple TTS or reco engines, which might be local or remote?
>
> Since the web app can't rely on any particular client-side engine being
> installed, there is no explicit way to ask for a client-side engine.
> However, if the app doesn't specify a service at all, it's up to the
> user agent to select one. This could be a client-side engine, if one is
> available. If the user agent wants, it can have a setting that lets the
> user pick an engine to use as the default.
>
>> Note that the Chrome TTS extension API has a way for the client to
>> query the list of possible voices and present the choice to the user,
>> or choose one based on its characteristics. We've implemented support
>> for OS-native TTS, Native Client TTS, pure-JavaScript TTS (yes, it
>> really works!), and server-based TTS.
>>
>> I think it's particularly important that whenever possible the user,
>> not the developer, should get to choose the TTS engine and voice. For
>> accessibility, visually impaired users often prefer voices that can be
>> sped up to incredible speeds of 2-3x normal, and low latency is also
>> extremely important. Other users might only want to hear speech if the
>> voice is incredibly realistic, and latency may not matter to them.
>> Still others might prefer a voice that speaks with a particular accent
>> - all male English voices are not interchangeable! Android is a great
>> example of what can happen when users can choose the TTS engine
>> independently - there are dozens of third-party voices available,
>> supporting lots of languages, at a variety of prices. All of the voices
>> are compatible with any Android app that uses the system TTS API,
>> including screen readers, driving-direction apps, book readers, and
>> more. Right now the proposed spec implies that it's up to the developer
>> to choose an appropriate engine, but ideally that'd be the exception
>> rather than the rule - ideally the developer would just leave this
>> absent, and the user agent would select the most appropriate speech
>> engine based on the language, user preferences, status of the network,
>> etc.
>
> What you describe is how the API is designed to work. App-selected
> services are for developers who have special needs. Simple speech apps
> use the default user-agent engine, which the user agent can allow the
> user to select.
>
>> An earlier draft had the ability to set lastMark, but now it looks like
>> it's read-only - is that correct? That may actually be easier to
>> implement, because many speech engines don't support seeking to the
>> middle of a speech stream without first synthesizing the whole thing.
>
> Yes, I think that lastMark is intentionally read-only.
>
>> When I posted the initial version of the TTS extension API on the
>> chromium-extensions list, the primary feature request I got from
>> developers was the ability to get sentence-, word-, and even
>> phoneme-level callbacks, so that got added to the API before we
>> launched it. Having callbacks at SSML markers is great, but many
>> applications require synchronizing closely with the speech, and it
>> seems really cumbersome and wasteful to have to add an SSML mark
>> between every word in the source document when what the client really
>> wants is just constant notification at the finest level of detail
>> available. Any chance you could add a way to request more frequent
>> callbacks?
>
> Sounds reasonable. Some other people have brought that up in the past,
> IIRC.
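For reference, a short sketch of how the voice listing and word-level
callbacks described above look in the Chrome TTS extension API (per the
docs linked at the top; the exact option and event names are that API's,
not the HTML Speech draft's):

  // Runs in an extension with the "tts" permission in its manifest.
  // Pick a voice by its characteristics instead of hard-coding one,
  // then ask for per-word progress events while speaking.
  chrome.tts.getVoices(function(voices) {
    var voice = null;
    for (var i = 0; i < voices.length; i++) {
      if (voices[i].lang == 'en-US') { voice = voices[i]; break; }
    }
    chrome.tts.speak('Hello, world', {
      voiceName: voice ? voice.voiceName : undefined,
      rate: 2.0,  // sped-up speech, as many screen-reader users prefer
      onEvent: function(event) {
        if (event.type == 'word') {
          // charIndex is the offset of the word within the utterance.
          console.log('word boundary at ' + event.charIndex);
        }
      }
    });
  });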
Received on Thursday, 3 November 2011 14:12:15 UTC