- From: Bjorn Bringert <bringert@google.com>
- Date: Thu, 9 Dec 2010 10:21:28 +0000
- To: Marc Schroeder <marc.schroeder@dfki.de>
- Cc: "Young, Milan" <Milan.Young@nuance.com>, Satish Sampath <satish@google.com>, Robert Brown <Robert.Brown@microsoft.com>, Dave Burke <daveburke@google.com>, public-xg-htmlspeech@w3.org
On Thu, Dec 9, 2010 at 7:56 AM, Marc Schroeder <marc.schroeder@dfki.de> wrote:
> Hi Bjorn,
>
> On 07.12.10 21:11, Bjorn Bringert wrote:
>> The things that I hope the XG will deliver are:
>>
>> 1. A draft spec of a web app API for using a speech recognizer
>> provided by the browser, with implementations in several browsers.
>>
>> 2. A draft spec of a web app API for using a speech synthesizer
>> provided by the browser, with implementations in several browsers.
>>
>> 3. Requirements and change requests to other working groups or
>> incubator groups to make sure that APIs such as Device, Audio and
>> XmlHttpRequest work for network speech services. This is completely
>> independent of 1 and 2. To ensure that the requested features are
>> sufficient, there should be several demo systems using those APIs for
>> speech.
>
> I may be misunderstanding you, but to my mind there is an important link
> missing between your items 1+2 and 3: how to make network speech services
> work via *the same API* as the browser's default speech service?
>
> We have pointed out requirements which indicate that we want to allow this:
>
> - FPR7. Web apps should be able to request speech service different from
> default.
>
> - FPR12. Speech services that can be specified by web apps must include
> network speech services.
>
> Now let's assume for the moment that we go with a <tts> element like you
> suggested, which extends HTMLMediaElement. With your items 1-3, how, as a
> web app author, would I use that <tts> element and tell it to get its
> speech from a TTS engine on the network? In other words, in order for the
> web app to use a networked speech service rather than the built-in one,
> most of the markup and scripts should stay the same, and only the reference
> to the speech service should have to change.
> I imagine the browser will have to facilitate this in some way, which
> would mean that we are *not* talking about a protocol just between the web
> app and the speech service... any thoughts?
>
> Thanks

Marc, I forgot to say this in my previous e-mail: given Satish's proposal, I think that we should drop the idea that the same API is used for both browser-provided and web-app-specified speech services. While it would be nice to have a single API, I think that it would be better to separate the two completely, for the following reasons:

1. Using generic audio and network APIs for network speech services gives web apps and speech services much more flexibility in defining their protocol. This is still an immature area, and flexibility to experiment with different high-level protocols is good.

2. It reduces the implementation burden on browsers, since they will only have to implement a simpler API for default speech services, plus the generic audio and network APIs that they would likely implement anyway.

3. It reduces the specification burden on the XG, and keeps the work of the XG clearly within the scope defined in the charter (as Dave pointed out, new protocols really belong in the IETF).

The only disadvantage that I can see is that it will be harder to port web apps between network speech services from different vendors. Solving that problem would be nice, but I think that it is premature to do so before we have some experience with real-world web apps in browsers and the requirements that they place on a standard high-level protocol.

--
Bjorn Bringert
Google UK Limited
Registered Office: Belgrave House, 76 Buckingham Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
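[Editor's note: to make the trade-off in the message above concrete, here is a minimal sketch of what item 3 means in practice. Everything in it — the endpoint URL, the "lang" query parameter, the helper function name — is a hypothetical illustration, not from any draft spec. The point is that under the separated design, the web app itself owns the wire protocol to a network recognizer, using generic APIs such as XmlHttpRequest.]

```javascript
// Hypothetical sketch only: the service endpoint, the "lang" query
// parameter and the content type are assumptions for illustration.
// The web app, not the browser, decides this protocol; it could just
// as well be multipart upload, chunked streaming, etc.
function buildRecognizeRequest(serviceUri, lang, contentType) {
  return {
    method: "POST",
    url: serviceUri + "?lang=" + encodeURIComponent(lang),
    headers: { "Content-Type": contentType },
  };
}

// Porting to another vendor means changing the service reference here
// and adapting to that vendor's protocol -- which is exactly the
// portability cost acknowledged at the end of the message above.
const req = buildRecognizeRequest(
  "https://speech.example.com/recognize", // hypothetical service
  "en-US",
  "audio/x-flac"
);

// The audio body itself would come from a generic capture API and be
// sent over XmlHttpRequest, e.g. xhr.open(req.method, req.url).
```

Under this split, the browser-provided speech service keeps its own simpler API (items 1 and 2), and no shared high-level protocol needs to be standardized by the XG.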
Received on Thursday, 9 December 2010 10:21:59 UTC