Re: UA <=> SS Protocol

From: Marc Schroeder <marc.schroeder@dfki.de>
Date: Thu, 09 Dec 2010 08:56:22 +0100
Message-ID: <4D008BA6.5090602@dfki.de>
To: Bjorn Bringert <bringert@google.com>
CC: "Young, Milan" <Milan.Young@nuance.com>, Satish Sampath <satish@google.com>, Robert Brown <Robert.Brown@microsoft.com>, Dave Burke <daveburke@google.com>, public-xg-htmlspeech@w3.org

Hi Bjorn,

On 07.12.10 21:11, Bjorn Bringert wrote:
> The
> things that I hope the XG will deliver are:
> 1. A draft spec of a web app API for using a speech recognizer
> provided by the browser, with implementations in several browsers.
> 2. A draft spec of a web app API for using a speech synthesizer
> provided by the browser, with implementations in several browsers.
> 3. Requirements and change requests to other working groups or
> incubator groups to make sure that APIs such as Device, Audio and
> XMLHttpRequest work for network speech services. This is completely
> independent of 1 and 2. To ensure that the requested features are
> sufficient, there should be several demo systems using those APIs for
> speech.

I may be misunderstanding you, but to my mind there is an important link 
missing between your items 1+2 and 3: how do we make network speech 
services work via *the same API* as the browser's default speech service?

We have pointed out requirements which indicate that we want to allow this:

- FPR7. Web apps should be able to request speech service different 
from default.

- FPR12. Speech services that can be specified by web apps must include 
network speech services.

Now let's assume for the moment that we go with a <tts> element like 
you suggested, which extends HTMLMediaElement. With your items 1-3, how 
would I, as a web app author, use that <tts> element and tell it to get 
its speech from a TTS engine on the network? In other words, for the 
web app to use a networked speech service rather than the built-in one, 
most of the markup and scripts should stay the same; only the reference 
to the speech service should have to change.
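
To make this concrete, here is a sketch of what I have in mind. The 
"serviceURI" attribute is purely my invention for illustration; 
nothing like it has been specified:

   <!-- default: the browser's built-in synthesizer -->
   <tts id="greeting1">Welcome back!</tts>

   <!-- hypothetical: the same element, now pointing at a network
        TTS engine; only the service reference has changed -->
   <tts id="greeting2"
        serviceURI="https://tts.example.com/synthesize">
     Welcome back!
   </tts>

   <script>
     // identical script either way, since <tts> extends
     // HTMLMediaElement and so inherits its play() method
     document.getElementById("greeting2").play();
   </script>

Everything except the service reference stays the same, which is what 
FPR7 and FPR12 seem to call for.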

I imagine the browser will have to facilitate this in some way, which 
would mean that we are *not* talking about a protocol just between the 
web app and the speech service. Roughly, the indirection I picture is 
(just a sketch, not a proposal):
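
   web app --(HTML/JS speech API)--> user agent
   user agent --(UA <=> SS protocol)--> network speech service

The web app only names the service; the user agent is the one that 
actually speaks the protocol to it. Any thoughts?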


Dr. Marc Schröder, Senior Researcher at DFKI GmbH
Coordinator EU FP7 Project SEMAINE http://www.semaine-project.eu
Project leader for DFKI in SSPNet http://sspnet.eu
Project leader PAVOQUE http://mary.dfki.de/pavoque
Associate Editor IEEE Trans. Affective Computing http://computer.org/tac
Editor W3C EmotionML Working Draft http://www.w3.org/TR/emotionml/
Portal Editor http://emotion-research.net
Team Leader DFKI TTS Group http://mary.dfki.de

Homepage: http://www.dfki.de/~schroed
Email: marc.schroeder@dfki.de
Phone: +49-681-85775-5303
Postal address: DFKI GmbH, Campus D3_2, Stuhlsatzenhausweg 3, D-66123 
Saarbrücken, Germany
Official DFKI coordinates:
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender)
Dr. Walter Olthoff
Vorsitzender des Aufsichtsrats: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
Received on Thursday, 9 December 2010 07:56:57 UTC
