- From: Bjorn Bringert <bringert@google.com>
- Date: Thu, 9 Dec 2010 10:21:28 +0000
- To: Marc Schroeder <marc.schroeder@dfki.de>
- Cc: "Young, Milan" <Milan.Young@nuance.com>, Satish Sampath <satish@google.com>, Robert Brown <Robert.Brown@microsoft.com>, Dave Burke <daveburke@google.com>, public-xg-htmlspeech@w3.org
On Thu, Dec 9, 2010 at 7:56 AM, Marc Schroeder <marc.schroeder@dfki.de> wrote:
> Hi Bjorn,
>
> On 07.12.10 21:11, Bjorn Bringert wrote:
>> The things that I hope the XG will deliver are:
>>
>> 1. A draft spec of a web app API for using a speech recognizer
>> provided by the browser, with implementations in several browsers.
>>
>> 2. A draft spec of a web app API for using a speech synthesizer
>> provided by the browser, with implementations in several browsers.
>>
>> 3. Requirements and change requests to other working groups or
>> incubator groups to make sure that APIs such as Device, Audio and
>> XmlHttpRequest work for network speech services. This is completely
>> independent of 1 and 2. To ensure that the requested features are
>> sufficient, there should be several demo systems using those APIs for
>> speech.
>
> I may be misunderstanding you, but to my mind there is an important link
> missing between your items 1+2 and 3: how to make network speech services
> work via *the same API* as the browser's default speech service?
>
> We have pointed out requirements which indicate that we want to allow this:
>
> - FPR7. Web apps should be able to request speech service different from
> default.
>
> - FPR12. Speech services that can be specified by web apps must include
> network speech services.
>
> Now let's assume for the moment that we go with a <tts> element like you
> suggested, which extends HTMLMediaElement. With your items 1-3, how, as a
> web app author, would I use that <tts> element and tell it to get its
> speech from a TTS engine on the network? In other words, in order for the
> web app to use a networked speech service rather than the built-in one,
> most of the markup and scripts should stay the same, and only the reference
> to the speech service should have to change.
> I imagine the browser will have to facilitate this in some way, which
> would mean that we are *not* talking about a protocol just between the web
> app and the speech service... any thoughts?
>
> Thanks

Marc, I forgot to say this in my previous e-mail: given Satish's proposal, I think that we should drop the idea that the same API is used for both browser-provided and web-app-specified speech services. While it would be nice to have a single API, I think that it would be better to separate the two completely, for the following reasons:

1. Using generic audio and network APIs for network speech services gives web apps and speech services much more flexibility in defining their protocol. This is still an immature area, and flexibility to experiment with different high-level protocols is good.

2. It reduces the implementation burden on browsers, since they will only have to implement a simpler API for default speech services, plus the generic audio and network APIs that they would likely implement anyway.

3. It reduces the specification burden on the XG, and keeps the work of the XG clearly within the scope defined in the charter (as Dave pointed out, new protocols really belong in the IETF).

The only disadvantage that I can see is that it will be harder to port web apps between network speech services from different vendors. Solving that problem would be nice, but I think that it is premature to do so before we have some experience with real-world web apps in browsers and the requirements that they place on a standard high-level protocol.

--
Bjorn Bringert
Google UK Limited
Registered Office: Belgrave House, 76 Buckingham Palace Road, London, SW1W 9TQ
Registered in England Number: 3977902
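[Editor's note: to make the trade-off in the message above concrete, here is a minimal sketch of what item 3 means in practice. Everything in it — the endpoint URL, the "lang" query parameter, the helper function name — is a hypothetical illustration, not from any draft spec. The point is that under the separated design, the web app itself owns the wire protocol to a network recognizer, using generic APIs such as XmlHttpRequest.]

```javascript
// Hypothetical sketch only: the service endpoint, the "lang" query
// parameter and the content type are assumptions for illustration.
// The web app, not the browser, decides this protocol; it could just
// as well be multipart upload, chunked streaming, etc.
function buildRecognizeRequest(serviceUri, lang, contentType) {
  return {
    method: "POST",
    url: serviceUri + "?lang=" + encodeURIComponent(lang),
    headers: { "Content-Type": contentType },
  };
}

// Porting to another vendor means changing the service reference here
// and adapting to that vendor's protocol -- which is exactly the
// portability cost acknowledged at the end of the message above.
const req = buildRecognizeRequest(
  "https://speech.example.com/recognize", // hypothetical service
  "en-US",
  "audio/x-flac"
);

// The audio body itself would come from a generic capture API and be
// sent over XmlHttpRequest, e.g. xhr.open(req.method, req.url).
```

Under this split, the browser-provided speech service keeps its own simpler API (items 1 and 2), and no shared high-level protocol needs to be standardized by the XG.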
Received on Thursday, 9 December 2010 10:21:59 UTC