RE: Speech API Community Group from Michael Bodell on 2012-04-03 (public-webapps@w3.org from April to June 2012)

From: Michael Bodell <mbodell@microsoft.com>
Date: Tue, 3 Apr 2012 20:08:09 +0000
To: Jerry Carter <jerry@jerrycarter.org>, "Raj (Openstream)" <raj@openstream.com>, Milan Young <Milan.Young@nuance.com>, Jim <Jim@haynes-barnett.net>
CC: Glen Shires <gshires@google.com>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>, "public-webapps@w3.org" <public-webapps@w3.org>
Message-ID: <22CD592CCD76414085591204EB19F4E830D8202A@TK5EX14MBXC262.redmond.corp.microsoft.>

A little bit of historical context and some resource references might be helpful for some on this email thread.

While it is still early for a community group (if one is formed), it actually isn't early for the community as a whole to talk about this. In many ways we've already done the initial incubation, community discussion, and investigation for this space in the HTML Speech XG. This led to the XG's use cases and requirements document:
http://www.w3.org/2005/Incubator/htmlspeech/live/requirements.html
These were then refined into a prioritized requirements list after soliciting community input:
http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#prioritized
As I read it, the requirements Milan, Jim, and Raj have discussed are covered by FPR7 [Web apps should be able to request speech service different from default] and FPR12 [Speech services that can be specified by web apps must include network speech services], both of which the community voted as having “Strong Interest”.

Further work on these requirements led the community to a proposal, published in the XG final report, that is now ready to be taken onto the standards track:
http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/
Hopefully we can all properly leverage the work the community has already done.

Michael Bodell
Co-chair HTML Speech XG


From: Jerry Carter [mailto:jerry@jerrycarter.org]
Sent: Tuesday, April 03, 2012 12:50 PM
To: Raj (Openstream); Milan Young; Jim
Cc: Glen Shires; public-xg-htmlspeech@w3.org; public-webapps@w3.org
Subject: Re: Speech API Community Group
We can discuss this in generalities indefinitely without reaching any resolution, so let me offer two more concrete use cases:

My friend Jóse is working on a personal site to track teams and player statistics at the 2014 World Cup in Brazil. He recognizes that the browser will define a default language through the HTTP Accept-Language header, but knows that speakers may code-switch in their requests (e.g., Spanish + English or Portuguese + English) or may be better served by native pronunciations (Jesus = /heːzus/ vs. /ˈdʒiːzəs/). Hence, he requires a resource that supports Spanish, English, and Portuguese and that can also handle multiple languages simultaneously.

These are two solid requirements.  A browser encountering the page might (1) be able to satisfy these requirements, (2) require user permission before accessing such a resource, or (3) be unable to meet the request.

My colleague Jim has another application for which hundreds of hours have been invested to optimize performance for a specific recognition resource. Security considerations further restrict the physical location of conforming resources. His page requires a very specific resource.

These are two solid requirements.  A browser encountering the page might (1) be able to satisfy these requirements, (2) require user permission before accessing such a resource, or (3) be unable to meet the request.

There are indeed commercial requirements around the capabilities of resources. We are in full agreement. It is important to be able to list requirements for conforming resources and to ensure that the browser enforces those requirements. That said, the application author does not care where such a conforming resource resides so long as it is available to the targeted user population. The user does not care where the resource resides so long as it works well and does not cost too much to use.

The trick within a Speech JavaScript API is to define which characteristics may be specified for resource selection or, alternatively, to decide that such a definition is external to the immediate API: for instance, there might be a separate spec that is referenced by the Speech JavaScript API. It is too early to tell which direction the group might go. It is already clear that there are strong opinions as to what criteria may be necessary for resource selection. Refusing to participate unless one's specific criteria are addressed strikes me as quite inappropriate at this early stage.

-=- Jerry
On Apr 3, 2012, at 3:15 PM, Raj (Openstream) wrote:
Perhaps true for users of the applications. But authors would need resource specification (location), so clearly specifying how network/local services can be used outside of browser defaults (even if protocols are out of scope) will be of interest to many, including Openstream.

Raj
On Tue, 3 Apr 2012 14:45:45 -0400
Jerry Carter <jerry@jerrycarter.org> wrote:

On Apr 3, 2012, at 11:48 AM, Young, Milan wrote:
> The proposal mentions that the specification of a network speech protocol is out of scope. This makes sense given that protocols are the domain of the IETF.
> But I'd like to confirm that the use of network speech services is in scope for this CG. Would you mind amending the proposal to make this explicit?
I don't see why any such declaration is necessary.  From the perspective of the application author or of the application user, it matters very little where the speech-to-text operation occurs so long as the result is delivered promptly.  There is no reason that local, network-based, or hybrid solutions would be unable to provide adequate performance.  I believe the current language in the proposal is appropriate.
-=- Jerry

Received on Tuesday, 3 April 2012 20:08:53 UTC