Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization

As Dan Burnett wrote below: The HTML Speech Incubator Group [1] has
recently wrapped up its work on use cases, requirements, and proposals for
adding automatic speech recognition (ASR) and text-to-speech (TTS)
capabilities to HTML.  The work of the group is documented in the group's
Final Report. [2]  The members of the group intend this work to be input to
one or more working groups, in W3C and/or other standards development
organizations such as the IETF, as an aid to developing full standards in
this space.

Because that work was so broad, Art Barstow asked (below) for a relatively
specific proposal.  We at Google are proposing that a subset of it be
accepted as a work item by the Web Applications WG.  Specifically, we are
proposing this JavaScript API [3], which enables web developers to
incorporate speech recognition and synthesis into their web pages.
This simplified subset enables developers to use scripting to generate
text-to-speech output and to use speech recognition as an input for forms,
continuous dictation, and control, and it supports the majority of use cases
in the Incubator Group's Final Report.

We welcome your feedback and ask that the Web Applications WG
consider accepting this JavaScript API [3] as a work item.

[1] charter:  http://www.w3.org/2005/Incubator/htmlspeech/charter
[2] report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
[3] API:
http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html

Bjorn Bringert
Satish Sampath
Glen Shires

On Thu, Dec 22, 2011 at 11:38 AM, Glen Shires <gshires@google.com> wrote:

> Milan,
> The IDLs contained in both documents are in the same format and order, so
> it's relatively easy to compare the two side by side:
> <http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#speechreco-section>
> <http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html#api_description>
> The semantics of the attributes, methods and events have not changed, and
> both IDLs link directly to the definitions contained in the Speech XG Final
> Report.
>
> As you mention, we agree that the protocol portions of the Speech XG Final
> Report are most appropriate for consideration by a group such as IETF, and
> believe such work can proceed independently, particularly because the
> Speech XG Final Report has provided a roadmap for these to remain
> compatible.  Also, as shown in the Speech XG Final Report - Overview
> <http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#introductory>,
> the "Speech Web API" is not dependent on the "Speech Protocol" and a
> "Default Speech" service can be used for local or remote speech recognition
> and synthesis.
>
> Glen Shires
>
>
> On Thu, Dec 22, 2011 at 10:32 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>
>> Hello Glen,
>>
>> The proposal says that it contains a “simplified subset of the JavaScript
>> API”.  Could you please clarify which elements of the HTMLSpeech
>> recommendation’s JavaScript API were omitted?  I think this would be the
>> most efficient way for those of us familiar with the XG recommendation to
>> evaluate the new proposal.
>>
>> I’d also appreciate clarification on how you see the protocol being
>> handled.  In the HTMLSpeech group we were thinking about this as a
>> hand-in-hand relationship between W3C and IETF like WebSockets.  Is this
>> still your (and Google’s) vision?
>>
>> Thanks
>>
>> *From:* Glen Shires [mailto:gshires@google.com]
>> *Sent:* Thursday, December 22, 2011 11:14 AM
>> *To:* public-webapps@w3.org; Arthur Barstow
>> *Cc:* public-xg-htmlspeech@w3.org; Dan Burnett
>>
>> *Subject:* Re: HTML Speech XG Completes, seeks feedback for eventual
>> standardization
>>
>> We at Google believe that a scripting-only (Javascript) subset of the API
>> defined in the Speech XG Incubator Group Final Report is of appropriate
>> scope for consideration by the WebApps WG.
>>
>> The enclosed scripting-only subset supports the majority of the use-cases
>> and samples in the XG proposal. Specifically, it enables web-pages to
>> generate speech output and to use speech recognition as an input for forms,
>> continuous dictation and control. The Javascript API will allow web pages
>> to control activation and timing and to handle results and alternatives.
>>
>> We welcome your feedback and ask that the Web Applications WG consider
>> accepting this as a work item.
>>
>> Bjorn Bringert
>> Satish Sampath
>> Glen Shires
>>
>> On Tue, Dec 13, 2011 at 11:39 AM, Glen Shires <gshires@google.com> wrote:
>>
>> We at Google believe that a scripting-only (Javascript) subset of the API
>> defined in the Speech XG Incubator Group Final Report [1] is of appropriate
>> scope for consideration by the WebApps WG.
>>
>> A scripting-only subset supports the majority of the use-cases and
>> samples in the XG proposal. Specifically, it enables web-pages to generate
>> speech output and to use speech recognition as an input for forms,
>> continuous dictation and control. The Javascript API will allow web pages
>> to control activation and timing and to handle results and alternatives.
>>
>> As Dan points out above, we envision that different portions of the
>> Incubator Group Final Report are applicable to different working groups "in
>> W3C and/or other standards development organizations such as the IETF".
>> This scripting API subset does not preclude other groups from pursuing
>> standardization of relevant HTML markup or underlying transport protocols,
>> and indeed the Incubator Group Final Report defines a potential roadmap
>> so that such additions can be compatible.
>>
>> To make this more concrete, Google will provide to this mailing list a
>> specific proposal extracted from the Incubator Group Final Report, that
>> includes only those portions we believe are relevant to WebApps, with links
>> back to the Incubator Report as appropriate.
>>
>> Bjorn Bringert
>> Satish Sampath
>> Glen Shires
>>
>> [1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>>
>> On Tue, Dec 13, 2011 at 5:32 AM, Dan Burnett <dburnett@voxeo.com> wrote:
>>
>> Thanks for the info, Art.  To be clear, I personally am *NOT* proposing
>> adding any specs to WebApps, although others might.  My email below as a
>> Chair of the group is merely to inform people of this work and ask for
>> feedback.
>> I expect that your information will be useful for others who might wish
>> for some of this work to continue in WebApps.
>>
>> -- dan
>>
>>
>>
>> On Dec 13, 2011, at 7:06 AM, Arthur Barstow wrote:
>>
>> > Hi Dan,
>> >
>> > WebApps already has a relatively large number of specs in progress (see
>> [PubStatus]) and the group has agreed to add some additional specs (see
>> [CharterChanges]). As such, please provide a relatively specific proposal
>> about the features/specs you and other proponents would like to add to
>> WebApps.
>> >
>> > Regarding the level of detail for your proposal, I think a reasonable
>> precedent is something like the Gamepad and Pointer/MouseLock proposals
>> (see [CharterChanges]). (Perhaps this could be achieved by identifying
>> specific sections in the XG's Final Report?)
>> >
>> > -Art Barstow
>> >
>> > [PubStatus]
>> http://www.w3.org/2008/webapps/wiki/PubStatus#API_Specifications
>> > [CharterChanges]
>> http://www.w3.org/2008/webapps/wiki/CharterChanges#Additions_Agreed
>> >
>> > On 12/12/11 5:25 PM, ext Dan Burnett wrote:
>> >> Dear WebApps people,
>> >>
>> >> The HTML Speech Incubator Group [1] has recently wrapped up its work
>> on use cases, requirements, and proposals for adding automatic speech
>> recognition (ASR) and text-to-speech (TTS) capabilities to HTML.  The work
>> of the group is documented in the group's Final Report. [2]
>> >>
>> >> The members of the group intend this work to be input to one or more
>> working groups, in W3C and/or other standards development organizations
>> such as the IETF, as an aid to developing full standards in this space.
>> >> Whether the W3C work happens in a new Working Group or an existing
>> one, we are interested in collecting feedback on the Incubator Group's
>> work.  We are specifically interested in input from the members of the
>> WebApps Working Group.
>> >>
>> >> If you have any feedback to share, please send it to, or cc, the
>> group's mailing list (public-xg-htmlspeech@w3.org).  This will allow
>> comments to be archived in one consistent location for use by whatever
>> group takes up this work.
>> >>
>> >>
>> >> Dan Burnett, Co-Chair
>> >> HTML Speech Incubator Group
>> >>
>> >>
>> >> [1] charter:  http://www.w3.org/2005/Incubator/htmlspeech/charter
>> >> [2] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>> >>
>> >> p.s.  This feedback request is being sent to the following groups:
>>  WebApps, HTML, Audio, DAP, Voice Browser, Multimodal Interaction
>>
>
>

Received on Thursday, 5 January 2012 07:16:10 UTC