Re: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization from Satish S on 2012-01-05 (public-webapps@w3.org from January to March 2012)

From: Satish S <satish@google.com>
Date: Thu, 5 Jan 2012 11:49:10 +0000
To: Peter Beverloo <peter@chromium.org>
Cc: Glen Shires <gshires@google.com>, public-webapps@w3.org, public-xg-htmlspeech@w3.org, Arthur Barstow <art.barstow@nokia.com>, Dan Burnett <dburnett@voxeo.com>
Message-ID: <CAHZf7R=hQsqmE5QQbkc8W6qd3Wr0VRZ2J64daBTnhcWE7H=nbw@mail.gmail.com>
>
> 2) How does the draft integrate with the existing <input speech>
> API[1]? It seems to me that it'd be best to define both the attribute
> and the DOM APIs in a single specification, also because they share
> several events (yet don't seem to be interchangeable) and the
> attribute already has an implementation.
>

The <input speech> API proposal was implemented as <input x-webkit-speech>
in Chromium a while ago. Much of the developer feedback we received asked
for finer-grained control, including a JavaScript API, and for letting the
web application decide how to present the user interface rather than tying
it to the <input> element.

The HTML Speech Incubator Group's final report [1] includes a <reco>
element which addresses both these concerns and provides automatic binding
of speech recognition results to existing HTML elements. We are not sure
whether the WebApps WG is a good place to work on standardising such markup
elements, so we did not include it in the simplified JavaScript API [2]. If
there is sufficient interest and scope in the WebApps WG charter for both
the JavaScript API and the markup, we are happy to combine them in the
proposal.

[1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
[2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html


>
> Thanks,
> Peter
>
> [1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0020/api-draft.html
>
> On Thu, Jan 5, 2012 at 07:15, Glen Shires <gshires@google.com> wrote:
> > As Dan Burnett wrote below: The HTML Speech Incubator Group [1] has
> > recently wrapped up its work on use cases, requirements, and proposals
> > for adding automatic speech recognition (ASR) and text-to-speech (TTS)
> > capabilities to HTML.  The work of the group is documented in the
> > group's Final Report. [2]  The members of the group intend this work to
> > be input to one or more working groups, in W3C and/or other standards
> > development organizations such as the IETF, as an aid to developing full
> > standards in this space.
> >
> > Because that work was so broad, Art Barstow asked (below) for a
> > relatively specific proposal.  We at Google are proposing that a subset
> > of it be accepted as a work item by the Web Applications WG.
> > Specifically, we are proposing this Javascript API [3], which enables
> > web developers to incorporate speech recognition and synthesis into
> > their web pages.  This simplified subset enables developers to use
> > scripting to generate text-to-speech output and to use speech
> > recognition as an input for forms, continuous dictation and control,
> > and it supports the majority of use-cases in the Incubator Group's
> > Final Report.
> >
> > We welcome your feedback and ask that the Web Applications WG
> > consider accepting this Javascript API [3] as a work item.
> >
> > [1] charter:  http://www.w3.org/2005/Incubator/htmlspeech/charter
> > [2] report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
> > [3] API: http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html
> >
> > Bjorn Bringert
> > Satish Sampath
> > Glen Shires
> >
> > On Thu, Dec 22, 2011 at 11:38 AM, Glen Shires <gshires@google.com> wrote:
> >>
> >> Milan,
> >> The IDLs contained in both documents are in the same format and order,
> >> so it's relatively easy to compare the two side-by-side. The semantics
> >> of the attributes, methods and events have not changed, and both IDLs
> >> link directly to the definitions contained in the Speech XG Final
> >> Report.
> >>
> >> As you mention, we agree that the protocol portions of the Speech XG
> >> Final Report are most appropriate for consideration by a group such as
> >> IETF, and believe such work can proceed independently, particularly
> >> because the Speech XG Final Report has provided a roadmap for these to
> >> remain compatible.  Also, as shown in the Speech XG Final Report -
> >> Overview, the "Speech Web API" is not dependent on the "Speech
> >> Protocol" and a "Default Speech" service can be used for local or
> >> remote speech recognition and synthesis.
> >>
> >> Glen Shires
> >>
> >>
> >> On Thu, Dec 22, 2011 at 10:32 AM, Young, Milan <Milan.Young@nuance.com> wrote:
> >>>
> >>> Hello Glen,
> >>>
> >>>
> >>>
> >>> The proposal says that it contains a “simplified subset of the
> >>> JavaScript API”.  Could you please clarify which elements of the
> >>> HTMLSpeech recommendation’s JavaScript API were omitted?  I think this
> >>> would be the most efficient way for those of us familiar with the XG
> >>> recommendation to evaluate the new proposal.
> >>>
> >>>
> >>>
> >>> I’d also appreciate clarification on how you see the protocol being
> >>> handled.  In the HTMLSpeech group we were thinking about this as a
> >>> hand-in-hand relationship between W3C and IETF like WebSockets.  Is
> >>> this still your (and Google’s) vision?
> >>>
> >>>
> >>>
> >>> Thanks
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> From: Glen Shires [mailto:gshires@google.com]
> >>> Sent: Thursday, December 22, 2011 11:14 AM
> >>> To: public-webapps@w3.org; Arthur Barstow
> >>> Cc: public-xg-htmlspeech@w3.org; Dan Burnett
> >>>
> >>>
> >>> Subject: Re: HTML Speech XG Completes, seeks feedback for eventual
> >>> standardization
> >>>
> >>>
> >>>
> >>> We at Google believe that a scripting-only (Javascript) subset of the
> >>> API defined in the Speech XG Incubator Group Final Report is of
> >>> appropriate scope for consideration by the WebApps WG.
> >>>
> >>>
> >>>
> >>> The enclosed scripting-only subset supports the majority of the
> >>> use-cases and samples in the XG proposal. Specifically, it enables
> >>> web-pages to generate speech output and to use speech recognition as an
> >>> input for forms, continuous dictation and control. The Javascript API
> >>> will allow web pages to control activation and timing and to handle
> >>> results and alternatives.
> >>>
> >>>
> >>>
> >>> We welcome your feedback and ask that the Web Applications WG consider
> >>> accepting this as a work item.
> >>>
> >>>
> >>>
> >>> Bjorn Bringert
> >>>
> >>> Satish Sampath
> >>>
> >>> Glen Shires
> >>>
> >>>
> >>>
> >>> On Tue, Dec 13, 2011 at 11:39 AM, Glen Shires <gshires@google.com> wrote:
> >>>
> >>> We at Google believe that a scripting-only (Javascript) subset of the
> >>> API defined in the Speech XG Incubator Group Final Report [1] is of
> >>> appropriate scope for consideration by the WebApps WG.
> >>>
> >>>
> >>>
> >>> A scripting-only subset supports the majority of the use-cases and
> >>> samples in the XG proposal. Specifically, it enables web-pages to
> >>> generate speech output and to use speech recognition as an input for
> >>> forms, continuous dictation and control. The Javascript API will allow
> >>> web pages to control activation and timing and to handle results and
> >>> alternatives.
> >>>
> >>>
> >>> As Dan points out above, we envision that different portions of the
> >>> Incubator Group Final Report are applicable to different working
> >>> groups "in W3C and/or other standards development organizations such as
> >>> the IETF".  This scripting API subset does not preclude other groups
> >>> from pursuing standardization of relevant HTML markup or underlying
> >>> transport protocols, and indeed the Incubator Group Final Report
> >>> defines a potential roadmap such that such additions can be compatible.
> >>>
> >>>
> >>>
> >>> To make this more concrete, Google will provide to this mailing list a
> >>> specific proposal extracted from the Incubator Group Final Report, that
> >>> includes only those portions we believe are relevant to WebApps, with
> >>> links back to the Incubator Report as appropriate.
> >>>
> >>>
> >>>
> >>> Bjorn Bringert
> >>>
> >>> Satish Sampath
> >>>
> >>> Glen Shires
> >>>
> >>>
> >>>
> >>> [1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
> >>>
> >>>
> >>>
> >>> On Tue, Dec 13, 2011 at 5:32 AM, Dan Burnett <dburnett@voxeo.com> wrote:
> >>>
> >>> Thanks for the info, Art.  To be clear, I personally am *NOT* proposing
> >>> adding any specs to WebApps, although others might.  My email below as
> >>> a Chair of the group is merely to inform people of this work and ask
> >>> for feedback.
> >>> I expect that your information will be useful for others who might wish
> >>> for some of this work to continue in WebApps.
> >>>
> >>> -- dan
> >>>
> >>>
> >>>
> >>> On Dec 13, 2011, at 7:06 AM, Arthur Barstow wrote:
> >>>
> >>> > Hi Dan,
> >>> >
> >>> > WebApps already has a relatively large number of specs in progress
> >>> > (see [PubStatus]) and the group has agreed to add some additional
> >>> > specs (see [CharterChanges]). As such, please provide a relatively
> >>> > specific proposal about the features/specs you and other proponents
> >>> > would like to add to WebApps.
> >>> >
> >>> > Regarding the level of detail for your proposal, I think a reasonable
> >>> > precedent is something like the Gamepad and Pointer/MouseLock
> >>> > proposals (see [CharterChanges]). (Perhaps this could be achieved by
> >>> > identifying specific sections in the XG's Final Report?)
> >>> >
> >>> > -Art Barstow
> >>> >
> >>> > [PubStatus]
> >>> > http://www.w3.org/2008/webapps/wiki/PubStatus#API_Specifications
> >>> > [CharterChanges]
> >>> > http://www.w3.org/2008/webapps/wiki/CharterChanges#Additions_Agreed
> >>> >
> >>> > On 12/12/11 5:25 PM, ext Dan Burnett wrote:
> >>> >> Dear WebApps people,
> >>> >>
> >>> >> The HTML Speech Incubator Group [1] has recently wrapped up its work
> >>> >> on use cases, requirements, and proposals for adding automatic
> >>> >> speech recognition (ASR) and text-to-speech (TTS) capabilities to
> >>> >> HTML.  The work of the group is documented in the group's Final
> >>> >> Report. [2]
> >>> >>
> >>> >> The members of the group intend this work to be input to one or more
> >>> >> working groups, in W3C and/or other standards development
> >>> >> organizations such as the IETF, as an aid to developing full
> >>> >> standards in this space.  Whether the W3C work happens in a new
> >>> >> Working Group or an existing one, we are interested in collecting
> >>> >> feedback on the Incubator Group's work.  We are specifically
> >>> >> interested in input from the members of the WebApps Working Group.
> >>> >>
> >>> >> If you have any feedback to share, please send it to, or cc, the
> >>> >> group's mailing list (public-xg-htmlspeech@w3.org).  This will allow
> >>> >> comments to be archived in one consistent location for use by
> >>> >> whatever group takes up this work.
> >>> >>
> >>> >>
> >>> >> Dan Burnett, Co-Chair
> >>> >> HTML Speech Incubator Group
> >>> >>
> >>> >>
> >>> >> [1] charter:  http://www.w3.org/2005/Incubator/htmlspeech/charter
> >>> >> [2] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
> >>> >>
> >>> >> p.s.  This feedback request is being sent to the following groups:
> >>> >>  WebApps, HTML, Audio, DAP, Voice Browser, Multimodal Interaction
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >
>
>
Received on Thursday, 5 January 2012 11:49:43 UTC