
RE: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization

From: Young, Milan <Milan.Young@nuance.com>
Date: Fri, 6 Jan 2012 10:47:26 -0800
Message-ID: <1AA381D92997964F898DF2A3AA4FF9AD0DFB22FC@SUN-EXCH01.nuance.com>
To: Glen Shires <gshires@google.com>, <public-webapps@w3.org>
CC: <public-xg-htmlspeech@w3.org>, Arthur Barstow <art.barstow@nokia.com>, Dan Burnett <dburnett@voxeo.com>
The HTML Speech XG spent over a year prioritizing use cases against
timelines and packaged all of that work into a recommendation complete
with IDLs and examples.  So while I understand that WebApps may not have
the time to review the entirety of this work, it's hard to see how
dissecting it would speed the process of understanding.

 

Perhaps a better approach would be to find half an hour to present the
content of the recommendation, and its possible relevance, to select
members of WebApps.  Does that sound reasonable?

 

Thanks

From: Glen Shires [mailto:gshires@google.com] 
Sent: Wednesday, January 04, 2012 11:15 PM
To: public-webapps@w3.org
Cc: public-xg-htmlspeech@w3.org; Arthur Barstow; Dan Burnett
Subject: Speech Recognition and Text-to-Speech Javascript API - seeking
feedback for eventual standardization

 

As Dan Burnett wrote below: The HTML Speech Incubator Group [1] has
recently wrapped up its work on use cases, requirements, and proposals
for adding automatic speech recognition (ASR) and text-to-speech (TTS)
capabilities to HTML.  The work of the group is documented in the
group's Final Report. [2]  The members of the group intend this work to
be input to one or more working groups, in W3C and/or other standards
development organizations such as the IETF, as an aid to developing full
standards in this space.

 

Because that work was so broad, Art Barstow asked (below) for a
relatively specific proposal.  We at Google are proposing that a subset
of it be accepted as a work item by the Web Applications WG.
Specifically, we are proposing this JavaScript API [3], which enables
web developers to incorporate speech recognition and synthesis into
their web pages. This simplified subset enables developers to use
scripting to generate text-to-speech output and to use speech
recognition as an input for forms, continuous dictation, and control.
It supports the majority of use cases in the Incubator Group's Final
Report.
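
As a rough sketch of the kind of scripting such an API enables
(the constructor and property names below are illustrative assumptions
matching the shape the API later took, not quotes from the proposal [3]):

```javascript
// Illustrative sketch only: speechSynthesis, SpeechSynthesisUtterance and
// SpeechRecognition are assumed names, not necessarily those of the
// proposal itself.

// Speak a piece of text; returns false when no synthesis engine exists.
function speak(text) {
  if (typeof speechSynthesis === "undefined") return false;
  speechSynthesis.speak(new SpeechSynthesisUtterance(text));
  return true;
}

// Dictate into a form field; returns false when recognition is unavailable.
function dictateInto(field) {
  const Recognition =
    globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;
  if (!Recognition) return false;

  const rec = new Recognition();
  rec.continuous = true;      // continuous dictation, not a single utterance
  rec.onresult = (event) => {
    // Append the top alternative of each new result to the field.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      field.value += event.results[i][0].transcript;
    }
  };
  rec.start();                // the page controls activation and timing
  return true;
}
```

In a non-supporting environment both helpers simply report
unavailability, so a page can fall back to keyboard input.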

 

We welcome your feedback and ask that the Web Applications WG consider
accepting this JavaScript API [3] as a work item.

 

[1] charter:  http://www.w3.org/2005/Incubator/htmlspeech/charter

[2] report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/

[3] API: http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html

 

Bjorn Bringert

Satish Sampath

Glen Shires

 

On Thu, Dec 22, 2011 at 11:38 AM, Glen Shires <gshires@google.com>
wrote:

Milan,

The IDLs contained in both documents are in the same format and order,
so it's relatively easy to compare the two side-by-side:
<http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#speechreco-section>
and
<http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html#api_description>.
The semantics of the attributes, methods, and events have not changed,
and both IDLs link directly to the definitions contained in the Speech
XG Final Report.

 

As you mention, we agree that the protocol portions of the Speech XG
Final Report are most appropriate for consideration by a group such as
the IETF, and believe such work can proceed independently, particularly
because the Speech XG Final Report has provided a roadmap for these to
remain compatible.  Also, as shown in the Speech XG Final Report -
Overview
<http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech-20111206/#introductory>,
the "Speech Web API" is not dependent on the "Speech Protocol", and a
"Default Speech" service can be used for local or remote speech
recognition and synthesis.

 

Glen Shires

 

On Thu, Dec 22, 2011 at 10:32 AM, Young, Milan <Milan.Young@nuance.com>
wrote:

Hello Glen,

 

The proposal says that it contains a "simplified subset of the
JavaScript API".  Could you please clarify which elements of the
HTMLSpeech recommendation's JavaScript API were omitted?   I think this
would be the most efficient way for those of us familiar with the XG
recommendation to evaluate the new proposal.

 

I'd also appreciate clarification on how you see the protocol being
handled.  In the HTMLSpeech group we were thinking of this as a
hand-in-hand relationship between W3C and IETF, like WebSockets.  Is
this still your (and Google's) vision?

 

Thanks

From: Glen Shires [mailto:gshires@google.com] 
Sent: Thursday, December 22, 2011 11:14 AM
To: public-webapps@w3.org; Arthur Barstow
Cc: public-xg-htmlspeech@w3.org; Dan Burnett
Subject: Re: HTML Speech XG Completes, seeks feedback for eventual
standardization

 

We at Google believe that a scripting-only (JavaScript) subset of the
API defined in the Speech XG Incubator Group Final Report is of
appropriate scope for consideration by the WebApps WG.

 

The enclosed scripting-only subset supports the majority of the
use cases and samples in the XG proposal. Specifically, it enables
web pages to generate speech output and to use speech recognition as an
input for forms, continuous dictation, and control. The JavaScript API
will allow web pages to control activation and timing and to handle
results and alternatives.

 

We welcome your feedback and ask that the Web Applications WG consider
accepting this as a work item.

 

Bjorn Bringert

Satish Sampath

Glen Shires

 

On Tue, Dec 13, 2011 at 11:39 AM, Glen Shires <gshires@google.com>
wrote:

We at Google believe that a scripting-only (JavaScript) subset of the
API defined in the Speech XG Incubator Group Final Report [1] is of
appropriate scope for consideration by the WebApps WG.

 

A scripting-only subset supports the majority of the use cases and
samples in the XG proposal. Specifically, it enables web pages to
generate speech output and to use speech recognition as an input for
forms, continuous dictation, and control. The JavaScript API will allow
web pages to control activation and timing and to handle results and
alternatives.

 

As Dan points out above, we envision that different portions of the
Incubator Group Final Report are applicable to different working groups
"in W3C and/or other standards development organizations such as the
IETF".  This scripting API subset does not preclude other groups from
pursuing standardization of relevant HTML markup or underlying transport
protocols, and indeed the Incubator Group Final Report defines a
potential roadmap so that such additions can remain compatible.

 

To make this more concrete, Google will provide to this mailing list a
specific proposal extracted from the Incubator Group Final Report that
includes only those portions we believe are relevant to WebApps, with
links back to the Incubator Report as appropriate.

 

Bjorn Bringert

Satish Sampath

Glen Shires

 

[1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/

 

On Tue, Dec 13, 2011 at 5:32 AM, Dan Burnett <dburnett@voxeo.com> wrote:

Thanks for the info, Art.  To be clear, I personally am *NOT* proposing
adding any specs to WebApps, although others might.  My email below, as
a Chair of the group, is merely to inform people of this work and to ask
for feedback.
I expect that your information will be useful for others who might wish
for some of this work to continue in WebApps.

-- dan



On Dec 13, 2011, at 7:06 AM, Arthur Barstow wrote:

> Hi Dan,
>
> WebApps already has a relatively large number of specs in progress
(see [PubStatus]) and the group has agreed to add some additional specs
(see [CharterChanges]). As such, please provide a relatively specific
proposal about the features/specs you and other proponents would like to
add to WebApps.
>
> Regarding the level of detail for your proposal, I think a reasonable
precedent is something like the Gamepad and Pointer/MouseLock proposals
(see [CharterChanges]). (Perhaps this could be achieved by identifying
specific sections in the XG's Final Report?)
>
> -Art Barstow
>
> [PubStatus]
http://www.w3.org/2008/webapps/wiki/PubStatus#API_Specifications
> [CharterChanges]
http://www.w3.org/2008/webapps/wiki/CharterChanges#Additions_Agreed
>
> On 12/12/11 5:25 PM, ext Dan Burnett wrote:
>> Dear WebApps people,
>>
>> The HTML Speech Incubator Group [1] has recently wrapped up its work
on use cases, requirements, and proposals for adding automatic speech
recognition (ASR) and text-to-speech (TTS) capabilities to HTML.  The
work of the group is documented in the group's Final Report. [2]
>>
>> The members of the group intend this work to be input to one or more
working groups, in W3C and/or other standards development organizations
such as the IETF, as an aid to developing full standards in this space.
>> Whether the W3C work happens in a new Working Group or an existing
one, we are interested in collecting feedback on the Incubator Group's
work.  We are specifically interested in input from the members of the
WebApps Working Group.
>>
>> If you have any feedback to share, please send it to, or cc, the
group's mailing list (public-xg-htmlspeech@w3.org).  This will allow
comments to be archived in one consistent location for use by whatever
group takes up this work.
>>
>>
>> Dan Burnett, Co-Chair
>> HTML Speech Incubator Group
>>
>>
>> [1] charter:  http://www.w3.org/2005/Incubator/htmlspeech/charter
>> [2] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>>
>> p.s.  This feedback request is being sent to the following groups:
WebApps, HTML, Audio, DAP, Voice Browser, Multimodal Interaction

 

 

 

 
Received on Friday, 6 January 2012 18:48:06 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 18:49:49 GMT