Re: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization

From: Glen Shires <gshires@google.com>
Date: Tue, 10 Jan 2012 08:25:01 -0800
Message-ID: <CAEE5bchsOW62pL59qfYXa=rs9XfGNvg7GaV-+-UtjDC7xs-+CA@mail.gmail.com>
To: Arthur Barstow <art.barstow@nokia.com>, olli@pettay.fi
Cc: public-webapps@w3.org, public-xg-htmlspeech@w3.org, Dan Burnett <dburnett@voxeo.com>, ext Satish S <satish@google.com>, Peter Beverloo <peter@chromium.org>
Art,
Per #2 Editor commitment(s): we confirm that Bjorn Bringert, Satish Sampath
and Glen Shires volunteer as editors. If others would like to help, we
welcome them.

Per #4 Testing commitment(s): can you elaborate on what you would like to
see at this point?

Also, what is the next step?

On Mon, Jan 9, 2012 at 8:12 AM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:

> On 01/09/2012 04:59 PM, Arthur Barstow wrote:
>
>> Hi All,
>>
>> As I indicated in [1], WebApps already has a relatively large number of
>> specs in progress and the group has agreed to add some new specs. As
>> such, to review any new charter addition proposals, I think we need at
>> least the following:
>>
>> 1. Relatively clear scope of the feature(s). (This information should be
>> detailed enough for WG members with relevant IP to be able to make an IP
>> assessment.)
>>
>> 2. Editor commitment(s)
>>
>> 3. Implementation commitments from at least two WG members
>>
> Is this really a requirement nowadays?
> Is there, for example, a commitment to implement the
> File System API?
> http://dev.w3.org/2009/dap/file-system/file-dir-sys.html
>
> But anyway, I'm interested in implementing the speech API,
> and as far as I know, other people involved with Mozilla
> have also shown interest.
>
>
>
>
>> 4. Testing commitment(s)
>>
>> Re the APIs in this thread -> I think Glen's API proposal [2] adequately
>> addresses #1 above and his previous responses imply support for #2 but
>> it would be good for Glen, et al. to confirm. Re #3, other than Google,
>> I don't believe any other implementor has voiced their support for
>> WebApps adding these APIs. As such, I think we need additional input
>> on implementation support (e.g. Apple, Microsoft, Mozilla, Opera, etc.).
>>
>
> It doesn't matter too much to me in which group the API will be developed
> (except that I'm against doing it in HTML WG).
> WebApps is a reasonably good place (if there won't be any IP issues).
>
>
>
>
> -Olli
>
>
>
>
>> Re the markup question -> WebApps does have some precedent for defining
>> markup (e.g. XBL2, Widget XML config). I don't have a strong opinion on
>> whether or not WebApps should include the type of markup in the XG
>> Report. I think the next step here is for WG members to submit comments
>> on this question. In particular, proponents of including markup in
>> WebApps' charter should respond to #1-4 above.
>>
>> -AB
>>
>> [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1474.html
>> [2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html
>>
>>
>>
>> On 1/5/12 6:49 AM, ext Satish S wrote:
>>
>>>
>>> 2) How does the draft incorporate with the existing <input speech>
>>> API[1]? It seems to me as if it'd be best to define both the attribute
>>> and the DOM APIs in a single specification, also because they share
>>> several events (yet don't seem to be interchangeable) and the
>>> attribute already has an implementation.
>>>
>>>
>>> The <input speech> API proposal was implemented as <input
>>> x-webkit-speech> in Chromium a while ago. A lot of the developer
>>> feedback we received was about finer grained control including a
>>> javascript API and letting the web application decide how to present
>>> the user interface rather than tying it to the <input> element.
>>>
>>> The HTML Speech Incubator Group's final report [1] includes a <reco>
>>> element which addresses both these concerns and provides automatic
>>> binding of speech recognition results to existing HTML elements. We
>>> are not sure if the WebApps WG is a good place to work on
>>> standardising such markup elements, hence did not include it in the
>>> simplified Javascript API [2]. If there is sufficient interest and
>>> scope in the WebApps WG charter for the Javascript API and markup, we
>>> are happy to combine them both in the proposal.
>>>
>>> [1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>>> [2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html
>>>
>>>
>>>
>>> Thanks,
>>> Peter
>>>
>>> [1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0020/api-draft.html
>>>
>>>
>>> On Thu, Jan 5, 2012 at 07:15, Glen Shires <gshires@google.com> wrote:
>>> > As Dan Burnett wrote below: The HTML Speech Incubator Group [1] has
>>> > recently wrapped up its work on use cases, requirements, and proposals
>>> > for adding automatic speech recognition (ASR) and text-to-speech (TTS)
>>> > capabilities to HTML. The work of the group is documented in the
>>> > group's Final Report. [2] The members of the group intend this work to
>>> > be input to one or more working groups, in W3C and/or other standards
>>> > development organizations such as the IETF, as an aid to developing
>>> > full standards in this space.
>>> >
>>> > Because that work was so broad, Art Barstow asked (below) for a
>>> > relatively specific proposal. We at Google are proposing that a subset
>>> > of it be accepted as a work item by the Web Applications WG.
>>> > Specifically, we are proposing this Javascript API [3], which enables
>>> > web developers to incorporate speech recognition and synthesis into
>>> > their web pages. This simplified subset enables developers to use
>>> > scripting to generate text-to-speech output and to use speech
>>> > recognition as an input for forms, continuous dictation and control,
>>> > and it supports the majority of use-cases in the Incubator Group's
>>> > Final Report.
>>> >
>>> > We welcome your feedback and ask that the Web Applications WG
>>> > consider accepting this Javascript API [3] as a work item.
>>> >
>>> > [1] charter: http://www.w3.org/2005/Incubator/htmlspeech/charter
>>> > [2] report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>>> > [3] API: http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html
>>> >
>>> > Bjorn Bringert
>>> > Satish Sampath
>>> > Glen Shires
>>> >
>>> > On Thu, Dec 22, 2011 at 11:38 AM, Glen Shires <gshires@google.com> wrote:
>>> >>
>>> >> Milan,
>>> >> The IDLs contained in both documents are in the same format and
>>> >> order, so it's relatively easy to compare the two side-by-side. The
>>> >> semantics of the attributes, methods and events have not changed, and
>>> >> both IDLs link directly to the definitions contained in the Speech XG
>>> >> Final Report.
>>> >>
>>> >> As you mention, we agree that the protocol portions of the Speech XG
>>> >> Final Report are most appropriate for consideration by a group such as
>>> >> IETF, and believe such work can proceed independently, particularly
>>> >> because the Speech XG Final Report has provided a roadmap for these to
>>> >> remain compatible. Also, as shown in the Speech XG Final Report -
>>> >> Overview, the "Speech Web API" is not dependent on the "Speech
>>> >> Protocol" and a "Default Speech" service can be used for local or
>>> >> remote speech recognition and synthesis.
>>> >>
>>> >> Glen Shires
>>> >>
>>> >>
>>> >> On Thu, Dec 22, 2011 at 10:32 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>>> >>>
>>> >>> Hello Glen,
>>> >>>
>>> >>>
>>> >>>
>>> >>> The proposal says that it contains a “simplified subset of the
>>> >>> JavaScript API”. Could you please clarify which elements of the
>>> >>> HTMLSpeech recommendation’s JavaScript API were omitted? I think this
>>> >>> would be the most efficient way for those of us familiar with the XG
>>> >>> recommendation to evaluate the new proposal.
>>> >>>
>>> >>>
>>> >>>
>>> >>> I’d also appreciate clarification on how you see the protocol being
>>> >>> handled. In the HTMLSpeech group we were thinking about this as a
>>> >>> hand-in-hand relationship between W3C and IETF like WebSockets. Is
>>> >>> this still your (and Google’s) vision?
>>> >>>
>>> >>>
>>> >>>
>>> >>> Thanks
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> From: Glen Shires [mailto:gshires@google.com]
>>> >>> Sent: Thursday, December 22, 2011 11:14 AM
>>> >>> To: public-webapps@w3.org; Arthur Barstow
>>> >>> Cc: public-xg-htmlspeech@w3.org; Dan Burnett
>>> >>> Subject: Re: HTML Speech XG Completes, seeks feedback for eventual
>>> >>> standardization
>>> >>>
>>> >>>
>>> >>>
>>> >>> We at Google believe that a scripting-only (Javascript) subset of the
>>> >>> API defined in the Speech XG Incubator Group Final Report is of
>>> >>> appropriate scope for consideration by the WebApps WG.
>>> >>>
>>> >>>
>>> >>>
>>> >>> The enclosed scripting-only subset supports the majority of the
>>> >>> use-cases and samples in the XG proposal. Specifically, it enables
>>> >>> web-pages to generate speech output and to use speech recognition as
>>> >>> an input for forms, continuous dictation and control. The Javascript
>>> >>> API will allow web pages to control activation and timing and to
>>> >>> handle results and alternatives.
>>> >>>
>>> >>>
>>> >>>
>>> >>> We welcome your feedback and ask that the Web Applications WG consider
>>> >>> accepting this as a work item.
>>> >>>
>>> >>>
>>> >>>
>>> >>> Bjorn Bringert
>>> >>>
>>> >>> Satish Sampath
>>> >>>
>>> >>> Glen Shires
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Dec 13, 2011 at 11:39 AM, Glen Shires <gshires@google.com> wrote:
>>> >>>
>>> >>> We at Google believe that a scripting-only (Javascript) subset of the
>>> >>> API defined in the Speech XG Incubator Group Final Report [1] is of
>>> >>> appropriate scope for consideration by the WebApps WG.
>>> >>>
>>> >>>
>>> >>>
>>> >>> A scripting-only subset supports the majority of the use-cases and
>>> >>> samples in the XG proposal. Specifically, it enables web-pages to
>>> >>> generate speech output and to use speech recognition as an input for
>>> >>> forms, continuous dictation and control. The Javascript API will allow
>>> >>> web pages to control activation and timing and to handle results and
>>> >>> alternatives.
>>> >>>
>>> >>>
>>> >>>
>>> >>> As Dan points out above, we envision that different portions of the
>>> >>> Incubator Group Final Report are applicable to different working
>>> >>> groups "in W3C and/or other standards development organizations such
>>> >>> as the IETF". This scripting API subset does not preclude other groups
>>> >>> from pursuing standardization of relevant HTML markup or underlying
>>> >>> transport protocols, and indeed the Incubator Group Final Report
>>> >>> defines a potential roadmap such that such additions can be
>>> >>> compatible.
>>> >>>
>>> >>>
>>> >>>
>>> >>> To make this more concrete, Google will provide to this mailing list a
>>> >>> specific proposal extracted from the Incubator Group Final Report,
>>> >>> that includes only those portions we believe are relevant to WebApps,
>>> >>> with links back to the Incubator Report as appropriate.
>>> >>>
>>> >>>
>>> >>>
>>> >>> Bjorn Bringert
>>> >>>
>>> >>> Satish Sampath
>>> >>>
>>> >>> Glen Shires
>>> >>>
>>> >>>
>>> >>>
>>> >>> [1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Dec 13, 2011 at 5:32 AM, Dan Burnett <dburnett@voxeo.com> wrote:
>>> >>>
>>> >>> Thanks for the info, Art. To be clear, I personally am *NOT* proposing
>>> >>> adding any specs to WebApps, although others might. My email below as
>>> >>> a Chair of the group is merely to inform people of this work and ask
>>> >>> for feedback.
>>> >>> I expect that your information will be useful for others who might
>>> >>> wish for some of this work to continue in WebApps.
>>> >>>
>>> >>> -- dan
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Dec 13, 2011, at 7:06 AM, Arthur Barstow wrote:
>>> >>>
>>> >>> > Hi Dan,
>>> >>> >
>>> >>> > WebApps already has a relatively large number of specs in progress
>>> >>> > (see [PubStatus]) and the group has agreed to add some additional
>>> >>> > specs (see [CharterChanges]). As such, please provide a relatively
>>> >>> > specific proposal about the features/specs you and other proponents
>>> >>> > would like to add to WebApps.
>>> >>> >
>>> >>> > Regarding the level of detail for your proposal, I think a
>>> >>> > reasonable precedent is something like the Gamepad and
>>> >>> > Pointer/MouseLock proposals (see [CharterChanges]). (Perhaps this
>>> >>> > could be achieved by identifying specific sections in the XG's Final
>>> >>> > Report?)
>>> >>> >
>>> >>> > -Art Barstow
>>> >>> >
>>> >>> > [PubStatus]
>>> >>> > http://www.w3.org/2008/webapps/wiki/PubStatus#API_Specifications
>>> >>> > [CharterChanges]
>>> >>> > http://www.w3.org/2008/webapps/wiki/CharterChanges#Additions_Agreed
>>> >>> >
>>> >>> > On 12/12/11 5:25 PM, ext Dan Burnett wrote:
>>> >>> >> Dear WebApps people,
>>> >>> >>
>>> >>> >> The HTML Speech Incubator Group [1] has recently wrapped up its
>>> >>> >> work on use cases, requirements, and proposals for adding automatic
>>> >>> >> speech recognition (ASR) and text-to-speech (TTS) capabilities to
>>> >>> >> HTML. The work of the group is documented in the group's Final
>>> >>> >> Report. [2]
>>> >>> >>
>>> >>> >> The members of the group intend this work to be input to one or
>>> >>> >> more working groups, in W3C and/or other standards development
>>> >>> >> organizations such as the IETF, as an aid to developing full
>>> >>> >> standards in this space.
>>> >>> >> Whether the W3C work happens in a new Working Group or an existing
>>> >>> >> one, we are interested in collecting feedback on the Incubator
>>> >>> >> Group's work. We are specifically interested in input from the
>>> >>> >> members of the WebApps Working Group.
>>> >>> >>
>>> >>> >> If you have any feedback to share, please send it to, or cc, the
>>> >>> >> group's mailing list (public-xg-htmlspeech@w3.org). This will allow
>>> >>> >> comments to be archived in one consistent location for use by
>>> >>> >> whatever group takes up this work.
>>> >>> >>
>>> >>> >>
>>> >>> >> Dan Burnett, Co-Chair
>>> >>> >> HTML Speech Incubator Group
>>> >>> >>
>>> >>> >>
>>> >>> >> [1] charter: http://www.w3.org/2005/Incubator/htmlspeech/charter
>>> >>> >> [2] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>>> >>> >>
>>> >>> >> p.s. This feedback request is being sent to the following groups:
>>> >>> >> WebApps, HTML, Audio, DAP, Voice Browser, Multimodal Interaction
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >
>>>
>>>
>>>
>>
>
Received on Tuesday, 10 January 2012 16:26:25 GMT