Re: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Mon, 09 Jan 2012 18:35:43 +0200
Message-ID: <4F0B175F.7070605@helsinki.fi>
To: "Young, Milan" <Milan.Young@nuance.com>
CC: Webapps WG <public-webapps@w3.org>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
On 01/09/2012 06:17 PM, Young, Milan wrote:
> To clarify, are you interested in developing the entirety of the JS API
> we developed in the HTML Speech XG, or just the subset proposed by
> Google?

Not sure if you sent the reply to me only on purpose.
CCing the WG and XG lists.

Since, from a practical point of view, the API+protocol the XG defined is a
huge thing to implement at once, it makes sense to implement it in pieces.
Something like:
(1) Initial API implementation. Some subset of what the XG defined,
     not necessarily exactly what Google proposed but something close to
     it. Support for remote speech services could be in the initial API,
     but if the UA doesn't implement the protocol, it would just fail when
     trying to connect to remote services.
(2) Simultaneously or later (depending on the protocol standardization
     in the IETF or elsewhere), support remote speech services.
(3) Implement some more of the API the XG defined (if needed by web
     developers or web services).
(4) Implement <reco>? I'm not at all convinced we need the reco element,
     since automatic value binding makes it just a bit strange and
     inconsistent.


This is the way web APIs tend to evolve: first implement something quite
small, and then add new features if/when needed.



-Olli



>
> Thanks
>
>
> -----Original Message-----
> From: Olli Pettay [mailto:Olli.Pettay@helsinki.fi]
> Sent: Monday, January 09, 2012 8:13 AM
> To: Arthur Barstow
> Cc: ext Satish S; Peter Beverloo; Glen Shires; public-webapps@w3.org;
> public-xg-htmlspeech@w3.org; Dan Burnett
> Subject: Re: Speech Recognition and Text-to-Speech Javascript API -
> seeking feedback for eventual standardization
>
> On 01/09/2012 04:59 PM, Arthur Barstow wrote:
>> Hi All,
>>
>> As I indicated in [1], WebApps already has a relatively large number
>> of specs in progress and the group has agreed to add some new specs.
>> As such, to review any new charter addition proposals, I think we need
>> at least the following:
>>
>> 1. Relatively clear scope of the feature(s). (This information should
>> be detailed enough for WG members with relevant IP to be able to make
>> an IP assessment.)
>>
>> 2. Editor commitment(s)
>>
>> 3. Implementation commitments from at least two WG members
> Is this really a requirement nowadays?
> Is there for example commitment to implement File System API?
> http://dev.w3.org/2009/dap/file-system/file-dir-sys.html
>
> But anyway, I'm interested in implementing the speech API, and as far as I
> know, other people involved with Mozilla have also shown interest.
>
>
>>
>> 4. Testing commitment(s)
>>
>> Re the APIs in this thread -> I think Glen's API proposal [2]
>> adequately addresses #1 above and his previous responses imply support
>> for #2 but it would be good for Glen, et al. to confirm. Re #3, other
>> than Google, I don't believe any other implementor has voiced their
>> support for WebApps adding these APIs. As such, I think we need
>> additional input on implementation support (e.g. Apple, Microsoft,
>> Mozilla, Opera, etc.).
>
> It doesn't matter too much to me in which group the API will be
> developed (except that I'm against doing it in the HTML WG).
> WebApps is a reasonably good place (if there aren't any IP issues).
>
>
>
>
> -Olli
>
>
>>
>> Re the markup question -> WebApps does have some precedent for defining
>> markup (e.g. XBL2, Widget XML config). I don't have a strong opinion on
>> whether or not WebApps should include the type of markup in the XG
>> Report. I think the next step here is for WG members to submit comments
>> on this question. In particular, proponents of including markup in
>> WebApps' charter should respond to #1-4 above.
>>
>> -AB
>>
>> [1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1474.html
>> [2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html
>>
>>
>>
>> On 1/5/12 6:49 AM, ext Satish S wrote:
>>>
>>> 2) How does the draft integrate with the existing <input speech>
>>> API [1]? It seems to me as if it'd be best to define both the attribute
>>> and the DOM APIs in a single specification, also because they share
>>> several events (yet don't seem to be interchangeable) and the
>>> attribute already has an implementation.
>>>
>>>
>>> The <input speech> API proposal was implemented as <input
>>> x-webkit-speech> in Chromium a while ago. A lot of the developer
>>> feedback we received was about finer-grained control, including a
>>> JavaScript API and letting the web application decide how to present
>>> the user interface rather than tying it to the <input> element.
>>>
>>> The HTML Speech Incubator Group's final report [1] includes a <reco>
>>> element which addresses both these concerns and provides automatic
>>> binding of speech recognition results to existing HTML elements. We
>>> are not sure if the WebApps WG is a good place to work on
>>> standardising such markup elements, hence we did not include them in the
>>> simplified Javascript API [2]. If there is sufficient interest and
>>> scope in the WebApps WG charter for the Javascript API and markup, we
>>> are happy to combine them both in the proposal.
>>>
>>> [1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>>> [2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html
>>>
>>>
>>>
>>> Thanks,
>>> Peter
>>>
>>> [1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0020/api-draft.html
>>>
>>>
>>> On Thu, Jan 5, 2012 at 07:15, Glen Shires <gshires@google.com> wrote:
>>>> As Dan Burnett wrote below: The HTML Speech Incubator Group [1] has recently
>>>> wrapped up its work on use cases, requirements, and proposals for adding
>>>> automatic speech recognition (ASR) and text-to-speech (TTS) capabilities to
>>>> HTML. The work of the group is documented in the group's Final Report. [2]
>>>> The members of the group intend this work to be input to one or more
>>>> working groups, in W3C and/or other standards development organizations such
>>>> as the IETF, as an aid to developing full standards in this space.
>>>>
>>>> Because that work was so broad, Art Barstow asked (below) for a relatively
>>>> specific proposal. We at Google are proposing that a subset of it be
>>>> accepted as a work item by the Web Applications WG. Specifically, we are
>>>> proposing this Javascript API [3], which enables web developers to
>>>> incorporate speech recognition and synthesis into their web pages.
>>>> This simplified subset enables developers to use scripting to generate
>>>> text-to-speech output and to use speech recognition as an input for forms,
>>>> continuous dictation and control, and it supports the majority of use-cases
>>>> in the Incubator Group's Final Report.
>>>>
>>>> We welcome your feedback and ask that the Web Applications WG
>>>> consider accepting this Javascript API [3] as a work item.
>>>>
>>>> [1] charter: http://www.w3.org/2005/Incubator/htmlspeech/charter
>>>> [2] report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>>>> [3] API: http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html
>>>
>>>>
>>>> Bjorn Bringert
>>>> Satish Sampath
>>>> Glen Shires
>>>>
>>>> On Thu, Dec 22, 2011 at 11:38 AM, Glen Shires <gshires@google.com> wrote:
>>>>>
>>>>> Milan,
>>>>> The IDLs contained in both documents are in the same format and order, so
>>>>> it's relatively easy to compare the two side-by-side. The semantics of the
>>>>> attributes, methods and events have not changed, and both IDLs link directly
>>>>> to the definitions contained in the Speech XG Final Report.
>>>>>
>>>>> As you mention, we agree that the protocol portions of the Speech XG Final
>>>>> Report are most appropriate for consideration by a group such as the IETF, and
>>>>> believe such work can proceed independently, particularly because the Speech
>>>>> XG Final Report has provided a roadmap for these to remain compatible.
>>>>> Also, as shown in the Speech XG Final Report - Overview, the "Speech Web
>>>>> API" is not dependent on the "Speech Protocol" and a "Default Speech"
>>>>> service can be used for local or remote speech recognition and synthesis.
>>>>>
>>>>> Glen Shires
>>>>>
>>>>>
>>>>> On Thu, Dec 22, 2011 at 10:32 AM, Young, Milan <Milan.Young@nuance.com> wrote:
>>>>>>
>>>>>> Hello Glen,
>>>>>>
>>>>>>
>>>>>>
>>>>>> The proposal says that it contains a "simplified subset of the JavaScript
>>>>>> API". Could you please clarify which elements of the HTMLSpeech
>>>>>> recommendation's JavaScript API were omitted? I think this would be the
>>>>>> most efficient way for those of us familiar with the XG recommendation to
>>>>>> evaluate the new proposal.
>>>>>>
>>>>>> I'd also appreciate clarification on how you see the protocol being
>>>>>> handled. In the HTMLSpeech group we were thinking about this as a
>>>>>> hand-in-hand relationship between W3C and IETF like WebSockets. Is this
>>>>>> still your (and Google's) vision?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> From: Glen Shires [mailto:gshires@google.com]
>>>>>> Sent: Thursday, December 22, 2011 11:14 AM
>>>>>> To: public-webapps@w3.org; Arthur Barstow
>>>>>> Cc: public-xg-htmlspeech@w3.org; Dan Burnett
>>>>>> Subject: Re: HTML Speech XG Completes, seeks feedback for eventual
>>>>>> standardization
>>>>>>
>>>>>>
>>>>>>
>>>>>> We at Google believe that a scripting-only (Javascript) subset of the API
>>>>>> defined in the Speech XG Incubator Group Final Report is of appropriate
>>>>>> scope for consideration by the WebApps WG.
>>>>>>
>>>>>>
>>>>>>
>>>>>> The enclosed scripting-only subset supports the majority of the use-cases
>>>>>> and samples in the XG proposal. Specifically, it enables web-pages to
>>>>>> generate speech output and to use speech recognition as an input for forms,
>>>>>> continuous dictation and control. The Javascript API will allow web pages to
>>>>>> control activation and timing and to handle results and alternatives.
>>>>>>
>>>>>>
>>>>>>
>>>>>> We welcome your feedback and ask that the Web Applications WG consider
>>>>>> accepting this as a work item.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Bjorn Bringert
>>>>>>
>>>>>> Satish Sampath
>>>>>>
>>>>>> Glen Shires
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 13, 2011 at 11:39 AM, Glen Shires <gshires@google.com> wrote:
>>>>>>
>>>>>> We at Google believe that a scripting-only (Javascript) subset of the API
>>>>>> defined in the Speech XG Incubator Group Final Report [1] is of appropriate
>>>>>> scope for consideration by the WebApps WG.
>>>>>>
>>>>>>
>>>>>>
>>>>>> A scripting-only subset supports the majority of the use-cases and
>>>>>> samples in the XG proposal. Specifically, it enables web-pages to generate
>>>>>> speech output and to use speech recognition as an input for forms,
>>>>>> continuous dictation and control. The Javascript API will allow web pages to
>>>>>> control activation and timing and to handle results and alternatives.
>>>>>>
>>>>>>
>>>>>>
>>>>>> As Dan points out above, we envision that different portions of the
>>>>>> Incubator Group Final Report are applicable to different working groups "in
>>>>>> W3C and/or other standards development organizations such as the IETF".
>>>>>> This scripting API subset does not preclude other groups from pursuing
>>>>>> standardization of relevant HTML markup or underlying transport protocols,
>>>>>> and indeed the Incubator Group Final Report defines a potential roadmap such
>>>>>> that such additions can be compatible.
>>>>>>
>>>>>>
>>>>>>
>>>>>> To make this more concrete, Google will provide to this mailing list a
>>>>>> specific proposal extracted from the Incubator Group Final Report, that
>>>>>> includes only those portions we believe are relevant to WebApps, with links
>>>>>> back to the Incubator Report as appropriate.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Bjorn Bringert
>>>>>>
>>>>>> Satish Sampath
>>>>>>
>>>>>> Glen Shires
>>>>>>
>>>>>>
>>>>>>
>>>>>> [1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Dec 13, 2011 at 5:32 AM, Dan Burnett <dburnett@voxeo.com> wrote:
>>>>>>
>>>>>> Thanks for the info, Art. To be clear, I personally am *NOT* proposing
>>>>>> adding any specs to WebApps, although others might. My email below as a
>>>>>> Chair of the group is merely to inform people of this work and ask for
>>>>>> feedback.
>>>>>> I expect that your information will be useful for others who might wish
>>>>>> for some of this work to continue in WebApps.
>>>>>>
>>>>>> -- dan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Dec 13, 2011, at 7:06 AM, Arthur Barstow wrote:
>>>>>>
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> WebApps already has a relatively large number of specs in progress (see
>>>>>>> [PubStatus]) and the group has agreed to add some additional specs (see
>>>>>>> [CharterChanges]). As such, please provide a relatively specific proposal
>>>>>>> about the features/specs you and other proponents would like to add to
>>>>>>> WebApps.
>>>>>>>
>>>>>>> Regarding the level of detail for your proposal, I think a reasonable
>>>>>>> precedent is something like the Gamepad and Pointer/MouseLock proposals
>>>>>>> (see [CharterChanges]). (Perhaps this could be achieved by identifying
>>>>>>> specific sections in the XG's Final Report?)
>>>>>>>
>>>>>>> -Art Barstow
>>>>>>>
>>>>>>> [PubStatus] http://www.w3.org/2008/webapps/wiki/PubStatus#API_Specifications
>>>>>>> [CharterChanges] http://www.w3.org/2008/webapps/wiki/CharterChanges#Additions_Agreed
>>>>>>>
>>>>>>> On 12/12/11 5:25 PM, ext Dan Burnett wrote:
>>>>>>>> Dear WebApps people,
>>>>>>>>
>>>>>>>> The HTML Speech Incubator Group [1] has recently wrapped up its work
>>>>>>>> on use cases, requirements, and proposals for adding automatic speech
>>>>>>>> recognition (ASR) and text-to-speech (TTS) capabilities to HTML. The work
>>>>>>>> of the group is documented in the group's Final Report. [2]
>>>>>>>>
>>>>>>>> The members of the group intend this work to be input to one or more
>>>>>>>> working groups, in W3C and/or other standards development organizations such
>>>>>>>> as the IETF, as an aid to developing full standards in this space.
>>>>>>>> Whether the W3C work happens in a new Working Group or an existing
>>>>>>>> one, we are interested in collecting feedback on the Incubator Group's work.
>>>>>>>> We are specifically interested in input from the members of the WebApps
>>>>>>>> Working Group.
>>>>>>>>
>>>>>>>> If you have any feedback to share, please send it to, or cc, the
>>>>>>>> group's mailing list (public-xg-htmlspeech@w3.org). This will allow
>>>>>>>> comments to be archived in one consistent location for use by whatever group
>>>>>>>> takes up this work.
>>>>>>>>
>>>>>>>>
>>>>>>>> Dan Burnett, Co-Chair
>>>>>>>> HTML Speech Incubator Group
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] charter: http://www.w3.org/2005/Incubator/htmlspeech/charter
>>>>>>>> [2] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
>>>>>>>>
>>>>>>>> p.s. This feedback request is being sent to the following groups:
>>>>>>>> WebApps, HTML, Audio, DAP, Voice Browser, Multimodal Interaction
Received on Monday, 9 January 2012 16:36:17 GMT
