RE: Overview paragraph from DRUTA, DAN (ATTSI) on 2011-04-20 (public-xg-htmlspeech@w3.org from April 2011)

From: DRUTA, DAN (ATTSI) <dd5826@att.com>
Date: Wed, 20 Apr 2011 14:52:19 +0000
To: Dan Burnett <dburnett@voxeo.com>
CC: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <4AA3A95D6033ED488F8AE4E45F474487058DC6@WABOTH9MSGUSR8B.ITServices.sbc.com>
Dan,

My mistake.
I had a couple of implicit references to other standards, but I have updated the statement (below) to make explicit where the key similarities and differences between VoiceXML and HTMLSpeech lie.

Also, the updated statements clarify the scope based on Milan's feedback. It was not my intent to emphasize or imply that the scope is about standardizing functionality across speech engines. As a matter of fact, I wanted to stress that the goal is to encourage innovation, and to acknowledge implicitly that the design should support extensions. The statement was all about a consistent user experience, and I added "irrespective of the browsers or speech engines used". I also added a clarifying statement at the end of the paragraph to eliminate any confusion about scope.
I hope this helps and thanks for the feedback.

Regards,

Dan

__________________________________________________

Context:

Speech technologies are available today from many software manufacturers, on a variety of platforms and device types, with implementations that support a wide selection of development tools. These speech technologies cover aspects of user interaction using spoken commands, dictation, text to speech and speech to text recognition.
W3C VoiceXML and voice browsers, for instance, allow scripting and executing, respectively, sophisticated interactive voice dialogues between a human and a computer. Their implementation is analogous to the way HTML works with standard browsers for visual applications. Despite its similarities with HTML and its typical use of HTTP as a transport, VoiceXML is not a markup format for web applications, and its uses are limited to voice-only applications (phone banking, package tracking, customer relationship management, etc.) within the context of specialized voice browsers.
While there are endless possibilities and uses for speech technologies, from accessibility to user convenience, in many fields such as medicine, education and others, there is great potential for adoption in the rapidly evolving realm of web applications.
 
HTML5 has brought a rich user experience to the web, and developments in the field of voice recognition allow real time user to machine dialogue in the car, on the phone and at the desktop. With the web moving beyond the desktop and the browser, it is imperative to find ways to streamline the process of developing web applications and to create interoperable speech APIs that will work across multiple browsers and speech providers, giving developers choice and consistency in implementing rich speech enabled web applications.

Goals:

The goal for the HTML Speech incubator group is to identify and document common requirements and use cases necessary to support the standardization of API(s) that will enable web developers to design and deploy speech enabled web applications and provide the user with a consistent experience across different platforms and devices, irrespective of the browsers or speech engines used. The group's findings should result in design recommendation(s) that would foster innovation and promote consistency by leveraging and enhancing existing work in W3C as well as other standards bodies. It is not the group's goal to standardize functionality across speech recognition engines, but rather to allow web application portability across browsers by standardizing how the speech enabled web application interacts with the user and with the user agent.

The driver for the standard design will be a common, agreed upon understanding of the elements and interactions necessary to create an end to end multimodal user experience and to avoid development fragmentation in using HTML and JavaScript when developing interoperable speech enabled web applications.


-----Original Message-----
From: Dan Burnett [mailto:dburnett@voxeo.com] 
Sent: Wednesday, April 20, 2011 3:36 AM
To: DRUTA, DAN (ATTSI)
Cc: public-xg-htmlspeech@w3.org
Subject: Re: Overview paragraph

Thanks Dan.  One of the drivers last week for creating this paragraph  
was also to make clear why we needed to do this work when VoiceXML  
already exists.  We in the group understand this, but others may not  
(immediately).  I didn't see any mention of why HTMLSpeech is *not*  
VoiceXML.  Did you plan to add something about this later on?

-- dan

On Apr 19, 2011, at 4:35 PM, DRUTA, DAN (ATTSI) wrote:

> Group,
> At the last meeting I volunteered to put together a few paragraphs  
> that would set the context, the rationale and the goals for the HTML  
> Speech incubation group.
> Below is my first attempt at capturing those points to be included  
> in the report introduction.
>
> Thanks,
> Dan
>
>
> Context:
>
> Speech technologies are available today from many software  
> manufacturers on a variety of platforms with implementations  
> allowing a wide selection of development tools and on multiple  
> device types. These speech technologies cover aspects of the user  
> interaction using spoken commands, dictation, text to speech and  
> speech to text recognition. While there are endless possibilities  
> and uses for speech technologies from accessibility to user  
> convenience in many fields like medical, education and others, there  
> is a great potential of adoption in the rapidly evolving realm of  
> web applications.
> HTML5 has brought the rich user experience to the web and  
> developments in the field of voice recognition allow real time user  
> to machine dialogue in the car, on the phone and at the desktop.  
> With the web moving beyond the desktop and the browser, it is  
> imperative necessary to find ways to streamline the process of  
> developing web applications and create interoperable speech APIs  
> that will work across multiple browsers and speech providers giving  
> the developers choice and consistency in implementing rich speech  
> enabled web applications.
>
> Goals:
>
> The goal for the HTML Speech incubator group is to identify and  
> document common requirements and use cases necessary to support the  
> standardization of  API(s) that will enable web developers to design  
> and deploy speech enabled web applications and provide the user with  
> a consistent experience across different platforms and devices  
> irrespective of the speech engine used. The outcome of the group's  
> findings should result in  design recommendation(s) that would  
> foster innovation and promote consistency by leveraging and  
> enhancing existing work in W3C as well as other standards bodies.
> The driver for the standard design will be a common and agreed upon  
> understanding of the elements and interactions necessary to create  
> an end to end multi modal user experience  and to avoid development  
> fragmentation in using HTML and JavaScript when developing  
> interoperable speech enabled web applications.
>
Received on Wednesday, 20 April 2011 14:52:55 UTC