Re: Collection of requirements and use cases

Sure. I recognize that both of us are aware of the scope and
roles of these respective groups.

I believe my comments are in line with your stated objective of biting
off as much as we can practically chew from an HTML speech-enablement
perspective, without embarking on a broader feature set that other
fully-commissioned Working Groups are currently set to pursue.

Regards,
Raj
----- Original Message ----- 
From: "T.V Raman" <raman@google.com>
To: <raj@openstream.com>
Cc: <raman@google.com>; <mbodell@microsoft.com>; 
<public-xg-htmlspeech@w3.org>
Sent: Thursday, September 23, 2010 12:43 PM
Subject: Re: Collection of requirements and use cases


>
> Raj, I believe that the MMIWG and VBWG as fully-chartered WGs
> should continue developing the long-term work. If the XG is going
> to do what those two WGs are doing, then we might as well shut
> those down (not necessarily advocating that on this thread).
>
> Raj (Openstream) writes:
> > Great job, Michael. And excellent commentary by TVRaman, as usual.
> >
> > During the deep-breath exercise that TVR suggested, I would also
> > add, at the risk of sounding trite, that the "simple & practical"
> > alternatives did not work either over the last 10 years, resulting
> > in where we are today. I am afraid any "quick & practical" approach
> > will result in greater fragmentation of development, which I am
> > sure we are all trying to avoid here.
> >
> > Perhaps it would be easier to highlight each requirement and link
> > the illustrative example use cases, to make for an easy reading of
> > the set/union of must-haves for the feature set, as TVR suggests,
> > toward a realistic initial set of XG deliverables.
> >
> >
> > Regards,
> > Raj
> >
> >
> > On Thu, 23 Sep 2010 08:22:40 -0700
> >   raman@google.com (T.V Raman) wrote:
> > >
> > > Good job Michael!
> > >
> > > Next step -- at this point we've pooled all the requirements of
> > > the last 10+ years of the MMIWG, plus a few additional ones to
> > > boot from VBWG.
> > >
> > > Now, given that those have not been addressed by a single
> > > solution in 10+ years, I believe it would be both naive and
> > > extremely egotistical of this XG to try to address all of them in
> > > one fell swoop -- we'll be here another 10 years -- during which
> > > time the requirements will only increase.
> > >
> > > I urge everyone to take a deep breath, then proceed in small
> > > practical steps toward building things that the Web needs today.
> > >
> > > Michael Bodell writes:
> > > > In order to make more structured progress on addressing all the
> > > > requirements and use cases sent to the list, I’ve collated them
> > > > into one comprehensive set, in the order they were received. If
> > > > anyone has use cases or requirements that they didn’t send yet,
> > > > or that they don’t see in this list, please send them by Monday
> > > > the 27th. I've tried to be exhaustive here, to be complete and
> > > > fair, and not worried at all if some of the requirements are
> > > > similar to one another or if other requirements are exact
> > > > opposites. I’ll work on a more organized representation of both
> > > > the additional information sent and this list of requirements
> > > > and use cases next week.
> > > >
> > > > 1.       Web search by voice:  Speak a search query, and get
> > >search results. [1]
> > > >
> > > > 2.       Speech translation: The app works as an interpreter
> > >between two users that speak different languages. [1]
> > > >
> > > > 3.       Speech-enabled webmail client, e.g. for in-car use. Reads
> > >out e-mails and listens for commands, e.g. "archive", "star", "reply,
> > >ok, let's meet at 2
> > > > pm", "forward to bob". [1]
> > > >
> > > > 4.       Speech shell:  Allows multiple commands, most of which
> > > > take arguments, some of which are free-form. E.g. "call
> > > > <number>", "call <contact>", "calculate <arithmetic
> > > > expression>", "search for <query>". [1]
> > > >
> > > > 5.       Turn-by-turn navigation:  Speaks driving instructions,
> > >and accepts spoken commands, e.g. "navigate to <address>", "navigate
> > >to <contact name>",
> > > > "navigate to <business name>", "reroute", "suspend navigation".
> > >[1]
> > > >
> > > > 6.       Dialog systems, e.g. flight booking, pizza ordering. [1]
> > > >
> > > > 7.       Multimodal interaction:  Say "I want to go here", and
> > >click on a map. [1]
> > > >
> > > > 8.       VoiceXML interpreter:  Fetches a VoiceXML app using
> > >XMLHttpRequest, and interprets it using JavaScript and DOM. [1]
> > > >
> > > > 9.       The HTML+Speech standard must allow specification of the
> > >speech resource (e.g. speech recognizer) to be used for processing of
> > >the audio collected
> > > > from the user. [2]
> > > >
> > > > 10.   The ability to switch from grammar-based recognition to
> > > > free-form recognition. [3]
> > > >
> > > > 11.   Ability to specify field relationships. For example, when
> > > > a country field is selected, the state field selections change,
> > > > so the corresponding grammar/choices should also change. [3]
> > > >
> > > > 12.   The API must notify the web app when a spoken utterance
> > > > has been recognized. [4]
> > > >
> > > > 13.   The API must notify the web app on speech recognition
> > > > errors. [4]
> > > >
> > > > 14.   The API should provide access to a list of speech
> > > > recognition hypotheses. [4]
> > > >
> > > > 15.   The API should allow, but not require, specifying a
> > > > grammar for the speech recognizer to use. [4]
> > > >
> > > > 16.   The API should allow specifying the natural language in
> > > > which to perform speech recognition. This will override the
> > > > language of the web page. [4]
> > > >
> > > > 17.   For privacy reasons, the API should not allow web apps
> > > > access to raw audio data, but only provide recognition
> > > > results. [4]
> > > >
> > > > 18.   For privacy reasons, speech recognition should only be
> > > > started in response to a user action. [4]
> > > >
> > > > 19.   Web app developers should not have to run their own speech
> > > > recognition services. [4]
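
[Editorial illustration: the notification requirements 12-15 above are easier to picture with a small sketch of what a callback-style API could look like. The sketch below is hypothetical Python, not any draft specification: the `SpeechRecognizer` class, its `onresult`/`onerror` callbacks, and the n-best result shape are all invented names for illustration only.]

```python
# Hypothetical sketch of a notification-based recognition API
# (requirements 12-15). All names are invented for illustration;
# no draft spec is implied.

class SpeechRecognizer:
    def __init__(self, grammar=None, lang=None):
        # Grammar is optional (req. 15); lang would override the page
        # language (req. 16).
        self.grammar = grammar
        self.lang = lang
        self.onresult = None  # called with an n-best list (reqs. 12, 14)
        self.onerror = None   # called on recognition errors (req. 13)

    def _deliver(self, hypotheses):
        # A real engine would produce hypotheses from audio; here we
        # only simulate the notification: the web app receives an
        # n-best list, best hypothesis first, or an error.
        if hypotheses:
            if self.onresult:
                self.onresult(hypotheses)
        elif self.onerror:
            self.onerror("no-match")


# Usage: the app registers callbacks and inspects the n-best list.
results = []
rec = SpeechRecognizer(lang="en-US")
rec.onresult = lambda hyps: results.append(hyps[0]["transcript"])
rec.onerror = lambda err: results.append("error: " + err)

rec._deliver([
    {"transcript": "navigate to main street", "confidence": 0.92},
    {"transcript": "navigate to maple street", "confidence": 0.45},
])
print(results[0])  # -> navigate to main street
```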
> > > >
> > > > 20.   Provide the temporal structure of synthesized speech:
> > > > e.g., to highlight the current word in a visual rendition of the
> > > > speech, to synchronize with other modalities in a multimodal
> > > > presentation, or to know when to interrupt. [5]
> > > >
> > > > 21.   Allow streaming for longer stretches of spoken output. [5]
> > > >
> > > > 22.   Use the full SSML feature set, including gender, language,
> > > > pronunciations, etc. [5]
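
[Editorial illustration for requirement 22: a minimal SSML 1.0 document exercising voice gender, language, and IPA pronunciation, checked for well-formedness with Python's standard library. The text and voice settings are invented; only the element and attribute names come from the SSML 1.0 specification.]

```python
# Minimal SSML 1.0 fragment exercising voice gender, language, and
# pronunciation (requirement 22). Parsed with the stdlib only to show
# the document is well-formed; the content itself is an invented example.
import xml.etree.ElementTree as ET

ssml = """<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <voice gender="female" xml:lang="en-GB">
    The word
    <phoneme alphabet="ipa" ph="t&#x259;&#x2C8;m&#x251;&#x2D0;t&#x259;&#x28A;">
      tomato
    </phoneme>
    is pronounced differently in British English.
  </voice>
</speak>"""

root = ET.fromstring(ssml)
ns = "{http://www.w3.org/2001/10/synthesis}"
voice = root.find(ns + "voice")
print(voice.get("gender"))  # -> female
```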
> > > >
> > > > 23.   Web app developers should not be excluded from running
> > > > their own speech recognition services. [6]
> > > >
> > > > 24.   End users should not be prevented from creating or
> > > > extending existing grammars, on both a global and a
> > > > per-application basis. [6]
> > > >
> > > > 25.   End-user extensions should be accessible either from the
> > > > desktop or from the cloud. [6]
> > > >
> > > > 26.   For reasons of privacy, the user should not be forced to
> > > > store anything about their speech recognition environment in the
> > > > cloud. [6]
> > > >
> > > > 27.   Any public interfaces for creating extensions should be
> > > > "speakable". [6]
> > > >
> > > > 28.   TTS in speech translation: The app works as an interpreter
> > > > between two users that speak different languages. [7]
> > > >
> > > > 29.   TTS in a speech-enabled webmail client, e.g. for in-car
> > > > use. Reads out e-mails and listens for commands, e.g. "archive",
> > > > "star", "reply, ok, let's meet at 2 pm", "forward to bob". [7]
> > > >
> > > > 30.   TTS in turn-by-turn navigation:  Speaks driving
> > > > instructions, and accepts spoken commands, e.g. "navigate to
> > > > <address>", "navigate to <contact name>", "navigate to <business
> > > > name>", "reroute", "suspend navigation". [7]
> > > >
> > > > 31.   TTS in dialog systems, e.g. flight booking, pizza
> > > > ordering. [7]
> > > >
> > > > 32.   TTS in a VoiceXML interpreter:  Fetches a VoiceXML app
> > > > using XMLHttpRequest, and interprets it using JavaScript and
> > > > DOM. [7]
> > > >
> > > > 33.   A developer creating a (multimodal) interface combining
> > > > speech input with graphical output needs the ability to provide
> > > > a consistent user experience not just for graphical elements but
> > > > also for voice. [8]
> > > >
> > > > 34.   Hello-world example. [9]
> > > >
> > > > 35.   Basic VCR-like text reader example. [9]
> > > >
> > > > 36.   Free-form collector example. [9]
> > > >
> > > > 37.   Grammar-based collector example. [9]
> > > >
> > > > 38.  User-selected recognizer. [10]
> > > >
> > > > 39.  User-controlled speech parameters. [10]
> > > >
> > > > 40.  Make it easy to integrate input from different modalities.
> > >[10]
> > > >
> > > > 41.  Allow an author to specify an application-specific
> > >statistical language model. [10]
> > > >
> > > > 42.  Make the use of speech optional. [10]
> > > >
> > > > 43.  Support for completely hands-free operation. [10]
> > > >
> > > > 44.  Make the standard easy to extend. [10]
> > > >
> > > > 45.  Selection of the speech engine should be a user-setting in
> > >the browser, not a Web developer setting. [11]
> > > >
> > > > 46.  It should be possible to specify a target TTS engine not only
> > >via the "URI" attribute, but via a more generic "source" attribute,
> > >which can point to a
> > > > local TTS engine as well. [12]
> > > >
> > > > 47.  TTS should provide the user, or developer, with finer
> > >granularity in control over the text segments being synthesized. [13]
> > > >
> > > > 48.  Interacting with multiple input elements. [14]
> > > >
> > > > 49.  Interacting without visible input elements. [14]
> > > >
> > > > 50.  Re-recognition. [14]
> > > >
> > > > 51.  Continuous recognition. [14]
> > > >
> > > > 52.  Voice activity detection. [14]
> > > >
> > > > 53.  Minimize user perceived latency. [14]
> > > >
> > > > 54.  High quality default, but application customizable, speech
> > >recognition graphical user interface. [14]
> > > >
> > > > 55.  Rich recognition results allowing analysis and complex
> > > > expression (i.e., confidence, alternatives, structured
> > > > output). [14]
> > > >
> > > > 56.  Ability to specify domain specific grammars. [14]
> > > >
> > > > 57.  Web author able to write one speech experience that performs
> > >identically across user agents and/or devices. [14]
> > > >
> > > > 58.  Synthesis that is synchronized with other media (in
> > > > particular, the visual display). [14]
> > > >
> > > > 59.  Ability to effect barge-in (interrupt synthesis). [14]
> > > >
> > > > 60.  Ability to mitigate false barge-in scenarios. [14]
> > > >
> > > > 61.  Playback controls (repeat, skip forward, skip backwards, not
> > >just by time but by spoken language segments like words, sentences,
> > >and paragraphs). [14]
> > > >
> > > > 62.  A user agent needs to provide clear indication to the user
> > >whenever it is using a microphone to listen to the user. [14]
> > > >
> > > > 63.  Ability of users to explicitly grant permission for the
> > >browser, or an application, to listen to them. [14]
> > > >
> > > > 64.  There needs to be a way to have a trust relationship
> > > > between the user and whatever processes their utterance. [14]
> > > >
> > > > 65.  Any user agent should work with any vendor's speech
> > > > services, provided they meet specific open protocol
> > > > requirements. [14]
> > > >
> > > > 66.  Grammars, TTS and media composition, and recognition results
> > >should use standard formats (e.g. SRGS, SSML, SMIL, EMMA). [14]
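
[Editorial illustration for requirement 66: the formats it names are existing W3C specifications. The sketch below, with invented example content shaped after the SRGS 1.0 and EMMA 1.0 specs, parses a small grammar and picks the best-scoring interpretation from an EMMA n-best result using only the Python standard library.]

```python
# Requirement 66: grammars and recognition results in standard formats.
# Both documents below are invented examples shaped after the W3C
# SRGS 1.0 (XML form) and EMMA 1.0 specifications.
import xml.etree.ElementTree as ET

srgs = """<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="command">
  <rule id="command">
    <one-of>
      <item>reroute</item>
      <item>suspend navigation</item>
    </one-of>
  </rule>
</grammar>"""

emma = """<emma:emma version="1.0"
       xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:one-of id="r1">
    <emma:interpretation id="int1" emma:confidence="0.75"
        emma:tokens="reroute">reroute</emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.20"
        emma:tokens="suspend navigation">suspend navigation</emma:interpretation>
  </emma:one-of>
</emma:emma>"""

G = "{http://www.w3.org/2001/06/grammar}"
E = "{http://www.w3.org/2003/04/emma}"

# The grammar's alternatives...
items = [i.text for i in ET.fromstring(srgs).iter(G + "item")]

# ...and the best-scoring interpretation from the EMMA n-best list.
interps = ET.fromstring(emma).iter(E + "interpretation")
best = max(interps, key=lambda i: float(i.get(E + "confidence")))
print(items)      # -> ['reroute', 'suspend navigation']
print(best.text)  # -> reroute
```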
> > > >
> > > > 67.  Ability to specify service capabilities and hints. [14]
> > > >
> > > > 68.  Ability to enable multiple languages/dialects for the same
> > >page. [15]
> > > >
> > > > 69.  It is critical that the markup support specification of a
> > >network speech resource to be used for recognition or synthesis. [16]
> > > >
> > > > 70.  End users need a way to adjust properties such as timeouts.
> > >[17]
> > > >
> > > > References:
> > > >
> > > > 1 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0001.html
> > >[referencing https://docs.google.com/View?id=dcfg79pz_5dhnp23f5 and
> > >repeated in
> > > >
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0043.html]
> > > >
> > > > 2 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0007.html
> > > >
> > > > 3 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0011.html
> > > >
> > > > 4 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0012.html
> > > >
> > > > 5 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0014.html
> > > >
> > > > 6 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0015.html
> > > >
> > > > 7 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0018.html
> > >[referencing http://docs.google.com/View?id=dcfg79pz_4gnmp96cz]
> > > >
> > > > 8 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0024.html
> > > >
> > > > 9 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0029.html
> > > >
> > > > 10 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0032.html
> > > >
> > > > 11 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0035.html
> > > >
> > > > 12 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0041.html
> > > >
> > > > 13 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0044.html
> > > >
> > > > 14 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0046.html
> > > >
> > > > 15 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0047.html
> > > >
> > > > 16 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0048.html
> > > >
> > > > 17 -
> > >http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Sep/0049.html
> > > >
> > >
> > > -- 
> > > Best Regards,
> > > --raman
> > >
> > > Title:  Research Scientist
> > > Email:  raman@google.com
> > > WWW:    http://emacspeak.sf.net/raman/
> > > Google: tv+raman
> > > GTalk:  raman@google.com
> > > PGP:    http://emacspeak.sf.net/raman/raman-almaden.asc
> > >
> >
>
> -- 
> Best Regards,
> --raman
>
> Title:  Research Scientist
> Email:  raman@google.com
> WWW:    http://emacspeak.sf.net/raman/
> Google: tv+raman
> GTalk:  raman@google.com
> PGP:    http://emacspeak.sf.net/raman/raman-almaden.asc
> 

Received on Thursday, 23 September 2010 17:19:34 UTC