RE: Some prioritization from Robert Brown on 2011-01-15 (public-xg-htmlspeech@w3.org from January 2011)

From: Robert Brown <Robert.Brown@microsoft.com>
Date: Sat, 15 Jan 2011 01:36:05 +0000
To: Bjorn Bringert <bringert@google.com>
CC: "Young, Milan" <Milan.Young@nuance.com>, "Olli@pettay.fi" <Olli@pettay.fi>, "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
Message-ID: <113BCF28740AF44989BE7D3F84AE18DD19815260@TK5EX14MBXC116.redmond.corp.microsoft.>
>> The conclusion then is that the majority of speech web apps will not be valuable :-)
LOL... maybe ;-)

>> the default speech recognizer could automatically adapt its base language model to each web app, based on the speech input that it has gotten from all users of that web app.
Sure, a decent recognizer should adapt.  But adapting from nothing puts a lot of faith in the tolerance of users and developers, and the capabilities of the default recognizer.  Discovering a complete LM by transcribing out-of-grammar events will in theory work eventually, provided the user is willing to put up with lousy results until it adapts; AND the app gets a high volume of transactions; AND there's enough side-signal to reliably detect out-of-grammar events; AND the default recognizer is willing to spend the money supporting this for every random app that comes along (as opposed to the app developer just providing a base LM to adapt from).  Good luck discovering the ever-lengthening tail of new items appearing in the catalog of an online retailer, and removing those that are no longer offered, just by capturing utterances.  Sure, if all your default SR vendors also happen to have world-class search engines that have already indexed the site, you at least have a base LM to start from.  But that restriction on SR vendors doesn't sound very reasonable to me.

-----Original Message-----
From: Bjorn Bringert [mailto:bringert@google.com] 
Sent: Friday, January 14, 2011 2:18 PM
To: Robert Brown
Cc: Young, Milan; Olli@pettay.fi; public-xg-htmlspeech@w3.org
Subject: Re: Some prioritization

Ah, so you are saying that all *valuable* speech web apps will need a web-app specified recognizer (and a lot of development resources). And I'm saying that the majority of *all* the web apps that use speech will use the default speech recognizer. The conclusion then is that the majority of speech web apps will not be valuable :-)

There is an alternative: the default speech recognizer could automatically adapt its base language model to each web app, based on the speech input that it has gotten from all users of that web app.
That would mean a constant quality improvement, with no effort from the web developer or the user. It would sacrifice some developer control though.

/Bjorn

On Fri, Jan 14, 2011 at 9:54 PM, Robert Brown <Robert.Brown@microsoft.com> wrote:
> I'm pretty sure I'm saying the opposite :)
>
> What I'm saying is that I don't think you can build valuable speech apps without specialized skills or carefully selected technology.
>
> Sure, for simple control, any modern recognizer should be fine.
>
> But for most of the use cases you've listed (search, messaging, dictation, translation), the developer definitely cares what n-grams are being used, what trained them, the nuances of the recognizer that'll generate the results, and how to make their n-gram available to the service provider.  E.g., if I'm a web site that provides a large catalog of movies for people to watch (Netflix, Hulu, Amazon, etc.), I definitely care what training data went into the SLM, and I definitely care about tuning to the confidence of the particular recognizer I'm using, and the commercial & service terms under which I share my n-gram with the vendor providing the recognition service.
>
>
> -----Original Message-----
> From: Bjorn Bringert [mailto:bringert@google.com]
> Sent: Friday, January 14, 2011 1:43 PM
> To: Young, Milan
> Cc: Robert Brown; Olli@pettay.fi; public-xg-htmlspeech@w3.org
> Subject: Re: Some prioritization
>
> Robert: So you are saying that few organizations have the resources to build apps that need to use a specific speech recognizer? Doesn't that mean that most organizations, and thus most web apps, will not need to use a specific speech recognizer?
>
> Milan: The use cases that I listed (search, messaging, dictation, translation, simple control) have plenty of commercial applications.
> They should work ok with a high-quality default speech recognizer.
>
> I'm not denying the value in web-app specified speech services. I'm just saying that there are important use cases for default speech services, and that I believe that those are the ones that will give the most immediate benefit: making it possible for millions of web developers to easily create speech applications.
>
> I think that within the incubator group, we have different priorities for default vs web-app specified speech services. It doesn't seem like discussion is likely to change that. One possible outcome is that we work on two proposals, one for default speech services and one for web-app specified speech services, and allow them to move ahead at different pace depending on support from browser vendors etc. I have no idea whether that would be a good way to work though.
>
> /Bjorn
>
> On Fri, Jan 14, 2011 at 7:16 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>> I agree with Robert.  If folks want to tinker with Voice systems, there are plenty of GUI tools and free services to get them started.
>>
>> I've taken it as a given that commercial applications (like the Google search demo) would be in a V1 release.  That's not possible in a portable manner without an application-specified network recognizer.
>>
>>
>> -----Original Message-----
>> From: Robert Brown [mailto:Robert.Brown@microsoft.com]
>> Sent: Friday, January 14, 2011 10:09 AM
>> To: Bjorn Bringert; Olli@pettay.fi
>> Cc: Young, Milan; public-xg-htmlspeech@w3.org
>> Subject: RE: Some prioritization
>>
>> I suspect the opposite is true.  The hardest part of speech development is grammar/SLM design & tuning, and few organizations have the expertise to do a good job there.  Those that do will optimize around a specific service.
>>
>> -----Original Message-----
>> From: public-xg-htmlspeech-request@w3.org
>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn 
>> Bringert
>> Sent: Friday, January 14, 2011 3:20 AM
>> To: Olli@pettay.fi
>> Cc: Young, Milan; public-xg-htmlspeech@w3.org
>> Subject: Re: Some prioritization
>>
>> On Fri, Jan 14, 2011 at 11:05 AM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:
>>> On 01/13/2011 07:59 PM, Young, Milan wrote:
>>>>
>>>> Hello Olli,
>>>>
>>>> I'd be interested to know what sort of use case you have in mind 
>>>> that uses default speech services.
>>>
>>> I was mainly thinking about rather simple speech interfaces created 
>>> by individual web developers who want to try out new features.
>>> I would assume grammars created by them aren't that big.
>>> We really want comments about the API from the web developers.
>>>
>>> Also things like speech-enabled web search should be doable, and 
>>> message (email/twitter/sms) dictation.
>>>
>>> And speech enabled controls in a web page.
>>> "Go to the next article", "Read this article", etc.
>>>
>>>
>>>>  I have been under the impression that most real world apps have 
>>>> grammars that are either too large or too sensitive to be 
>>>> transported over the network.
>>>
>>> Well, there are no real world speech enabled web apps commonly 
>>> available, so we don't necessarily know what they will look like ;) 
>>> But sure, for many cases (complex dialogs, specialized search, etc.) 
>>> network engines would be needed.
>>
>> I agree that default speech services are very important. Use cases that could be handled pretty well include search, messaging, dictation, translation, simple control. I think that the majority of speech web apps (as measured in number of apps) will use default speech services, because the developers do not have the resources to run their own speech services. I also think that this will account for the majority of the speech app usage, at least in the beginning.
>>
>> If simple speech web apps take off, I think that we will see a gradual increase in interest in more complex applications and in interest from organizations large enough to devote resources to running their own high-quality speech services.
>>
>> /Bjorn
>>
>>>> -----Original Message-----
>>>> From: public-xg-htmlspeech-request@w3.org
>>>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Olli 
>>>> Pettay
>>>> Sent: Thursday, January 13, 2011 7:14 AM
>>>> To: public-xg-htmlspeech@w3.org
>>>> Subject: Some prioritization
>>>>
>>>> Hi all,
>>>>
>>>> I may not be able to attend the conference call today (if we have one).
>>>> But anyway, I started to prioritize requirements the way I think 
>>>> about them. Or more so, I picked out lower priority requirements and 
>>>> categorized them into 3 groups.
>>>> I don't know how we're going to prioritize requirements, but I 
>>>> guess it doesn't hurt to send this kind of email so that you know 
>>>> what kind of specification proposal I'm expecting to see later this year.
>>>>
>>>>
>>>> -------------
>>>> A bit lower priority:
>>>> FPR46. Web apps should be able to specify which voice is used for TTS.
>>>> FPR57. Web applications must be able to request recognition based 
>>>> on previously sent audio.
>>>>
>>>>
>>>> -------------
>>>> Low priority:
>>>> FPR28. Speech recognition implementations should be allowed to fire 
>>>> implementation specific events.
>>>> FPR31. User agents and speech services may agree to use alternate 
>>>> protocols for communication.
>>>> FPR48. Web application author must be able to specify a domain 
>>>> specific statistical language model.
>>>> FPR56. Web applications must be able to request NL interpretation 
>>>> based only on text input (no audio sent).
>>>>
>>>>
>>>> -------------
>>>> Something perhaps for V2 specification
>>>> These requirements can be important, but to get at least something 
>>>> done soon we could perhaps leave these out from v1 specification.
>>>> Note, v2 specification could be developed simultaneously with v1.
>>>>
>>>> FPR7. Web apps should be able to request speech service different 
>>>> from default.
>>>> ...and because of that also the following requirements
>>>> FPR11. If the web apps specify speech services, it should be 
>>>> possible to specify parameters.
>>>> FPR12. Speech services that can be specified by web apps must 
>>>> include network speech services.
>>>> FPR27. Speech recognition implementations should be allowed to add 
>>>> implementation specific information to speech recognition results.
>>>> FPR30. Web applications must be allowed at least one form of 
>>>> communication with a particular speech service that is supported in 
>>>> all UAs.
>>>> FPR33. There should be at least one mandatory-to-support 
>>>> codec that isn't encumbered with IP issues and has sufficient 
>>>> fidelity & low bandwidth requirements.
>>>> FPR55. Web application must be able to encrypt communications to 
>>>> remote speech service.
>>>> FPR58. Web application and speech services must have a means of 
>>>> binding session information to communications.
>>>>
>>>>
>>>>
>>>>
>>>> -Olli
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Bjorn Bringert
>> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham 
>> Palace Road, London, SW1W 9TQ Registered in England Number: 3977902
>>
>>
>>
>
>
>
>
>



Received on Saturday, 15 January 2011 01:36:41 UTC