- From: Satish Sampath <satish@google.com>
- Date: Thu, 9 Sep 2010 18:20:59 +0100
- To: "JOHNSTON, MICHAEL J (MICHAEL J)" <johnston@research.att.com>
- Cc: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
> Consider the following use case: a company, let's call them ACME Corp.,
> wants to put out a speech-enabled web page that allows users to search
> for their various products and services using voice. As part of their
> development effort, they build a language model that supports this task.
> With HTML+Speech allowing specification of a speech resource on the
> network, they can serve the same speech-enabled page to all desktop and
> mobile browsers supporting the standard.

Wouldn't it be sufficient to build a grammar based on the ACME product list rather than a whole language model (a sketch of such a grammar follows below)? After all, ACME Corp. may not have the resources or time to train for all possible voice variants, and may alienate users in the process. A UA which supports speech recognition, on the other hand, has the incentive to do it well enough to work for all web pages and use cases.

> We now have a situation where users will have a different experience
> using speech input depending on the browser, with differing accuracy and
> possible differences in tokenization and normalization.

This would already be the case if the UA decides to select a local recognizer instead of a remote one, per Eric's earlier proposal (whether because the local recognizer is better tuned to the user's voice or for bandwidth/speed reasons). I think we should let the UA decide the best configuration for the user rather than the web developer, as other APIs have done.

--
Cheers
Satish
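For illustration, a minimal SRGS grammar covering a hypothetical ACME product list could look like the following; the product names are made up for the example, and a real grammar would list ACME's actual catalogue:

    <?xml version="1.0" encoding="UTF-8"?>
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
             xml:lang="en-US" version="1.0" root="products" mode="voice">
      <!-- Single public rule listing the product names users can say -->
      <rule id="products" scope="public">
        <one-of>
          <item>anvil</item>
          <item>rocket skates</item>
          <item>giant rubber band</item>
        </one-of>
      </rule>
    </grammar>

The recognizer only has to match against this closed list, which is far less data than a general language model, though it also means out-of-grammar utterances are rejected rather than transcribed.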
Received on Thursday, 9 September 2010 17:21:29 UTC