- From: Satish Sampath <satish@google.com>
- Date: Thu, 9 Sep 2010 18:20:59 +0100
- To: "JOHNSTON, MICHAEL J (MICHAEL J)" <johnston@research.att.com>
- Cc: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
> Consider the following use case: a company, let's call them ACME Corp.,
> wants to put out a speech-enabled web page that allows users to search
> for their various products and services using voice. As part of their
> development effort, they build a language model that supports this task.
> With HTML+Speech allowing specification of a speech resource on the
> network, they can serve the same speech-enabled page to all desktop and
> mobile browsers supporting the standard.

Wouldn't it be sufficient to build a grammar based on the ACME product list rather than a whole language model (a sketch of such a grammar follows below)? After all, ACME Corp. may not have the resources or time to train for all possible voice variants, and may alienate users in the process. A UA which supports speech recognition, on the other hand, has the incentive to do it well enough to work for all web pages and use cases.

> We now have a situation where users will have a different experience
> using speech input depending on the browser, with differing accuracy and
> possible differences in tokenization and normalization.

This would already be the case if the UA decides to select a local recognizer instead of a remote one, per Eric's earlier proposal (whether because the local recognizer is better tuned to the user's voice or for bandwidth/speed reasons). I think we should let the UA decide the best configuration for the user rather than the web developer, as other APIs have done.

--
Cheers
Satish
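For illustration, a minimal SRGS grammar covering a hypothetical ACME product list could look like the following; the product names are made up for the example, and a real grammar would list ACME's actual catalogue:

    <?xml version="1.0" encoding="UTF-8"?>
    <grammar xmlns="http://www.w3.org/2001/06/grammar"
             xml:lang="en-US" version="1.0" root="products" mode="voice">
      <!-- Single public rule listing the product names users can say -->
      <rule id="products" scope="public">
        <one-of>
          <item>anvil</item>
          <item>rocket skates</item>
          <item>giant rubber band</item>
        </one-of>
      </rule>
    </grammar>

The recognizer only has to match against this closed list, which is far less data than a general language model, though it also means out-of-grammar utterances are rejected rather than transcribed.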
Received on Thursday, 9 September 2010 17:21:29 UTC