Re: What is your opinion on Voice Recognition software? from Alan Cantor on 1999-02-06 (w3c-wai-ig@w3.org from January to March 1999)

From: Alan Cantor <acantor@oise.utoronto.ca>
Date: Fri, 5 Feb 1999 20:15:24 -0500 (EST)
To: w3c-wai-ig@w3.org
Message-ID: <Pine.SOL.3.91.990205192204.29163A-100000@tortoise>

> I cannot speak for LERNOUT, but a word of caution when purchasing DRAGON
> NATURALLYSPEAKING. Its ability to function properly is dependent on the PC
> and configuration.  They have a compatibility list on their website.  Pay
> close attention to your PC model and especially the laptop model.  

The same caution applies to all voice input systems. The quality of the
sound card, microphone, and other components strongly affects performance.

Some proportion of microphones are faulty out of the box. The Andreas
microphone that came with one product I purchased, for example, is
unusable, while the VXI mike that came with another product is excellent. 

> It is not true, load and use anywhere.

I have met a lot of people who have given up on voice recognition because
of unrealistic expectations about what the technology can deliver. The
state of the art, although much better than it was two years ago, is still
pretty crude. It does NOT work for everybody; voice input systems, in my
experience, work best for computer-savvy people who are willing to spend a
lot of time learning and adapting to the system's intricacies and
idiosyncrasies.  When I recommend a voice recognition system to a client,
I also recommend 12 to 20 hours of one-on-one training. The manufacturers
feed high expectations with claims, in their promo materials, of 100, 120
or even 150 words per minute. Sure, it's easy to achieve high speeds
during a demo, but most people who do real writing cannot match these
speeds. Writing is much more than laying down words; writing is also the
process of revising and clarifying one's ideas, of organizing ideas. Thus,
editing is integral to writing, and so the ease of editing must be
factored into an evaluation of the products. (And what does editing mean?
Editing by voice? Editing by keyboard? Editing by mouse? Within which
applications? One's preferred way of working affects how one evaluates.)

One of the paradoxes of evaluating these systems is that dictation speed
and dictation accuracy are inadequate measures of performance. Of the
current crop of voice recognition products, one gives excellent accuracy
when dictating, but poor accuracy when correcting misrecognitions by
voice. One product features ingenious commands that make it fairly easy to
edit documents and correct misrecognitions by voice; its main competitor
lacks commands of this type, but is much faster when editing/correcting
manually. One product has "intuitive" formatting commands; another does
not. One product features a remarkably easy way to select menus/toolbars
and activate buttons in dialog boxes, but as a consequence, frequently
mistakes words for menu or toolbar commands. When correcting certain kinds
of misrecognition errors by voice with one product, you say a single
command, while another product forces you to utter a series of three or
four commands... Etc. etc. 
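
To make the point about speed and accuracy concrete, here is a rough
back-of-the-envelope sketch. It assumes a deliberately simple model: every
misrecognized word costs a fixed number of seconds to correct, and nothing
else slows the writer down. The function name and all figures are
hypothetical, chosen only for illustration; they are not measurements of any
real product.

```python
def effective_wpm(raw_wpm, accuracy, seconds_per_correction):
    """Estimate words per minute once correction time is included.

    Simplified model (an assumption, not a measured result):
      minutes per word = dictation time + expected correction time
    """
    if raw_wpm <= 0:
        raise ValueError("raw_wpm must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if seconds_per_correction < 0:
        raise ValueError("seconds_per_correction must not be negative")

    dictation_minutes = 1.0 / raw_wpm
    correction_minutes = (1.0 - accuracy) * seconds_per_correction / 60.0
    return 1.0 / (dictation_minutes + correction_minutes)


# Hypothetical product A: fast dictation, slow corrections.
product_a = effective_wpm(raw_wpm=150, accuracy=0.95, seconds_per_correction=20)

# Hypothetical product B: slower dictation, quick corrections.
product_b = effective_wpm(raw_wpm=100, accuracy=0.95, seconds_per_correction=5)

print(f"Product A: {product_a:.1f} effective wpm")
print(f"Product B: {product_b:.1f} effective wpm")
```

Under these made-up numbers, the product with the slower dictation speed
comes out well ahead once corrections are counted, which is why ease of
editing belongs in any evaluation. Even this model leaves out revising and
reorganizing, so it still overstates real writing speed.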

So, now that I have rattled on at length, does the subject of voice
recognition speed/accuracy have anything to do with the business of
WAI-IG??? 

Alan

Received on Friday, 5 February 1999 20:17:18 UTC