RE: grammar that matches arbitrary word?

I'm not sure whether it is appropriate to discuss this on
the W3C mailing list, but since this is an issue that many
developers run into at some point, I am offering my
response here:

The whole purpose of a grammar is to constrain the possible
input, thus reducing the search space and making speech
recognition feasible. Most recognizers work by evaluating
different hypotheses against this grammar and simply
returning one or more best matches to that grammar.
In my understanding, dictation software basically has a
huge grammar that attempts to cover an entire language or
domain. (GRXML only supports a very simple grammar format.)

It is possible and feasible to build quite large grammars
with thousands of names in them, which should be much easier
than trying to match arbitrary words and should give you
better results. It works reasonably well with Nuance. I
personally haven't tested it with other recognition engines,
but I hear that Phonetic Systems was designed for very large
grammars, particularly for voice dialing since that is one of
their target markets, and may handle these better than others.

What you probably want to do is either to cover the most
common names and let the application bail out to an operator
if it fails, or generate a list of names that will be used
from some other source of data (like a corporate directory,
or a personal phonebook). 
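
As a rough illustration of the second option, here is a minimal
Python sketch that turns a list of names into an SRGS XML grammar
with one <item> per name. The function name build_name_grammar,
the rule id "name" and the en-US language tag are assumptions made
for the example; the names themselves would come from your own
directory or phonebook export, and a given recognizer may expect
further attributes.

```python
from xml.sax.saxutils import escape


def build_name_grammar(names, root="name"):
    """Return an SRGS XML grammar whose root rule accepts any one name."""
    # Drop blank entries and duplicates while keeping the original order.
    unique = list(dict.fromkeys(n.strip() for n in names if n.strip()))
    # Escape &, < and > so that names cannot break the XML.
    items = "\n".join(f"      <item>{escape(n)}</item>" for n in unique)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"\n'
        f'         xml:lang="en-US" root="{root}">\n'
        f'  <rule id="{root}" scope="public">\n'
        '    <one-of>\n'
        f'{items}\n'
        '    </one-of>\n'
        '  </rule>\n'
        '</grammar>\n'
    )


if __name__ == "__main__":
    # Hypothetical directory entries, for illustration only.
    print(build_name_grammar(["Ann Lee", "Bob Stone", "Ann Lee"]))
```

Generating the file offline and referencing it from the <grammar>
element keeps the VoiceXML page itself small, and the grammar can
be regenerated whenever the directory changes.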

Still, I think there is a way to use arbitrary words
in grammars. Nuance will process grammars with a special
"filler" word that matches anything (I don't remember
the syntax right now). Of course, you won't get the text
string of a name in return, but the recognition engine
will at least do endpointing and possibly word segmentation
for you, so you can, for instance, extract that word from a
recording and then process it with a tool that was specifically
made for name recognition. (I believe the API will return
where the word starts and ends in time, but I am not sure
whether this is documented.) You will only get
a recording of the name and not its textual representation
from the main recognition engine, so this is not very effective.
(It is still useful, though, where you want to reduce call center
costs and have humans process only those segments that
cannot be handled by ASR. Think of an operator
who listens only to names and types them in. This is surely
outside the scope of VXML as of now, however.)
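
To make the "extract that word from a recording" step concrete,
here is a small Python sketch that cuts a time range out of a WAV
recording using only the standard library. It assumes you already
have the start and end times of the filler word in seconds; how
(or whether) a particular recognizer reports them is not something
I can confirm, and cut_segment is just a name chosen for the
example.

```python
import io
import wave


def cut_segment(wav_bytes, start_s, end_s):
    """Return a new WAV (as bytes) holding only start_s..end_s of the input."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame indices and clamp them to the recording.
        first = min(max(0, int(start_s * rate)), total)
        last = min(max(first, int(end_s * rate)), total)
        src.setpos(first)
        frames = src.readframes(last - first)
    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # Same channels, sample width and rate; the frame count in the
        # header is corrected when the frames are written.
        dst.setparams(params)
        dst.writeframes(frames)
    return out.getvalue()
```

The resulting clip could then be queued for an operator or handed
to a separate name recognizer.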

There might be a way to get a string of phonemes returned
from the recognition engine and use that for approximate
matching, but it seems sort of counterintuitive to me that
a hidden Markov model based recognition engine (most
commercial ones are of that kind) would do that for you.
I'd appreciate any input from people who have done or tried
this, though.
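
If an engine did return a phoneme string, the approximate matching
could look roughly like the Python sketch below: plain edit
distance between the heard phonemes and each entry of a
pronunciation lexicon. The lexicon, the phoneme symbols and the
function names are all made up for illustration, and a real system
would probably weight confusable phonemes differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1]


def closest_names(heard, lexicon, limit=3):
    """Return up to `limit` names whose phonemes are nearest to `heard`."""
    scored = sorted((edit_distance(heard, phones), name)
                    for name, phones in lexicon.items())
    return [name for _, name in scored[:limit]]


# Hypothetical lexicon: name -> phoneme sequence.
LEXICON = {
    "Savela": ["S", "AA", "V", "EH", "L", "AA"],
    "Miller": ["M", "IH", "L", "ER"],
    "Sawyer": ["S", "AO", "Y", "ER"],
}

if __name__ == "__main__":
    print(closest_names(["S", "AA", "V", "IH", "L", "AA"], LEXICON, limit=1))
```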

Regards, 

 --florian 

 ----
Florian M Unterkircher <florian(at)unterkircher(dot)com>
Ph. +43 (676) 613-5147 (mobile)
PGP key fingerprint: AFCC B19F 6373 08F0 FEB6  16AC 956F F549 6DCA C173 


> -----Original Message-----
> From: www-voice-request@w3.org 
> [mailto:www-voice-request@w3.org] On Behalf Of Markku Savela
> Sent: Tuesday, February 12, 2002 9:11 AM
> To: www-voice@w3.org
> Subject: grammar that matches arbitrary word? 
> 
> 
> hi, 
> 
> To test my browser, I thought to write a simple phone directory
> application: ask a name and perform a query to database. 
> 
> However, how do I do this? 
> 
> ...
> <field name="name">
>   <prompt>Whose phone number you want to know?</prompt>
>   <grammar>
>    ??? what is the grammar to accept any name ???
>   </grammar>
> </field>
> ... 
> 
> Or how this type of application is supposed to be modelled? 
> Although, you could generate a grammar from database 
> 
>  <one-of>
>   <item> name1 </item>
>   <item> name2 </item>
>   ...
>   ...
>   <item> name589392 </item>
>  </one-of> 
> 
> I don't think this a solution...

Received on Tuesday, 12 February 2002 13:54:33 UTC