- From: <florian@mail.com>
- Date: Tue, 12 Feb 2002 21:41:37 +0100
- To: www-voice@w3.org
- Cc: Markku.Savela@vtt.fi
I'm not sure whether it is appropriate to discuss this on the W3C mailing list, but since this is an issue that many developers run into at some point, I am offering my response here.

The whole purpose of a grammar is to constrain the possible input, thus reducing the search space and making speech recognition feasible. Most recognizers evaluate different hypotheses against this grammar and simply return the one or more best matches. In my understanding, dictation software basically has a huge grammar that attempts to cover an entire language or domain. (GRXML only supports a very simple grammar format.)

It is possible and feasible to build quite large grammars with thousands of names in them, which should be much easier than matching arbitrary input and give you better results. It works reasonably well with Nuance. I personally haven't tested it with other recognition engines, but I hear that Phonetic Systems was designed for very large grammars, particularly for voice dialing since that is one of their target markets, and may handle these better than others.

What you probably want to do is either cover the most common names and let the application bail out to an operator if recognition fails, or generate the list of names from some other source of data (like a corporate directory or a personal phonebook).

Still, I think there is a way to use arbitrary words in grammars. Nuance will process grammars with a special "filler" word that matches anything (I don't remember the syntax right now). Of course, you won't get the text string of a name as a return, but the recognition engine will at least do endpointing and possibly word segmentation for you, so you can, for instance, extract that word from a recording and then process it with a tool that was specifically made for name recognition. (I believe the API will return where the word starts and ends in time; I'm not sure whether this is documented, though.)
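As a rough illustration of the "extract that word from a recording" step, here is a minimal Python sketch. It assumes the recognizer reports the start and end of the filler word in seconds and that the utterance is available as uncompressed PCM WAV data; both are assumptions, and `extract_segment` is a hypothetical helper, not part of any recognizer's API.

```python
import io
import wave


def extract_segment(wav_bytes: bytes, start_s: float, end_s: float) -> bytes:
    """Return a new WAV (as bytes) holding only the audio between start_s and end_s.

    start_s and end_s are offsets in seconds, assumed to come from the
    recognizer's word-boundary information.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame indices and clamp them to the recording.
        first = max(0, min(total, int(start_s * rate)))
        last = max(first, min(total, int(end_s * rate)))
        src.setpos(first)
        frames = src.readframes(last - first)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # Reuse the source format; the frame count in the header is
        # corrected automatically when the writer is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return out.getvalue()
```

The resulting clip could then be handed to a dedicated name recognizer or played to a human operator.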
Still, you will only get a recording of the name and not its textual representation from the main recognition engine, so this is not very effective. (It is still useful, though, where you want to reduce call center costs and have humans process only those segments that cannot be handled by ASR. Think of an operator who listens only to names and types them in. This is surely outside the scope of VXML as of now, however.)

There might be a way to get a string of phonemes returned from the recognition engine and use that for approximate matching, but it seems sort of counterintuitive to me that a hidden Markov model based recognition engine (most commercial ones are of that kind) would do that for you.

I'd appreciate any input from people who have done or tried this, though.

Regards,
--florian

----
Florian M Unterkircher <florian(at)unterkircher(dot)com>
Ph. +43 (676) 613-5147 (mobile)
PGP key fingerprint: AFCC B19F 6373 08F0 FEB6 16AC 956F F549 6DCA C173

> -----Original Message-----
> From: www-voice-request@w3.org
> [mailto:www-voice-request@w3.org] On Behalf Of Markku Savela
> Sent: Tuesday, February 12, 2002 9:11 AM
> To: www-voice@w3.org
> Subject: grammar that matches arbitrary word?
>
>
> hi,
>
> To test my browser, I thougth to write a simple phone directory
> application: ask a name and perfom a query to database.
>
> However, how do I do this?
>
> ...
> <field name="name">
> <prompt>Whose phone number you want to know?</prompt>
> <grammar>
> ??? what is the grammar to accept any name ???
> </grammar>
> </field>
> ...
>
> Or how this type of application is supposed to be modelled?
> Although, you could generate a grammar from database
>
> <one-of>
> <item> name1 </item>
> <item> name2 </item>
> ...
> ...
> <item> name589392 </item>
> <one-of>
>
> I don't think this a solution...
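For what it's worth, the generated grammar from the quoted question, which is also the directory-driven approach suggested in the reply, might be produced along these lines. This is only a sketch: the `build_name_grammar` helper, the `names` rule id, and the `en-US` language tag are illustrative assumptions, and the output follows the W3C SRGS XML form, which a particular recognizer may or may not accept as-is.

```python
from xml.sax.saxutils import escape


def build_name_grammar(names, root_id="names"):
    """Return an SRGS-style XML grammar with one <item> per unique name.

    `names` is any iterable of strings, e.g. rows pulled from a corporate
    directory (the data source is an assumption here).
    """
    seen = set()
    items = []
    for name in names:
        cleaned = " ".join(name.split())  # collapse stray whitespace
        key = cleaned.lower()
        if not cleaned or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        items.append(f"      <item>{escape(cleaned)}</item>")
    if not items:
        raise ValueError("at least one non-empty name is required")

    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"',
        f'         xml:lang="en-US" mode="voice" root="{root_id}">',
        f'  <rule id="{root_id}" scope="public">',
        "    <one-of>",
        *items,
        "    </one-of>",
        "  </rule>",
        "</grammar>",
    ]
    return "\n".join(lines)
```

Calling `build_name_grammar(["Markku Savela", "Florian Unterkircher"])` yields a grammar that can be served from a URL and referenced from the `<field>`; regenerating it whenever the directory changes keeps the VoiceXML page itself static.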
Received on Tuesday, 12 February 2002 13:54:33 UTC