Microsoft SRGS Implementation Report

(This report corresponds to the updated SRGS test set found at
http://www.w3.org/2002/06/srgs-irp/ and supersedes Microsoft's previous
SRGS Implementation Report.)

Microsoft has implemented the XML Form of the Speech Recognition Grammar
Specification (SRGS) in a developmental version of the Microsoft Speech
API (SAPI). 

SAPI allows developers to use speech recognition and/or speech synthesis
in their applications. To run the tests specified in the Implementation
Report Plan, we used a text-input interface with SAPI's core grammar
processor. Because SAPI can use the core grammar processor in speech,
DTMF and text environments, we believe this demonstrates coverage of
SRGS for the following agents required by the Implementation Report
Plan:

- XML Grammar Processor for ASR 
- XML Grammar Processor for DTMF 
- XML text parser 

Previous versions of SAPI may be found at
http://www.microsoft.com/speech/. 

As a founding member of the Speech Application Language Tags (SALT)
Forum (http://www.saltforum.org) and an active member of the W3C Voice
Browser and Multimodal Interaction Working Groups, Microsoft believes
speech standards will play a key role in the growing market for speech
applications. We consider SRGS a thorough, well-designed specification
which, by providing a common syntax for speech recognition grammars,
should help promote interoperability and portability in speech
application development. Microsoft intends to support SRGS in its suite
of forthcoming SALT products: 

- .Net Speech SDK, a set of speech application development tools and
speech controls integrated with Visual Studio.NET that enables rapid
incorporation of speech into web applications (a Beta 2 version of the
SDK is available at http://www.microsoft.com/speech/); 

- .Net Speech platform, an integrated multimodal and telephony platform
for multiple clients such as PCs, telephones, wireless personal digital
assistants (PDAs), and Tablet PCs. 

As required by the Implementation Report Plan, some technical details of
the SAPI implementation are as follows:

1. Relationship of SAPI output to LPS 
SAPI's XML representation of recognition output was mapped explicitly to
the Logical Parse Structure (LPS) defined in Appendix H by a recursive
traversal of the parse tree. In some tests a complete mapping into LPS
was not possible: for example, the content of the tag element and the
exact path of external rules are not copied directly. However, these are
minor aspects of only the surface form of LPS (itself an informative
part of the specification) and in no way affect the behaviour of the
grammar processor as defined in the specification, so we do not consider
them to indicate an unsuccessful implementation of the tests.
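
To illustrate the kind of recursive traversal described above, here is a
minimal Python sketch. It is not SAPI code: the result XML, its element
names (rule, token, tag) and the bracketed output syntax are all
hypothetical stand-ins, since neither SAPI's output format nor the exact
LPS surface form is reproduced in this report.

```python
import xml.etree.ElementTree as ET

# Hypothetical parse-result XML (an assumption for illustration only;
# this is not SAPI's actual recognition output format).
RESULT = """
<rule name="order">
  <token>I</token>
  <token>want</token>
  <rule name="drink">
    <token>coffee</token>
    <tag>drink=coffee</tag>
  </rule>
</rule>
"""

def flatten(node):
    """Recursively turn a parse-tree node into a bracketed string."""
    if node.tag == "token":
        return node.text.strip()
    if node.tag == "tag":
        # The tag content is deliberately not copied, only its position,
        # mirroring the limitation mentioned in the report.
        return "{tag}"
    # A rule node: flatten each child in document order, then wrap the
    # result with the rule name.
    children = " ".join(flatten(child) for child in node)
    return f"${node.get('name')}[{children}]"

lps_like = flatten(ET.fromstring(RESULT))
print(lps_like)
```

The bracket notation is only a placeholder for a surface form; a real
mapping would emit whatever syntax Appendix H of the specification
prescribes.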

2. Weights and probabilities
As noted in the Implementation Report Plan, pass/fail testing of weights
on alternatives and probabilities on repeats is not possible. We believe
these features are implementable and useful. We have implemented support
for both weights and repeat probabilities in our ASR grammar processor.
We believe that, when properly estimated, weights and repeat
probabilities improve recognizer performance.
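
For readers unfamiliar with these features, the following Python sketch
reads the weight and repeat-prob attributes from a small SRGS XML
grammar. The grammar itself, the helper name alternative_priors and the
normalization of weights into relative priors are assumptions made for
illustration; the specification leaves the interpretation of weights to
the grammar processor, and this is not a description of how SAPI uses
them.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2001/06/grammar"}

# Illustrative grammar (not taken from the SRGS test set).
GRAMMAR = """
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" root="drink" mode="voice">
  <rule id="drink">
    <one-of>
      <item weight="3">coffee</item>
      <item weight="1">tea</item>
      <item>water</item>
    </one-of>
    <item repeat="0-1" repeat-prob="0.25">please</item>
  </rule>
</grammar>
"""

def alternative_priors(one_of):
    """Normalize the weights of a one-of element so that they sum to 1.

    Items without a weight attribute are treated as weight 1.0; this is
    an assumption of the sketch.
    """
    items = one_of.findall("sr:item", NS)
    weights = [float(item.get("weight", "1.0")) for item in items]
    total = sum(weights)
    return {item.text.strip(): w / total for item, w in zip(items, weights)}

root = ET.fromstring(GRAMMAR)
priors = alternative_priors(root.find(".//sr:one-of", NS))

# Collect repeat probabilities from items that are direct children of the rule.
rule = root.find("sr:rule", NS)
repeat_probs = {
    item.text.strip(): float(item.get("repeat-prob"))
    for item in rule.findall("sr:item", NS)
    if item.get("repeat-prob") is not None
}

print(priors)
print(repeat_probs)
```

Because the effect of these values depends on the recognizer, a sketch
like this can only check that the attributes are read correctly, which
is consistent with the Plan's observation that pass/fail testing is not
possible.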

3. Language support
Speech recognition engines were not available for certain languages
required by the tests. For tests in which the grammar uses a single
language and no recognition engine was available for that language, we
used SAPI's text input mechanism with the grammar processor to compile
the grammar, parse the input and produce successful output. We have
considered this a successful implementation.

4. Test set version
The SAPI implementation was run on the set of tests posted at:
http://www.w3.org/2002/06/srgs-irp/ on 24 October 2002 (see
http://lists.w3.org/Archives/Public/www-voice/2002OctDec/0039.html for
details). 

5. Unimplemented features 
As noted in the results of the tests, the following features of the
specification have not been implemented: 

- multiple languages within the same grammar
This appears to be a useful feature for certain deployment scenarios. 

- lexicon content 
Although we have implemented the syntax of <lexicon> (and thereby the
correct test set behaviour), we have not implemented the semantics of
lexicon look-up. The ability to specify pronunciations is clearly a very
useful feature. 

- repetition of tag elements equivalent to a single tag element
(tag-repetition.grxml)
We believe that repeating tag elements without input tokens is a
developer error, and that SRGS processors should not be required to
resolve the repetition to a single tag.


Stephen Potter
.Net Speech Technologies
Microsoft Corporation

Received on Friday, 13 December 2002 17:31:31 UTC