Re: Microsoft SRGS Implementation Report from Qiru Zhou on 2002-09-05 (www-voice@w3.org from July to September 2002)

From: Qiru Zhou <qzhou@research.bell-labs.com>
Date: Wed, 04 Sep 2002 21:13:30 -0400
To: www-voice@w3.org
Message-ID: <3D76AFBA.4F49769A@research.bell-labs.com>

Stephen,

I found that your report file "results.xml" is in a non-standard multi-byte
encoding, "Wang Taiwan" (although its encoding attribute says UTF-16), which
can only be decoded by an editor that handles this encoding (I used MS Office
XP to decode it). I expect that many of the VB WG members will have trouble
reading it. Could you convert it to ASCII, if the file contains no characters
from other languages?
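
For reference, a minimal Python sketch of such a conversion, assuming the file
really is UTF-16 as its declaration says (the function name is made up for
illustration, and this has not been run against the actual results.xml):

```python
import re


def utf16_xml_to_ascii(data: bytes) -> bytes:
    """Transcode UTF-16 XML bytes to ASCII and update the XML declaration."""
    # The "utf-16" codec uses the byte-order mark, if present, to pick the
    # byte order, and strips it from the decoded text.
    text = data.decode("utf-16")
    # Rewrite the encoding pseudo-attribute in the XML declaration, if any.
    text = re.sub(
        r'(<\?xml[^>]*encoding=)(["\'])[^"\']*\2',
        r"\g<1>\g<2>US-ASCII\g<2>",
        text,
        count=1,
    )
    # Any non-ASCII character becomes a numeric character reference such as
    # &#233;. That is only valid in character data and attribute values, so
    # non-ASCII element names or comments would still need manual attention.
    return text.encode("ascii", errors="xmlcharrefreplace")
```

If the bytes are not valid UTF-16, the decode step raises UnicodeDecodeError,
which would itself show that the declared encoding is wrong.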

Thanks,

-- Qiru

> Date: Sat, 31 Aug 2002 00:58:06 +0100
> Message-ID: <9584A4A864BD8548932F2F88EB30D1C60722B62C@TVP-MSG-01.europe.corp.microsoft.com>
> From: "Stephen Potter" <spotter@microsoft.com>
> To: <www-voice@w3.org>
> Cc: <www-voice-wg@w3.org>
> Subject: Microsoft SRGS Implementation Report
> 
> Microsoft has implemented the XML Form of the Speech Recognition Grammar
> Specification (SRGS) in a developmental version of the Microsoft Speech
> API (SAPI). The results are attached.
> 
> SAPI is middleware which allows developers to use speech recognition
> and/or speech synthesis in their applications. To conduct the tests
> specified in the implementation report, a text-input interface was used
> with SAPI's core grammar processor. Given SAPI's ability to use the core
> grammar processor in speech, DTMF and text environments, we believe this
> successfully demonstrates coverage of SRGS in the following agents
> required by the Implementation Report Plan:
> 
>   - XML Grammar Processor for ASR
>   - XML Grammar Processor for DTMF
>   - XML text parser
> 
> Previous versions of SAPI may be found at
> http://www.microsoft.com/speech/.
> 
> As a founding member of the Speech Application Language Tags (SALT)
> Forum (http://www.saltforum.org) and an active member of the W3C Voice
> Browser and Multimodal Interaction Working Groups, Microsoft believes
> speech standards will play a key role in the growing market for speech
> applications. We consider SRGS a thorough, well-designed specification
> which, by providing a common syntax for speech recognition grammars,
> should help promote interoperability and portability in speech
> application development. Microsoft intends to support SRGS in its suite
> of forthcoming SALT products:
> 
>    - .Net Speech SDK, a set of speech application development tools and
> speech controls integrated with Visual Studio(r) .NET that will make it
> faster and easier for web developers to incorporate speech into web
> applications (a Beta version of the SDK is available at
> http://www.microsoft.com/speech/getsdk);
>    - .Net Speech platform, an integrated multimodal and telephony
> platform for multiple clients such as PCs, telephones, wireless personal
> digital assistants (PDAs), and Tablet PCs.
> 
> As required by the Implementation Report Plan, some technical details of
> the SAPI implementation are as follows:
> 
> 1. Relationship of SAPI output to LPS
> SAPI's XML representation of recognition output was mapped explicitly to
> the LPS defined in Appendix H by a recursive traversal of the parse
> tree. In some tests a complete mapping into LPS was not possible; for
> example, the content of the tag element and the exact path of external
> rules are not copied directly. However, since these are minor
> aspects of only the surface form of LPS (itself an informative part of
> the specification) and they in no way affect the behaviour of the
> grammar processor as defined in the specification, we do not consider
> these an unsuccessful implementation of the tests.
> 
> 2. Weights and probabilities
> As noted in the Implementation Report Plan, pass/fail testing of weights
> on alternatives and probabilities on repeats is not possible. We believe
> these features are implementable and useful. We have implemented support
> for both weights and repeat probabilities into our ASR grammar
> processor. We believe that, when properly estimated, weights and repeat
> probabilities have a positive effect in maximizing recognizer
> performance.
> 
> 3. Language support
> Speech recognition engines were not available for certain languages
> required by the tests. For those tests where a single language is used
> in the grammar and a recognition engine was not available for that
> language, SAPI's text input mechanism was used with the grammar
> processor to compile the grammar, parse the input and produce successful
> output, and we have considered this a successful implementation.
> 
> 4. Test set version
> The SAPI implementation was run on the error-corrected set of tests
> circulated within the Voice Browser Working Group on 23 July (see
> http://lists.w3.org/Archives/Member/w3c-voice-wg/2002Jul/0043.html,
> members only).
> 
> 5. Unimplemented features
> As noted in the results of the tests, the following features of the
> specification have not been implemented.
>    - multiple languages within the same grammar
>       This appears to be a useful feature for certain deployment
> scenarios.
>    - base URI specification for rule reference (xml:base and meta base)
>       This appears to be a generally useful feature.
>    - lexicon content
>       Although we have implemented the syntax of <lexicon> (and thereby
> the correct test set behaviour), we have not implemented the semantics
> of lexicon look-up. The ability to specify pronunciations is clearly a
> very useful feature.
>    - repetition of tag elements equivalent to a single tag element
>       This is a peripheral feature, but we would like to seek guidance
> on its utility for the grammar developer.
> 
> Stephen Potter
> .Net Speech Technologies
> Microsoft Corporation
> 
> ----------------------------------------------------------------------------------------------------------------------
> 
>    * text/xml attachment: results.xml
> 
> -------------------------------------------------------------------------------------------------
Received on Wednesday, 4 September 2002 21:13:39 UTC