[siviws] Summary of the Workshop on Speaker biometrics and VoiceXML 3.0 from Kazuyuki Ashimura on 2009-06-23 (www-voice@w3.org from April to June 2009)

From: Kazuyuki Ashimura <ashimura@w3.org>
Date: Tue, 23 Jun 2009 23:20:41 +0900
To: www-voice <www-voice@w3.org>
Message-ID: <4A40E4B9.2070401@w3.org>
The following is the summary of the Speaker Biometrics Workshop held
on 5-6 March 2009 in Menlo Park, California, US.

We are very sorry it took so long for us to send this out.

---
Summary of the Workshop on Speaker biometrics and VoiceXML 3.0

On 5-6 March 2009, the W3C Voice Browser Working Group held a Workshop
on Speaker biometrics and VoiceXML 3.0 in Menlo Park, California, US,
hosted by SRI International.

The minutes of the workshop are available on the W3C Web server:
http://www.w3.org/2008/08/siv/minutes.html

An HTML version of this summary, which includes figures of (1)
resource control and distributed decision making and (2) the Menlo
Park Model, an SIV architecture, is also available at:
http://www.w3.org/2008/08/siv/summary.html

There were 16 attendees from the following organizations:

     * SRI International
     * Recognition Technologies, Inc.
     * J. Markowitz, Consultants
     * Intervoice
     * EIG
     * Deutsche Telekom AG, Laboratories
     * Centrelink
     * iBiometrics, Inc.
     * Daon
     * Cisco Systems, Inc.
     * Voxeo
     * Nuance
     * General Motors
     * W3C

This workshop was narrowly focused on identifying and prioritizing
directions for SIV standards work as a means of making SIV more useful
in current and emerging markets. Topics discussed during the workshop
included:

     * SIV use cases (Application requirements for SIV in VoiceXML 3.0)
     * SIV users (design philosophy, uncertainty, security, identity)
     * Audio formats for SIV (WAV, PCM, A-law, mu-law, Ogg, etc.)
     * Data format for multimodal applications (EMMA, etc.)
     * SIV-related standards outside W3C (CBEFF, INCITS 456, BIAS, BioAPI)
     * SIV and MRCP V2
     * Architecture and functionality (features, configuration, APIs, etc.)

During the workshop we clarified why SIV functionality should be
added to VoiceXML, as follows.

     * The system would be more responsive, so VoiceXML could shorten
       customer-perceived latency and provide performance benefits to
       users.

     * It would be easier for developers to build applications,
       because the programming interface would be consistent with the
       way they use other VoiceXML resources [1] and low-level
       operations would be hidden from them.

       [1] http://www.w3.org/2008/08/siv/resource_control.png

     * Adding SIV to a standard would make it portable and facilitate
       integration with the Web model, because it would make SIV
       applications consistent with that model and provide
       efficiencies of scale in hosted environments.

     * Standardization of an easy-to-use API would minimize vendor
       lock-in and grow the market.

     * Support in VoiceXML enables SIV use (without the application
       server) with intermittent/offline connectivity.

     * Standards are a sign of technology maturity.

The major "takeaway" is the confirmation that SIV fits into the
VoiceXML space, together with the creation of the "Menlo Park Model"
[2], an SIV-enabled VoiceXML architecture.

[2] http://www.w3.org/2008/08/siv/MenloParkModel-v003.png

The discussion of the above "Menlo Park Model" included:

    1. The main hidden security issues that people are concerned about
    were identified, and ways in which they can realistically be
    addressed were discussed. Those issues do not disappear, but we
    now know we can address them.

    2. VoiceXML 3.0 could be an example of an Interaction Manager
    within the W3C's MMI Architecture. The synchronization and markup
    integration of multiple modalities should be addressed. There are
    likely to be multiple modalities/factors involved in an interaction
    using VoiceXML. Consequently, developers need a way to avoid
    completely separating those modalities.

    3. Collaboration with other W3C Working Groups and other standards
    bodies, e.g., OASIS/BIAS, is expected.

Note that the discussion during the workshop was mainly focused on the
application side and provided little guidance for engine providers. A
good standard that provides a clean wrapper around speech engines is
therefore still needed. Classification, segmentation and
identification should also be considered, and the group needs to
determine whether or not to include them in VoiceXML 3.0.

Judith Markowitz, Ken Rehor and Kazuyuki Ashimura, Workshop co-Chairs

-- 
Kazuyuki Ashimura / W3C Multimodal & Voice Activity Lead
mailto: ashimura@w3.org
voice: +81.466.49.1170 / fax: +81.466.49.1171
Received on Tuesday, 23 June 2009 14:21:19 UTC