[SSML] Summary of third Internationalization Workshop from Daniel C. Burnett on 2007-02-01 (www-voice@w3.org from January to March 2007)

From: Daniel C. Burnett <Daniel.Burnett@nuance.com>
Date: Thu, 1 Feb 2007 14:22:16 -0500
To: <www-voice@w3.org>
Message-ID: <2AB5541EB33172459EE430FFB66B1EE9018D45B1@BN-EXCH01.nuance.com>

Summary of the third Workshop on Internationalizing SSML

---

On 13-14 January 2007 the Voice Browser Working Group held the third
Workshop on Internationalizing SSML in Hyderabad, India, hosted by
Bhrigus and IIIT Hyderabad.

The minutes of the workshop are available on the W3C Web server:
http://www.w3.org/2006/10/SSML/minutes.html

There were more than 15 attendees from India, Sri Lanka, Pakistan,
Japan, Italy, the US, France, and the UK.

Motivation for internationalizing SSML includes:

* It is estimated that within three years the World Wide Web will contain
significantly more content in languages that are currently
under-represented.

* There is a great need for SSML to work for languages beyond those
supported by the current version (SSML 1.0).

* Some languages, such as Mandarin Chinese or Hindi, are difficult to
input via a telephone keypad.

* Many other languages would also benefit from a new "international"
version of SSML, and such a version would help spread the Web to places
where it is not yet readily accessible.

This workshop was more narrowly focused than the previous workshops,
specifically targeting languages of the Indian subcontinent. Topics
discussed during the Workshop included:

* Language-specific issues (echo expressions, word compounding,
optional/missing diacritics, ...)

* Alternative/mixed-language support (loan words, broader
language/dialect/script support, mixed-language text)

* Pronunciation alphabets (non-IPA and syllable-based pronunciation
alphabets)

* Other items (proper name identification, say-as extensions)

The major "takeaways" are:

* Current work on SSML 1.1 will address many of the needs of Indian
language authors.

* Compound words must be treatable as single lexical units.

* Authors should be able to indicate when special (e.g., computationally
expensive) processing should occur, such as word segmentation or
diacritic restoration in Urdu.

* Authors should have control over processor behavior when a requested
voice cannot speak content in a given language; the mechanisms proposed
in the first Working Draft of SSML 1.1 are still insufficient for
developing multilingual applications.

* Transliteration is common for Indian languages and is a transformation
that must be performed before text normalization. Existing mechanisms
in SSML 1.0 are insufficient to address this.

We have begun reviewing these new topics and will continue to do so as
work on SSML 1.1 proceeds at the next face-to-face meeting in Beijing.

Daniel C. Burnett and Kazuyuki Ashimura, Workshop Co-chairs

Received on Thursday, 1 February 2007 19:22:29 UTC