RE: vocalization and BIDI in SSML (was: RE: Consolidated comments on SSML) from Laura Werner on 2003-06-10 (www-voice@w3.org from April to June 2003)

From: Laura Werner <laura@bevocal.com>
Date: Tue, 10 Jun 2003 09:53:42 -0700
To: "'David.Pawson@rnib.org.uk'" <David.Pawson@rnib.org.uk>, w3c-wai-pf@w3.org
Cc: www-voice@w3.org
Message-ID: <F73733CC5E11D41189E700D0B74A0C9E03001C70@exchange.bevocal.com>
> "As long as  there is a way to write the text,
> the engine can figure out how to speak it."
> produces gibberish in many cases.

I don't think things are quite so bad.  Since SSML is XML and XML text is
Unicode, the synthesizer should be able to assume that the text is in
logical order, not in written / physical order.  For example, when encoding
a right-to-left language like Hebrew in SSML, the first character in the
SSML will be the first one logically: the first letter of the first word to
be spoken.  The fact that this character would print in the upper right
corner of a page printed in Hebrew is orthogonal, I think.
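
To make the logical-order point concrete, here is a minimal Python sketch. The sample word (Hebrew "shalom") and the helper name first_spoken_letter are my own illustrative choices, not anything from the SSML draft. It only shows that the first code point in the stored string is the first letter to be spoken, and that its right-to-left display behaviour is a separate character property.

```python
import unicodedata

# Hebrew "shalom" stored in logical (reading) order:
# shin, lamed, vav, final mem. The word is an illustrative choice.
SHALOM = "\u05e9\u05dc\u05d5\u05dd"


def first_spoken_letter(text):
    """Return (Unicode name, bidi class) of the first code point in text."""
    first = text[0]
    return unicodedata.name(first), unicodedata.bidirectional(first)


name, bidi_class = first_spoken_letter(SHALOM)
print(name)        # name of the first letter in logical order
print(bidi_class)  # "R" marks a strong right-to-left character
```

A synthesizer that reads the string from index 0 gets the letters in spoken order; the bidi class only matters to software that has to draw the text.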

This does bring up the interesting question of how you'd *edit* an XML page
with Hebrew text content in the tags.  I think you'd need a BiDi-aware
editor, treating the XML tags as LtR and the embedded Hebrew as RtL.  I'm
not sure what you'd set the overall document orientation to.  Maybe a user
preference?

Laura Werner
BeVocal VoiceXML architect
(and former Unicode maven)

-----Original Message-----
From: David.Pawson@rnib.org.uk [mailto:David.Pawson@rnib.org.uk]
Sent: Tuesday, June 10, 2003 12:04 AM
To: w3c-wai-pf@w3.org
Cc: www-voice@w3.org
Subject: RE: vocalization and BIDI in SSML (was: RE: Consolidated
comments on SSML)



I'm certainly not happy with the response below.
From our three years of experience with synthetic speech it is blatantly
clear that "As long as there is a way to write the text, the engine can
figure out how to speak it." produces gibberish in many cases.

This is the basis for the external 'speak as' file. The synth
can usually speak a word reasonably if 'taught' by such a 
method. 

This matters less if the end user can glance at a piece of text, but it is
far more important if audio is the only access the user has to information.


regards DaveP





Al wrote:
> I think we may want to consider how these responses fit with 
> accessibility.
> from Dan Burnett on behalf of Voice Browser WG:
> 
> -- Please quote this citation in follow-ups:
> http://www.w3.org/mid/ED834EE1FDD6C3468AB0F5569206E6E91AF1CF@MPB1EXCH02.nuance.com
> 
> ]
> 
> Dear Martin (and the Internationalization Working Group),

> [VBWG responses follow]
> 
> [1] Rejected.  We reject the notion that on principle this is
> more difficult for some languages.  For all languages supported
> by synthesis vendors today this is not a problem.  As long as
> there is a way to write the text, the engine can figure out
> how to speak it.  Given the lack of broad support by vendors
> for Arabic and Hebrew, we prefer not to include examples for
> those languages.
>  > General:
>  > [01]  For some languages, text-to-speech conversion is more difficult
>  >        than for others. In particular, Arabic and Hebrew are usually
>  >        written with no vowels, or only a few, indicated. Japanese
>  >        often needs separate indications for pronunciation.
>  >        It was not clear to us whether such cases were considered,
>  >        and if they had been considered, what the appropriate
>  >        solution was.
>  >        SSML should be clear about how it is expected to handle these
>  >        cases, and give examples. Potential solutions we came up with:
>  >        a) require/recommend that text in SSML is written in an
>  >           easily 'speakable' form (i.e. vowelized for Arabic/Hebrew,
>  >           or with Kana (phonetic alphabet(s)) for Japanese) (problem:
>  >           displaying the text visually would not be satisfactory
>  >           in this case);
>  >        b) using <sub>;
>  >        c) using <phoneme> (problem: only having IPA available
>  >           would be too tedious for authors);
>  >        d) reusing some otherwise defined markup for this purpose
>  >           (e.g. <ruby> from http://www.w3.org/TR/ruby/ for Japanese);
>  >        e) creating some additional markup in SSML.
>  >
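
As a rough illustration of options b) and c) in the list above, the following Python sketch builds a small SSML fragment in which an unvowelized Hebrew word carries either a vowelized <sub> alias or an IPA <phoneme> hint. The sample word, its vowelization, the IPA string, and the function name build_ssml are all assumptions made for this example. The SSML namespace declaration is left out for brevity, and whether a given engine honours either hint is not something this sketch can show.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

# Illustrative data (assumptions, not taken from the SSML draft):
PLAIN = "\u05e9\u05dc\u05d5\u05dd"                          # "shalom", no vowel points
VOWELIZED = "\u05e9\u05b8\u05c1\u05dc\u05d5\u05b9\u05dd"    # same word with points
IPA = "\u0283a\u02c8lom"                                    # rough IPA rendering


def build_ssml():
    """Return an SSML string showing a <sub> alias and a <phoneme> hint."""
    speak = ET.Element("speak", {"version": "1.0", f"{{{XML_NS}}}lang": "he"})

    # Option b): display text stays unvowelized, alias carries the vowels.
    sub = ET.SubElement(speak, "sub", {"alias": VOWELIZED})
    sub.text = PLAIN
    sub.tail = " "

    # Option c): explicit phonemic transcription for the same word.
    phoneme = ET.SubElement(speak, "phoneme", {"alphabet": "ipa", "ph": IPA})
    phoneme.text = PLAIN

    return ET.tostring(speak, encoding="unicode")


print(build_ssml())
```

Either form keeps the visible text as authors normally write it, which is the display problem raised under option a).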

Received on Tuesday, 10 June 2003 12:53:46 UTC