RE: Recommendation for xml:lang expressions

We started with Spanish.  Correctly reading any date or currency amount that comes from the database is tough with professional audio.  With names, it wouldn't be a big deal if we had to speak them in English.

Here are two examples:

05/05/1908
In English with professional audio is "May" "5th" "nineteen" "oh" "eight"
In Spanish with professional audio is "cinco" "de" "mayo" "mil" "novecientos" "ocho"

$1956
In English with professional audio is "one" "thousand" "nine" "hundred" "fifty" "six" "dollars"
In Spanish with professional audio is "mil" "novecientos" "cincuenta" "y" "seis" "dólares"

So you can see the difficulties with the algorithms.  There just isn't a straightforward correlation between the two languages.
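To make the mismatch concrete, here is a minimal sketch of clip selection for whole-dollar amounts in both languages. It is not the function from our application; every name in it is hypothetical, each string stands for one recorded audio clip, and it only covers amounts from 0 to 9999.

```javascript
// Hypothetical clip tables: each string stands for one recorded audio file.
var EN_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
  "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
  "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"];
var EN_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty",
  "seventy", "eighty", "ninety"];
// Spanish needs single-word clips all the way up to 29.
var ES_LOW = ["cero", "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete",
  "ocho", "nueve", "diez", "once", "doce", "trece", "catorce", "quince",
  "dieciséis", "diecisiete", "dieciocho", "diecinueve", "veinte", "veintiuno",
  "veintidós", "veintitrés", "veinticuatro", "veinticinco", "veintiséis",
  "veintisiete", "veintiocho", "veintinueve"];
var ES_TENS = ["", "", "", "treinta", "cuarenta", "cincuenta", "sesenta",
  "setenta", "ochenta", "noventa"];
// Spanish hundreds are irregular words, not "digit" + "hundred".
var ES_HUNDREDS = ["", "ciento", "doscientos", "trescientos", "cuatrocientos",
  "quinientos", "seiscientos", "setecientos", "ochocientos", "novecientos"];

function englishClips(n) {
  var clips = [];
  var thousands = Math.floor(n / 1000);
  var hundreds = Math.floor((n % 1000) / 100);
  var rest = n % 100;
  if (thousands > 0) clips.push(EN_ONES[thousands], "thousand");
  if (hundreds > 0) clips.push(EN_ONES[hundreds], "hundred");
  if (rest >= 20) {
    clips.push(EN_TENS[Math.floor(rest / 10)]);
    if (rest % 10 > 0) clips.push(EN_ONES[rest % 10]);
  } else if (rest > 0 || n === 0) {
    clips.push(EN_ONES[rest]);
  }
  return clips;
}

function spanishClips(n) {
  var clips = [];
  var thousands = Math.floor(n / 1000);
  var hundreds = Math.floor((n % 1000) / 100);
  var rest = n % 100;
  if (thousands === 1) clips.push("mil");            // "mil", never "uno mil"
  else if (thousands > 1) clips.push(ES_LOW[thousands], "mil");
  if (hundreds === 1 && rest === 0) clips.push("cien"); // exactly 100 is "cien"
  else if (hundreds > 0) clips.push(ES_HUNDREDS[hundreds]);
  if (rest >= 30) {
    clips.push(ES_TENS[Math.floor(rest / 10)]);
    if (rest % 10 > 0) clips.push("y", ES_LOW[rest % 10]); // "cincuenta y seis"
  } else if (rest > 0 || n === 0) {
    clips.push(ES_LOW[rest]);
  }
  return clips;
}

// Returns the ordered clip list for a whole-dollar amount in "en" or "es".
function currencyClips(lang, amount) {
  if (Math.floor(amount) !== amount || amount < 0 || amount > 9999) {
    throw new RangeError("this sketch only covers whole amounts 0..9999");
  }
  if (lang === "es") {
    var clips = spanishClips(amount);
    var last = clips.length - 1;
    // "uno" shortens before a noun: "un dólar", "veintiún dólares".
    if (clips[last] === "uno") clips[last] = "un";
    if (clips[last] === "veintiuno") clips[last] = "veintiún";
    clips.push(amount === 1 ? "dólar" : "dólares");
    return clips;
  }
  return englishClips(amount).concat([amount === 1 ? "dollar" : "dollars"]);
}

console.log(currencyClips("en", 1956).join(" "));
// prints "one thousand nine hundred fifty six dollars"
console.log(currencyClips("es", 1956).join(" "));
// prints "mil novecientos cincuenta y seis dólares"
```

Even in this narrow range the Spanish branch needs its own tables and several special cases ("mil", "cien", "y", the shortened "un"), so the English logic cannot simply be reused with translated clips.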

The platform that we're using doesn't yet support ASR; it uses DTMF only, so we aren't concerned with grammars at all at this point, though I'm sure it would be an item of interest for platforms that do support ASR.


From: Michael Bodell [mailto:bodell@247-inc.com]
Sent: Tuesday, May 27, 2014 4:23 PM
To: David Wright; www-voice@w3.org
Subject: RE: Recommendation for xml:lang expressions

I think it depends on how important the other languages are.  If your application is 99+% or 95+% or even maybe 90+% one language, and the other language is there because it is needed but isn't a main focus, then if you record the currency amounts, numbers, dates, etc. in that language you can just use recorded audio there too.  And if the TTS occasionally gets read (like on a name you don't record, say one outside the top 1000 names by frequency or whatever you record), live with the fact that it will be in your primary language, presumably English.  I think most people take that approach.  If 95% of your callers are English, and 95% of your non-English callers never hit TTS, and 95% of those who do can't really tell the difference when a short snippet of language-appropriate TTS is rendered in the wrong xml:lang but surrounded by recorded audio in their language, then you are only going to negatively impact 1 in 8000 callers.  And even those callers will mostly still be able to use the application, even if it doesn't shine the way you'd want it to.
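The 1 in 8000 figure follows from multiplying the three 5% remainders. A small sketch of that arithmetic, with hypothetical variable names and the percentages taken as the illustrative values above rather than measured data:

```javascript
// Each 95% figure in the estimate leaves a 5% remainder, i.e. 1 caller in 20.
var ONE_IN_NON_ENGLISH = 20; // 5% of callers are not English speakers
var ONE_IN_HITS_TTS = 20;    // 5% of those callers ever hear TTS
var ONE_IN_NOTICES = 20;     // 5% of those notice the wrong-language voice

// Multiplying the denominators avoids floating-point rounding
// that 0.05 * 0.05 * 0.05 would introduce.
var oneInImpacted = ONE_IN_NON_ENGLISH * ONE_IN_HITS_TTS * ONE_IN_NOTICES;

console.log("1 in " + oneInImpacted + " callers"); // prints "1 in 8000 callers"
```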

If it is more balanced and important to do the multiple languages fully on par as equal citizens then it may be worth having two parallel applications.  The voice dialogs and retry orders and whatnot may be different when someone from a different language/culture approaches the application.  IME, doing this sort of thing is the exception though, not the common practice, at least for decently sized applications (if your application is essentially just a smart voice-based network prompter that asks a question and then transfers the call to the appropriate line, it is easy enough to have a version of this one-dialog application in each language).

In both situations though (the common, good-enough approach and the uncommon parallel implementations), once you've recorded audio in the secondary language(s), the primary concern around making the applications multilingual is the grammars and tuning the grammars in all the foreign languages, not the TTS.  The grammars are going to impact people more IME.

From: David Wright [mailto:David.Wright@OntarioSystems.com]
Sent: Tuesday, May 27, 2014 12:52 PM
To: Michael Bodell; www-voice@w3.org
Subject: RE: Recommendation for xml:lang expressions

Yes, we're only using TTS for reading currency amounts, dates, and names that come from a database.  We wrote an interesting JavaScript function that throws a bunch of professional audio together to read dates and currency in English, but when we started to look into the differences with other languages we realized we needed to give up and use TTS instead.
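The function itself isn't included in this thread. As a rough idea of what such clip concatenation might look like for English dates, here is a sketch under stated assumptions: all names are hypothetical, each returned string stands for one recorded clip, and years are limited to 1000 through 2999.

```javascript
// Hypothetical clip tables: each string stands for one recorded audio file.
var MONTHS = ["January", "February", "March", "April", "May", "June", "July",
  "August", "September", "October", "November", "December"];
var ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
  "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
  "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"];
var TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
  "eighty", "ninety"];

// Clips for a number from 1 to 99, e.g. 56 -> ["fifty", "six"].
function twoDigitClips(n) {
  if (n < 20) return [ONES[n]];
  var clips = [TENS[Math.floor(n / 10)]];
  if (n % 10 > 0) clips.push(ONES[n % 10]);
  return clips;
}

// Name of the ordinal clip for a day of the month, e.g. 5 -> "5th".
function ordinalClip(day) {
  var lastTwo = day % 100;
  var suffix = (lastTwo >= 11 && lastTwo <= 13)
    ? "th"
    : (["th", "st", "nd", "rd"][day % 10] || "th");
  return day + suffix;
}

// English reads most years as two pairs of digits ("nineteen" "oh" "eight").
function yearClips(year) {
  var high = Math.floor(year / 100);
  var low = year % 100;
  if (high % 10 === 0) {
    // 1000-1099 and 2000-2099: "two thousand", "two thousand fourteen".
    // ("twenty fourteen" is also common; this sketch picks one style.)
    var thousandClips = [ONES[high / 10], "thousand"];
    return low === 0 ? thousandClips : thousandClips.concat(twoDigitClips(low));
  }
  var pair = twoDigitClips(high);
  if (low === 0) return pair.concat(["hundred"]); // 1900 -> "nineteen hundred"
  if (low < 10) pair.push("oh");                  // 1908 -> "nineteen oh eight"
  return pair.concat(twoDigitClips(low));
}

// Returns the ordered clip list for a date; month is 1-12, day is 1-31.
function dateClips(month, day, year) {
  if (month < 1 || month > 12 || day < 1 || day > 31 ||
      year < 1000 || year > 2999) {
    throw new RangeError("date outside the range this sketch covers");
  }
  return [MONTHS[month - 1], ordinalClip(day)].concat(yearClips(year));
}

console.log(dateClips(5, 5, 1908).join(" "));
// prints "May 5th nineteen oh eight"
```

The month-first order, the ordinal day, and the paired reading of the year are all English conventions, which is why none of this carries over to Spanish by swapping in translated clips.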

So it sounds like you are saying that most other VXML programmers that have these requirements are choosing to change the xml:lang property of the entire application in the <vxml> tag, and if I move away from subdialogs, this might not be so bad.  Is that correct?


From: Michael Bodell [mailto:bodell@247-inc.com]
Sent: Tuesday, May 27, 2014 3:45 PM
To: David Wright; www-voice@w3.org
Subject: RE: Recommendation for xml:lang expressions

In most of the applications I'm familiar with people record basically all the audio so there is no TTS or nearly no TTS.  With so little TTS, they don't care as much about the language of the TTS voice for their secondary languages - if the string is two for English and deux for French in an application that is primarily French, then they don't care that it is a French voice that says two with a French accent (example made up since obviously all the numbers are easy enough to record - TTS tends to occur only in truly dynamic content like reading your email or text messages to you).

It is also worth noting that the xml:lang may well influence the grammar acoustic model too, so you may wish to be a little careful in changing it just for the prompts.  (Of course, you may want the exact same changes for the grammars too, since if you want to say a prompt in one language you likely want to listen for a speaker of the same language.  To solve that issue, platform extensions that select which grammar language model/acoustic model to use might be a way to go, and here we definitely have applications that do this.)

From: David Wright [mailto:David.Wright@OntarioSystems.com]
Sent: Tuesday, May 27, 2014 12:13 PM
To: Michael Bodell; www-voice@w3.org
Subject: RE: Recommendation for xml:lang expressions

Thanks for the response, Michael.  I expect that a great many VXML scripts have business requirements to change language mid-script based on the caller's selection.  So how are most other VXML developers normally handling this within the existing standard?  Are they choosing one of the options that you proposed?

From: Michael Bodell [mailto:bodell@247-inc.com]
Sent: Tuesday, May 27, 2014 3:02 PM
To: David Wright; www-voice@w3.org
Subject: RE: Recommendation for xml:lang expressions

It is not obvious that VXML would want to redefine xml:lang, since that attribute comes directly from the XML specification (hence the xml namespace prefix).  See http://www.w3.org/TR/REC-xml/#sec-lang-tag for more on that.

Obviously VXML could define something like an xmllangexpr attribute and have it defined to evaluate to a string which should then be interpreted as xml:lang, but that could be a little messy IMO.

There are a few different approaches that could be taken to solve your issue, including the if method you call out.  Other methods include:


- using a server-side solution: sending the TTS text and language information to the server in an audio URL and having the server render the TTS for you into a normal audio file.

- (platform extension) defining a VXML property to be the xml:lang on SSML fragments and/or default voice, and using that property tag to set the language for your program.

- (spec extension) allowing the value expr to write XML tags as well as text, so that <value expr="'&lt;s xml:lang=&quot;' + callerXmlLang + '&quot;&gt;$' + thisAmount + '&lt;/s&gt;'"/> or some similar string would work.  This was traditionally a feature request (using value to write SSML fragments and/or using value to write SRGS fragments) but for various reasons and after much discussion it was generally decided that this sort of document.write-like functionality was inappropriate in VXML 2.0/2.1.
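For the server-side option, the application only has to build an audio URL that carries the text and the caller's language. A minimal sketch, assuming a hypothetical rendering endpoint (https://tts.example.com/render) and hypothetical parameter names lang and text, neither of which is defined by any VXML platform or specification:

```javascript
// Hypothetical endpoint: any service that accepts text plus a language tag
// and returns a rendered audio file would fill this role.
var TTS_BASE_URL = "https://tts.example.com/render";

// Builds the audio URL for one TTS request. Both values are URL-encoded
// so that characters such as "$" or spaces survive the query string.
function ttsAudioUrl(lang, text) {
  return TTS_BASE_URL +
    "?lang=" + encodeURIComponent(lang) +
    "&text=" + encodeURIComponent(text);
}

console.log(ttsAudioUrl("es", "$" + 1956));
// prints "https://tts.example.com/render?lang=es&text=%241956"
```

If a function like this were defined in a <script> that is in scope, a single <audio expr="ttsAudioUrl(callerXmlLang, '$' + thisAmount)"/> could stand in for the per-language <if> blocks, at the cost of a server round trip for each prompt.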

From: David Wright [mailto:David.Wright@OntarioSystems.com]
Sent: Wednesday, May 21, 2014 11:45 AM
To: www-voice@w3.org
Subject: Recommendation for xml:lang expressions

In a VXML script, I prompt the caller for language and get a result from a <field>.  I need a way to programmatically express to all additional <audio> tags in my application that the language selected by my caller (let's say 'es') is to be used for TTS.  The <s xml:lang='es'> would work if I wanted to hand-code this stuff and have <if> blocks all over the place to see what language to play the audio in, but I'm too lazy to do this much work.  I would really like to have a new attribute in the <s>, <p> and <voice> tags that lets me pass an ECMAScript expression that is evaluated and placed into the xml:lang property before sending it over to the SSML engine.

As it is right now, in order to read a dollar amount in either Spanish or English, depending on what the caller chose, I have to do this:

<if cond="callerXmlLang == 'es'">
       <prompt>
              <s xml:lang="es">
                     <value expr="'$' + thisAmount"/>
              </s>
       </prompt>
</if>
<if cond="callerXmlLang == 'en'">
       <prompt>
              <s xml:lang="en">
                     <value expr="'$' + thisAmount"/>
              </s>
       </prompt>
</if>

I also considered using JavaScript to change the xml:lang attribute on the <vxml> tag, but this would be painful to have to set within every <form> that is called via <subdialog>, since the application context is fresh every time you use <subdialog>.


Received on Tuesday, 27 May 2014 20:58:44 UTC