
FW: Re: [EMMA] i18n comment: Use of emma:lang

From: <johnston@research.att.com>
Date: Tue, 26 Jun 2007 08:31:10 -0400
Message-ID: <0C50B346CAD5214EA8B5A7C0914CF2A448575B@njfpsrvexg3.research.att.com>
To: <public-i18n-core@w3.org>, <www-multimodal@w3.org>




SUBSTANTIVE (xml:lang vs emma:lang)


Comment from the i18n review of:



Comment 2

At http://www.w3.org/International/reviews/0704-emma/

Editorial/substantive: S

Owner: RI


Location in reviewed document:

4.2.5 [http://www.w3.org/TR/2007/WD-emma-20070409/#s4.2.5]



It's not at all clear to us what the difference is between emma:lang
and xml:lang, the relationship between them, or when we should use
which. (It might help to create examples that show the use of
xml:lang as well as emma:lang.)


[[In order handle inputs involving multiple languages, such as through
code switching,

the emma:lang tag MAY contain several language identifiers separated by


This is definitely something you cannot do with xml:lang, but we are
wondering what 

is the value of doing it anyway. We are not sure what benefit it would




We address each of these two points in turn:


Point 1: ACCEPT Clarification of emma:lang vs xml:lang function 


The W3C multimodal working group accept that it is important to make
clear the differences between the xml:lang and emma:lang attributes,
and plan to add clarifying text to the emma:lang section in the next
draft of the EMMA specification. The xml:lang and emma:lang attributes
serve distinct and equally important purposes. The role of xml:lang is
to indicate the language used for content in an XML element or
document. In contrast, the emma:lang attribute indicates the language
employed by a user when entering an input into a spoken or multimodal
dialog system. Critically, emma:lang annotates the language of the
signal originating from the user rather than the specific tokens used
at a particular stage of processing. This is most clearly illustrated
by an example involving multiple stages of processing of a user input,
which is the primary use of EMMA markup. Consider the following
scenario:

EMMA is being used to represent three stages in the processing of a
spoken input to a system for ordering products. The user input is in
Italian. After speech recognition, the input is first translated into
English; a natural language understanding system then converts the
English translation into a product ID (which is not in any particular
language). Since the input signal is a user speaking Italian, the
emma:lang will be emma:lang="it" at all of these stages of processing.
The xml:lang attribute, in contrast, will initially be "it"; after
translation it will be "en", and after language understanding "zxx",
assuming the use of "zxx" to indicate non-linguistic content.


The following table illustrates the relation between the content in
the EMMA document, the emma:lang and the xml:lang:

CONTENT          emma:lang       xml:lang       processing stage

condizionatore   emma:lang="it"  xml:lang="it"  result from speech recognition
air conditioner  emma:lang="it"  xml:lang="en"  result from machine translation
id1456           emma:lang="it"  xml:lang="zxx" result from natural language understanding



The following are examples of EMMA documents corresponding to these
processing stages, abbreviated to show the attributes critical to the
discussion here. Note that <transcription>, <translation>, and
<understanding> are application-namespace elements, not part of the
EMMA markup.




            <emma:interpretation emma:lang="it" emma:mode="voice"






            <emma:interpretation emma:lang="it" emma:mode="voice"

                        <translation xml:lang="en">air






            <emma:interpretation emma:lang="it" emma:mode="voice"





In order to make these differences clear we will add clarifying text
and examples to the specification.
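
As a rough illustration of the distinction drawn under Point 1, the
following Python sketch parses one hypothetical, complete EMMA
document per processing stage and reads back both attributes. The
wrapper markup, the id value, the application namespace URI and the
EMMA namespace URI used here are assumptions made for the sketch; only
the content strings and the language values come from the table above.

```python
import xml.etree.ElementTree as ET

# Namespace URIs. EMMA_NS and APP_NS are assumptions for this sketch;
# XML_NS is the fixed namespace behind the reserved "xml:" prefix.
EMMA_NS = "http://www.w3.org/2003/04/emma"
XML_NS = "http://www.w3.org/XML/1998/namespace"
APP_NS = "http://www.example.com/app"  # hypothetical application namespace

# Hypothetical wrapper: the signal language stays "it" at every stage.
TEMPLATE = (
    '<emma:emma version="1.0" xmlns:emma="{emma}" xmlns="{app}">'
    '<emma:interpretation id="int1" emma:lang="it" emma:mode="voice">'
    "{payload}"
    "</emma:interpretation>"
    "</emma:emma>"
)

# One application payload per stage, using the values from the table.
PAYLOADS = [
    '<transcription xml:lang="it">condizionatore</transcription>',
    '<translation xml:lang="en">air conditioner</translation>',
    '<understanding xml:lang="zxx">id1456</understanding>',
]


def languages(payload):
    """Return (emma:lang, xml:lang, content) for one stage document."""
    doc = ET.fromstring(
        TEMPLATE.format(emma=EMMA_NS, app=APP_NS, payload=payload)
    )
    interp = doc.find(f"{{{EMMA_NS}}}interpretation")
    content = interp[0]  # the single application element
    return (
        interp.get(f"{{{EMMA_NS}}}lang"),   # language of the user's signal
        content.get(f"{{{XML_NS}}}lang"),   # language of the element content
        content.text,
    )


rows = [languages(p) for p in PAYLOADS]
for emma_lang, xml_lang, text in rows:
    print(f"{text:16} emma:lang={emma_lang:3} xml:lang={xml_lang}")
```

In this sketch emma:lang is read from the interpretation and never
changes, while xml:lang is read from the application element and
follows the content produced at each stage.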



Point 2: Clarification, multiple values in emma:lang:



In call center and other applications, multilingual users provide
inputs in which they switch language in mid-utterance. The emma:lang
in these cases needs to indicate that the input involves more than one
language, e.g.


"quisiera hacer una collect call"


The emma:lang in this case would have value "sp en"



            <emma:interpretation emma:lang="sp en" emma:mode="voice"

                        <transcription>quisiera hacer una collect




In order to use xml:lang in this example perhaps an additional element

could be used, e.g. <span>. Would this work?



            <emma:interpretation emma:lang="sp en" emma:mode="voice"

                        <transcription xml:lang="sp">quisiera hacer una
<span xml:lang="en">collect call</span></transcription>
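
One way to explore the <span> proposal is to check what a consumer
could recover from such markup. The Python sketch below is only an
illustration under assumptions: the helper name segments is
hypothetical, the <transcription> element is parsed on its own without
the surrounding EMMA wrapper, and the emma:lang value is treated as a
space-separated list as in the example above. It keeps the "sp" value
exactly as written in this message.

```python
import xml.etree.ElementTree as ET

# Attribute key ElementTree uses for xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def segments(elem, inherited=None):
    """Yield (language, text) runs; xml:lang is inherited from ancestors."""
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield lang, elem.text.strip()
    for child in elem:
        yield from segments(child, lang)
        # Text after a child element belongs to the parent's language.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()


# The nested markup from the message ("sp" as written there; the
# ISO 639-1 code for Spanish is "es").
transcription = ET.fromstring(
    '<transcription xml:lang="sp">quisiera hacer una '
    '<span xml:lang="en">collect call</span></transcription>'
)

runs = list(segments(transcription))
emma_lang = "sp en"  # value carried on emma:interpretation in the example

# Languages found in the content versus languages declared for the signal.
content_langs = {lang for lang, _ in runs}
signal_langs = set(emma_lang.split())
print(runs)
print(content_langs == signal_langs)
```

Under these assumptions the per-run xml:lang values identify which
words are in which language, while emma:lang only records the set of
languages present in the user's signal.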


Received on Tuesday, 26 June 2007 12:33:20 UTC
