Re: [EMMA] i18n comment: Use of emma:lang from johnston@research.att.com on 2007-06-26 (www-multimodal@w3.org from June 2007)

From: <johnston@research.att.com>
Date: Tue, 26 Jun 2007 08:25:02 -0400
To: <www-multimodal@w3.org>
Message-ID: <0C50B346CAD5214EA8B5A7C0914CF2A4485759@njfpsrvexg3.research.att.com>
I18N-2 ACCEPT 

====================================================================

http://lists.w3.org/Archives/Public/www-multimodal/2007May/0005.html 

SUBSTANTIVE (xml:lang vs emma:lang)

 

Comment from the i18n review of:

http://www.w3.org/TR/2007/WD-emma-20070409/

 

Comment 2

At http://www.w3.org/International/reviews/0704-emma/

Editorial/substantive: S

Owner: RI

 

Location in reviewed document:

4.2.5 [http://www.w3.org/TR/2007/WD-emma-20070409/#s4.2.5]

 

Comment: 

It's not at all clear to us what the difference is between 

emma:lang and xml:lang, the relationship between them, or 

when we should use which. (It might help to create examples

that show the use of xml:lang as well as emma:lang.)

 

[[In order to handle inputs involving multiple languages, such as through
code switching, the emma:lang tag MAY contain several language
identifiers separated by spaces.]]

 

This is definitely something you cannot do with xml:lang, but we are
wondering what is the value of doing it anyway. We are not sure what
benefit it would provide.

 

RESPONSE:

 

We address each of these two points in turn:

 

Point 1: ACCEPT Clarification of emma:lang vs xml:lang function 

 

The W3C multimodal working group accepts that it is important to
make clear the differences between the xml:lang and emma:lang
attributes, and plans to add clarifying text to the emma:lang
section in the next draft of the EMMA specification. The
xml:lang and emma:lang attributes serve distinct and equally
important purposes. The role of xml:lang is to indicate the
language used for content in an XML element or document. In
contrast, the emma:lang attribute is used to indicate the
language employed by a user when entering an input into a
spoken or multimodal dialog system. Critically, emma:lang
annotates the language of the signal originating from the user
rather than the specific tokens used at a particular stage of
processing. This is most clearly illustrated by an example
involving multiple stages of processing of a user input, which
is the primary use of EMMA markup. Consider the following scenario:

EMMA is being used to represent three stages in the processing
of a spoken input to a system for ordering products. The user
input is in Italian. After speech recognition, the user input is
first translated into English; then a natural language
understanding system converts the English translation into a
product ID (which is not in any particular language). Since the
input signal is a user speaking Italian, emma:lang will be "it"
at all of these stages of processing. The xml:lang attribute, in
contrast, will initially be "it"; after translation it will be
"en"; and after language understanding it will be "zxx", assuming
the use of "zxx" to indicate non-linguistic content.

 

The following table illustrates the relation between the

content in the EMMA document, the emma:lang and the xml:lang:

 

---------------------------------------------------------------------------------------------
CONTENT           emma:lang        xml:lang         PROCESSING STAGE
---------------------------------------------------------------------------------------------
condizionatore    emma:lang="it"   xml:lang="it"    result from speech recognition
air conditioner   emma:lang="it"   xml:lang="en"    result from machine translation
id1456            emma:lang="it"   xml:lang="zxx"   result from natural language understanding
---------------------------------------------------------------------------------------------

 

 

The following are examples of EMMA documents corresponding to these
three processing stages. They are abbreviated to show only the
attributes critical to the discussion here. Note that <transcription>,
<translation>, and <understanding> are application namespace elements,
not part of the EMMA markup.

 

 

<emma:emma>
  <emma:interpretation emma:lang="it" emma:mode="voice"
                       emma:medium="acoustic">
    <transcription xml:lang="it">condizionatore</transcription>
  </emma:interpretation>
</emma:emma>

<emma:emma>
  <emma:interpretation emma:lang="it" emma:mode="voice"
                       emma:medium="acoustic">
    <translation xml:lang="en">air conditioner</translation>
  </emma:interpretation>
</emma:emma>

<emma:emma>
  <emma:interpretation emma:lang="it" emma:mode="voice"
                       emma:medium="acoustic">
    <understanding xml:lang="zxx">id1456</understanding>
  </emma:interpretation>
</emma:emma>

 

In order to make these differences clear, we will add clarifying text
and examples to the specification.
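
As a rough illustration of the distinction drawn in Point 1, the
following Python sketch parses three abbreviated documents like the ones
above and reads both attributes at each stage. It is only a sketch under
stated assumptions: the xmlns:emma declaration (omitted from the
abbreviated examples) is added so the documents parse, and the stage()
and languages() helpers are hypothetical names introduced here, not part
of EMMA or of any EMMA toolkit.

```python
import xml.etree.ElementTree as ET

# Assumption: the abbreviated examples omit namespace declarations, so the
# EMMA namespace is declared here to make the documents parseable.
EMMA_NS = "http://www.w3.org/2003/04/emma"
# Namespace that the predefined "xml" prefix (as in xml:lang) is bound to.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def stage(tag, xml_lang, content):
    """Build one abbreviated EMMA document (hypothetical helper)."""
    return (
        f'<emma:emma xmlns:emma="{EMMA_NS}">'
        '<emma:interpretation emma:lang="it" emma:mode="voice" '
        'emma:medium="acoustic">'
        f'<{tag} xml:lang="{xml_lang}">{content}</{tag}>'
        "</emma:interpretation></emma:emma>"
    )


STAGES = [
    stage("transcription", "it", "condizionatore"),   # speech recognition
    stage("translation", "en", "air conditioner"),    # machine translation
    stage("understanding", "zxx", "id1456"),          # language understanding
]


def languages(doc):
    """Return (emma:lang, xml:lang, content) for one document."""
    root = ET.fromstring(doc)
    interp = root.find(f"{{{EMMA_NS}}}interpretation")
    payload = interp[0]  # the single application-namespace child element
    return (
        interp.get(f"{{{EMMA_NS}}}lang"),  # language of the user's signal
        payload.get(f"{{{XML_NS}}}lang"),  # language of the element content
        payload.text,
    )


rows = [languages(doc) for doc in STAGES]
for emma_lang, xml_lang, content in rows:
    print(f"{content:16} emma:lang={emma_lang:3} xml:lang={xml_lang}")
```

The emma:lang column stays at "it" for every stage because it describes
the user's signal, while xml:lang follows whatever language the content
is in at that stage.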

 

 

Point 2: Clarification, multiple values in emma:lang:

-----------------------------------------------------

 

In call center and other applications, multilingual users provide
inputs in which they switch language mid-utterance. The emma:lang
in these cases needs to indicate that the input involved more than
one language, e.g.

 

"quisiera hacer una collect call"

 

The emma:lang in this case would have the value "es en":

<emma:emma>
  <emma:interpretation emma:lang="es en" emma:mode="voice"
                       emma:medium="acoustic">
    <transcription>quisiera hacer una collect call</transcription>
  </emma:interpretation>
</emma:emma>

In order to use xml:lang in this example, perhaps an additional element
could be used, e.g. <span>. Would this work?

<emma:emma>
  <emma:interpretation emma:lang="es en" emma:mode="voice"
                       emma:medium="acoustic">
    <transcription xml:lang="es">quisiera hacer una <span xml:lang="en">collect call</span></transcription>
  </emma:interpretation>
</emma:emma>
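
To make the question above more concrete, here is a small Python sketch
of how a consumer might read such a document: it splits the
space-separated emma:lang value into individual language tags and walks
the transcription, applying ordinary xml:lang inheritance to each piece
of text. This is an illustration under assumptions, not a statement of
how EMMA processors behave: the xmlns:emma declaration is added so the
document parses, the segments() helper is a hypothetical name, and the
sketch uses "es", the ISO 639-1 code for Spanish.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace declaration added so the abbreviated example parses.
EMMA_NS = "http://www.w3.org/2003/04/emma"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = f"""
<emma:emma xmlns:emma="{EMMA_NS}">
  <emma:interpretation emma:lang="es en" emma:mode="voice"
                       emma:medium="acoustic">
    <transcription xml:lang="es">quisiera hacer una <span xml:lang="en">collect call</span></transcription>
  </emma:interpretation>
</emma:emma>
"""


def segments(elem, inherited=None):
    """Yield (language, text) pairs, applying xml:lang inheritance."""
    # An element without its own xml:lang inherits its parent's value.
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield lang, elem.text.strip()
    for child in elem:
        yield from segments(child, lang)
        # Tail text follows the child but belongs to this element's language.
        if child.tail and child.tail.strip():
            yield lang, child.tail.strip()


root = ET.fromstring(DOC)
interp = root.find(f"{{{EMMA_NS}}}interpretation")

# emma:lang: languages present in the user's signal, separated by spaces.
signal_langs = interp.get(f"{{{EMMA_NS}}}lang").split()

# xml:lang: language of each run of text in the transcription.
content = list(segments(interp.find("transcription")))

print(signal_langs)
print(content)
```

In this reading, emma:lang only says which languages occur in the
signal, whereas the nested xml:lang values say which stretch of the
transcription is in which language.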
Received on Tuesday, 26 June 2007 12:27:13 UTC