Re: [EMMA] i18n comment: Use of xml:lang=""

I18N-4:  ACCEPT with modification



Comment from the i18n review of:


Comment 4


Editorial/substantive: S

Owner: RI


Location in reviewed document:

4.2.5 []



In XML 1.0 you can indicate the lack of language information using
xml:lang="". How does EMMA allow for that with xml:lang and emma:lang?
We feel it ought to. See





ACCEPT (with modification)


Thank you for raising this important issue. In addressing this issue and
reading related documents such as

we determined that, in addition to the use of emma:lang="", we should also
address the use of emma:lang="zxx". Below we address each in turn:


1. Non-linguistic input (emma:lang="zxx"):


Given the use of EMMA for capturing multimodal input, including input
using pen/ink, sensors, computer vision, etc., there are many EMMA results
that capture non-linguistic input. Examples include drawing areas, arrows
on maps, and music input for tune recognition. This raises the question of
how non-linguistic inputs should be annotated for emma:lang. Following
on from the use in xml:lang, we propose that non-linguistic input should be
annotated using the value "zxx". Since we already refer to BCP 47 and use
the values from the IANA subtag registry for emma:lang values, this does
not require revision of the EMMA markup. We will, however, add an example
and clarifying text to the specification indicating the use of
emma:lang="zxx" for non-linguistic input.


To illustrate the difference between emma:lang and xml:lang for this
kind of case: hummed input to a tune recognition application would be
annotated as emma:lang="zxx", since the input is not in a human language,
but if the result was a song title in English, that would be marked as
xml:lang="en":



            <emma:interpretation emma:lang="zxx" emma:mode="tune"

                        <songtitle xml:lang="en">another one bites the
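
The distinction between the two attributes can also be checked
programmatically. The following is a minimal sketch, not part of the EMMA
specification: it assumes the EMMA 1.0 namespace URI, uses Python's standard
xml.etree.ElementTree, and parses a hypothetical, complete fragment whose id
value and song title text are placeholders rather than the content of the
example above.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"       # EMMA 1.0 namespace (assumed)
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to xml:

# Hypothetical fragment: the id and the song title text are placeholders.
DOC = f"""
<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">
  <emma:interpretation id="int1" emma:lang="zxx" emma:mode="tune">
    <songtitle xml:lang="en">placeholder song title</songtitle>
  </emma:interpretation>
</emma:emma>
"""

root = ET.fromstring(DOC)
interp = root.find(f"{{{EMMA_NS}}}interpretation")
title = interp.find("songtitle")

# emma:lang describes the language of the user's input (here: none, "zxx").
input_lang = interp.get(f"{{{EMMA_NS}}}lang")
# xml:lang describes the language of the text content in the result.
result_lang = title.get(f"{{{XML_NS}}}lang")

print(f"input language: {input_lang!r}, result language: {result_lang!r}")
```

Here emma:lang is read from the interpretation element and xml:lang from the
application markup inside it, which mirrors the split described above.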





2. Non-specification (emma:lang=""):


Parallel to your suggested usage for xml:lang, cases in which there is
no information about whether the source input is in a particular human
language, and if so which language, are annotated as emma:lang="".


Furthermore, in cases where there is no explicit emma:lang annotation,
and none is inherited from a higher element in the document, the
default value for emma:lang is "", meaning that there is no information
about whether the source input is in a language and if so which language.
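
The inheritance and default rule can be sketched as a small lookup. This is
an illustration under stated assumptions, not normative text: the helper name
effective_emma_lang and the sample fragment are hypothetical, the fragment is
not schema-validated, and the EMMA 1.0 namespace URI is assumed. The helper
walks from an element up through its ancestors, returns the nearest explicit
emma:lang, and falls back to "" when none is found.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"  # EMMA 1.0 namespace (assumed)
EMMA_LANG = f"{{{EMMA_NS}}}lang"


def effective_emma_lang(root, target):
    """Return the emma:lang in effect for target (hypothetical helper)."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        value = node.get(EMMA_LANG)
        if value is not None:       # explicit annotation, possibly ""
            return value
        node = parent_of.get(node)  # becomes None once we pass the root
    return ""                       # default: no language information


# Hypothetical fragment: int1 inherits, int2 overrides, int3 has no annotation.
DOC = f"""
<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">
  <emma:one-of id="r1" emma:lang="en">
    <emma:interpretation id="int1"/>
    <emma:interpretation id="int2" emma:lang="zxx"/>
  </emma:one-of>
  <emma:interpretation id="int3"/>
</emma:emma>
"""

root = ET.fromstring(DOC)
by_id = {e.get("id"): e for e in root.iter(f"{{{EMMA_NS}}}interpretation")}
langs = {i: effective_emma_lang(root, e) for i, e in by_id.items()}
print(langs)
```

Note that under this reading an explicit emma:lang="" and a missing
annotation with nothing to inherit resolve to the same value.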

Received on Tuesday, 26 June 2007 12:37:43 UTC