ITS-1: REJECT

COMMENT:

i. Allowing ITS markup in EMMA.

With this provision in place, EMMA could easily carry, for example, information on directionality or ruby. Your example [emma:tokens="arriving at 'Liverpool Street'"] could be enhanced by local ITS markup (see http://www.w3.org/TR/its/#basic-concepts-selection-local) as follows in order to explicitly encode directionality information: [its:dir="ltr" emma:tokens="arriving at 'Liverpool Street'"]. Please note that the EMMA design decision to encode tokens in an attribute prevents the decoration of individual tokens. With an element-based encoding of tokens, the example [<tokens>arriving at 'Liverpool Street'</tokens>] could furthermore be enhanced by local ITS markup as follows in order to explicitly encode the fact that 'Liverpool Street' is a specific type of linguistic unit ('span', by the way, is an element which ITS recommends): [<tokens>arriving at <span its:term="yes">Liverpool Street</span></tokens>].

Aside: We considered your response on tokens in http://lists.w3.org/Archives/Public/public-i18n-core/2006JulSep/0074.html while crafting this suggestion. We felt that, despite your response, ITS annotations on tokens would be valuable.

RESPONSE:

EMMA provides different mechanisms for representing captured input and the various stages of semantic analysis that follow. We agree that there are situations where ITS markup is appropriate within an EMMA document and that the 'emma:tokens' attribute does not permit embedded ITS annotations. The restricted content model of emma:tokens has been intentionally chosen to make common use cases simple. There are other approaches with greater expressive power where ITS annotations may be specified.

EMMA anticipates a rich diversity of user inputs (e.g. keyboard entry, speech, handwriting input) and provides multiple mechanisms for representing that input. The 'emma:tokens' attribute is the most limited of these. Other mechanisms such as the 'emma:signal' attribute and the <emma:derivation> element offer far more freedom. To better explain these different mechanisms, we offer some background and walk through two illustrative examples showing how user input may be represented and/or summarized at various levels within the semantic analysis. We expect this review will better explain where 'emma:tokens' is appropriate.

With this foundation, we return to the interesting topic of enhancing EMMA with ITS annotations in the subsequent response.

[General Comments]

EMMA is intended to provide a semantic representation of user input for processing by an application. This differs from many other specifications such as XHTML or SSML where CDATA is displayed to the user. EMMA documents are not intended for review by the user, although some applications may present portions of the EMMA result to the user. It is in this latter case where I believe your concerns are most appropriate.

Typically, an application does not care about the exact response from the user; instead it cares about the intent. In a clickable map, for instance, the exact coordinates are generally irrelevant except as an indication of which specific item was selected. The same is true for the click coordinates and timing of a user's interaction with graphical widgets. Likewise, a speech application that requests a date may accept "today", "Thursday", "the thirty first", or even "the end of the month" as completely equivalent entries. The corresponding representation for an EMMA processor could appear in various forms: as a simple numeric code

<emma:literal>20070531</emma:literal>

as a structured value

    <date>
        <year>2007</year>
        <month>5</month>
        <day>31</day>
    </date>

as a language-specific form

<emma:literal>31-May-07</emma:literal>

or even as a translated value

<emma:literal xml:lang="de">heute</emma:literal>

in rare cases.
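
As a rough illustration of why the consumer sees these forms as interchangeable, the following Python sketch normalizes the numeric code and the structured value to the same date. The helper name date_from_payload is hypothetical, and the assumption that the numeric code is laid out as YYYYMMDD is ours; neither is defined by EMMA.

```python
import datetime
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"


def date_from_payload(xml_text):
    """Hypothetical helper: map either payload shape to a datetime.date."""
    root = ET.fromstring(xml_text)
    if root.tag == f"{{{EMMA_NS}}}literal":
        # Numeric code, assumed here to be laid out as YYYYMMDD.
        return datetime.datetime.strptime(root.text.strip(), "%Y%m%d").date()
    # Structured value: <date><year/><month/><day/></date>
    return datetime.date(
        int(root.findtext("year")),
        int(root.findtext("month")),
        int(root.findtext("day")),
    )


literal = f'<emma:literal xmlns:emma="{EMMA_NS}">20070531</emma:literal>'
structured = "<date><year>2007</year><month>5</month><day>31</day></date>"

literal_date = date_from_payload(literal)
structured_date = date_from_payload(structured)
print(literal_date == structured_date)
```

Whatever the user actually said or wrote, the application ends up with one value to act on.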

There are certainly cases where having the raw input may be helpful and this need has been anticipated. EMMA was created to provide a consistent representation of semantic results regardless of the input modality. This coverage requires that EMMA be flexible in how input is captured and presented. One obvious choice is to use the 'emma:tokens' attribute which, as you've noted, has numerous limitations. We have found that the simple string value within 'emma:tokens' satisfies several common use cases. A second technique is the use of the 'emma:signal' attribute to reference externally captured raw input. This input could be recorded audio or video as discussed within the EMMA text. It could also reference an external document on which a semantic analysis is being performed. A third option is the use of <emma:derivation> chains. These last two are far more flexible and more appropriate for ITS annotation as we shall see below. Which of these is preferred for a given application depends on the specific needs of the EMMA provider and the EMMA consumer.

As an illustration, I will explore two examples. The first focuses on handwriting recognition and the second on text analysis and language translation. I believe both are relevant to this discussion.

[Example 1]

Alice has recently installed some handwriting recognition software on her laptop. One of the sample programs uses the pen input for performing simple command and control. Alice quickly scribbles a command to launch a familiar program. We'll suppose that there are several layers, each of which uses EMMA to represent the value. The first layer records the individual points using the Ink Markup Language [2].

          <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.w3.org/2003/InkML">
              <emma:interpretation id="raw">
                  <ink>
                      <trace>
                          10 0, 9 14, 8 28, 7 42, 6 56, 6 70, 8 84, 8 98, 8 112, 9 126, 10 140,
                          13 154, 14 168, 17 182, 18 188, 23 174, 30 160, 38 147, 49 135,
                          58 124, 72 121, 77 135, 80 149, 82 163, 84 177, 87 191, 93 205
                      </trace>
                      <trace>
                          130 155, 144 159, 158 160, 170 154, 179 143, 179 129, 166 125,
                          152 128, 140 136, 131 149, 126 163, 124 177, 128 190, 137 200,
                          150 208, 163 210, 178 208, 192 201, 205 192, 214 180
                      </trace>
                      <trace>
                          227 50, 226 64, 225 78, 227 92, 228 106, 228 120, 229 134,
                          230 148, 234 162, 235 176, 238 190, 241 204
                      </trace>
                      <trace>
                          282 45, 281 59, 284 73, 285 87, 287 101, 288 115, 290 129,
                          291 143, 294 157, 294 171, 294 185, 296 199, 300 213
                      </trace>
                      <trace>
                          366 130, 359 143, 354 157, 349 171, 352 185, 359 197,
                          371 204, 385 205, 398 202, 408 191, 413 177, 413 163,
                          405 150, 392 143, 378 141, 365 150
                      </trace>
                  </ink>
              </emma:interpretation>
          </emma:emma>

Here the input could not be presented using 'emma:tokens'. The 'emma:signal' attribute could be used, but a derivation chain might be preferred as the XML content can be included directly within the EMMA document. This form may be meaningful for a graphics program, but a text-based program needs to convert these points into characters. The next layer evaluates the trace and derives the corresponding text stream.

          <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.w3.org/2003/InkML">  
              <emma:derivation>
                  <emma:interpretation id="points">
                      <ink>
                          <trace> 10 0, 9 14, 8 28, 7 42, 6 56, 6 70, 8 84, 8 98, 8 112, 9 126, 10 140, 13
                              154, 14 168, 17 182, 18 188, 23 174, 30 160, 38 147, 49 135, 58 124, 72 121,
                              77 135, 80 149, 82 163, 84 177, 87 191, 93 205 </trace>
                          <trace> 130 155, 144 159, 158 160, 170 154, 179 143, 179 129, 166 125, 152 128,
                              140 136, 131 149, 126 163, 124 177, 128 190, 137 200, 150 208, 163 210, 178
                              208, 192 201, 205 192, 214 180 </trace>
                          <trace> 227 50, 226 64, 225 78, 227 92, 228 106, 228 120, 229 134, 230 148, 234
                              162, 235 176, 238 190, 241 204 </trace>
                          <trace> 282 45, 281 59, 284 73, 285 87, 287 101, 288 115, 290 129, 291 143, 294
                              157, 294 171, 294 185, 296 199, 300 213 </trace>
                          <trace> 366 130, 359 143, 354 157, 349 171, 352 185, 359 197, 371 204, 385 205,
                              398 202, 408 191, 413 177, 413 163, 405 150, 392 143, 378 141, 365 150
                          </trace>
                      </ink>
                  </emma:interpretation>
              </emma:derivation>

              <emma:interpretation id="characters">
                  <emma:derived-from resource="#points" composite="false"/>
                  <emma:literal>hello</emma:literal>
              </emma:interpretation>
          </emma:emma>

Here we can see the derivation chain containing both the raw coordinate data and the translation into characters. As a final step, the program needs to convert the input into action.

          <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.example.com/example">
              <emma:interpretation emma:tokens="hello">
                  <emma:derived-from resource="debug/session_0001282.xml" composite="false"/>
                  <launch>hello.exe</launch>
              </emma:interpretation>
          </emma:emma>

Here the prior derivation has been referenced and 'emma:tokens' has been used to store the character values generated in the second step. Of course, the system could have provided the full derivation chain instead, but the extra information might only be relevant for system debugging and inappropriate for typical use. Here the value of 'emma:tokens' is provided as debugging information. The application only cares about 26 characters: "<launch>hello.exe</launch>".
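
To make the split between payload and annotation concrete, here is a minimal Python sketch of how a consumer might read the final document of Example 1. It uses only the standard xml.etree.ElementTree module; the variable names are ours, and a real application would of course decide for itself what to do with the command.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
APP_NS = "http://www.example.com/example"

# The final document from Example 1, reproduced as a string.
DOC = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns="http://www.example.com/example">
    <emma:interpretation emma:tokens="hello">
        <emma:derived-from resource="debug/session_0001282.xml" composite="false"/>
        <launch>hello.exe</launch>
    </emma:interpretation>
</emma:emma>
"""

root = ET.fromstring(DOC)
interp = root.find(f"{{{EMMA_NS}}}interpretation")

# Application payload: the only part a typical consumer acts on.
command = interp.findtext(f"{{{APP_NS}}}launch")

# Annotations: useful for debugging, ignorable otherwise.
tokens = interp.get(f"{{{EMMA_NS}}}tokens")
source = interp.find(f"{{{EMMA_NS}}}derived-from").get("resource")

print(command, tokens, source)
```

The consumer never needs to look at 'emma:tokens' or the derivation reference in order to launch the program.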

[Example 2]

Bob has purchased a Universal Translator. He sets the output to Spanish and then speaks: "I love to eat Mexican food because it is spicy". The device performs an analysis of his speech and then generates a translation which will be played using text-to-speech. The EMMA result looks like this:

          <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
              xmlns="http://example.com/universal_translator">
              <emma:derivation>
                  <emma:interpretation id="english"
                      emma:tokens="I love to eat Mexican food because it is spicy">
                      <assertion>
                          <interaction wordnet="1828736" wordnet-desc="love, enjoy (get pleasure from)"
                              token="love">
                              <experiencer reference="first" token="I">
                                  <attribute quantity="single"/>
                              </experiencer>
      
                              <attribute time="present"/>
      
                              <content>
                                  <interaction wordnet="1157345" wordnet-desc="eat (take in solid food)"
                                      token="to eat">
                                      <object id="obj1" wordnet="7555863"
                                          wordnet-desc="food, solid food (any solid substance (as opposed to liquid) that is used as a source of nourishment)"
                                          token="food">
                                          <restriction wordnet="3026902"
                                              wordnet-desc="Mexican (of or relating to Mexico or its inhabitants)"
                                              token="Mexican"/>
                                      </object>
                                  </interaction>
                              </content>
      
                              <reason token="because">
                                  <experiencer reference="third" target="obj1" token="it"/>
      
                                  <attribute time="present"/>

                                  <one-of token="spicy">
                                      <modification wordnet="2397732"
                                          wordnet-desc="hot, spicy (producing a burning sensation on the taste nerves)"
                                          confidence="0.8"/>
                                      <modification wordnet="2398378"
                                          wordnet-desc="piquant, savory, savoury, spicy, zesty (having an agreeably pungent taste)"
                                          confidence="0.4"/>
                                  </one-of>
                              </reason>
                          </interaction>
                      </assertion>
                  </emma:interpretation>
              </emma:derivation>
            
              <emma:interpretation id="spanish">
                  <emma:derived-from resource="#english" composite="false"/>
                  <result xml:lang="es">Adoro alimento mejicano porque es picante.</result>
              </emma:interpretation>
          </emma:emma>

Again the value in 'emma:tokens' is informative only. The real meaning is expressed within the application namespace content appearing inside <emma:interpretation>. Extensive use of WordNet identifiers is made to describe the different bits of semantic data contained within the sentence. Each of these identifiers may then be converted into Spanish [4], conjugated appropriately, and placed within the appropriate structural frame. The resulting translation is not perfect but beats the output from a familiar MT engine [5].
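
The derivation chain in Example 2 can be followed mechanically. The Python sketch below is one possible way to do so, under stated assumptions: the document is abbreviated to the parts the code touches, the helper resolve_derived_from is hypothetical, and only local "#id" references are handled. It resolves the Spanish result back to the English analysis and picks the higher-confidence sense of "spicy".

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
APP_NS = "http://example.com/universal_translator"
NS = {"emma": EMMA_NS}

# Abbreviated copy of the Example 2 document (most of the analysis omitted).
DOC = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://example.com/universal_translator">
    <emma:derivation>
        <emma:interpretation id="english"
            emma:tokens="I love to eat Mexican food because it is spicy">
            <assertion>
                <one-of token="spicy">
                    <modification wordnet="2397732" confidence="0.8"/>
                    <modification wordnet="2398378" confidence="0.4"/>
                </one-of>
            </assertion>
        </emma:interpretation>
    </emma:derivation>
    <emma:interpretation id="spanish">
        <emma:derived-from resource="#english" composite="false"/>
        <result xml:lang="es">Adoro alimento mejicano porque es picante.</result>
    </emma:interpretation>
</emma:emma>
"""


def resolve_derived_from(root, interpretation):
    """Hypothetical helper: follow a local '#id' derived-from reference."""
    link = interpretation.find("emma:derived-from", NS)
    if link is None:
        return None
    resource = link.get("resource", "")
    if not resource.startswith("#"):
        return None  # external references are out of scope for this sketch
    target_id = resource[1:]
    for candidate in root.iter(f"{{{EMMA_NS}}}interpretation"):
        if candidate.get("id") == target_id:
            return candidate
    return None


root = ET.fromstring(DOC)
spanish = root.find("emma:interpretation", NS)  # top-level result
english = resolve_derived_from(root, spanish)

# Choose the WordNet sense with the highest confidence value.
best = max(
    english.iter(f"{{{APP_NS}}}modification"),
    key=lambda m: float(m.get("confidence")),
)
print(english.get("id"), best.get("wordnet"))
```

Nothing in this walk consults 'emma:tokens'; the attribute could be dropped without changing the outcome.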

ITS-2: NO RESPONSE NECESSARY

COMMENT:

ii. Creating an ITS Rule file (see http://www.w3.org/TR/its/#link-external-rules) along with the EMMA specification (e.g. as a non-normative appendix).

With this in place, localization/translation would become easier in case EMMA instances or parts of EMMA instances (e.g. an "interpretation") would need to be transferred from one natural language to another.

Several EMMA elements and attributes contain text. Most, if not all, localization tools (as well as ITS) assume element content is translatable and attribute content is not translatable. However, in EMMA this assumption does not seem to be valid. The EMMA element "interpretation", for example, does not seem to contain immediate translatable content, and the EMMA attribute "tokens" in some circumstances might have to be translated.

While this is fine because tools have ways to specify that an element should not be translated, it is very often quite difficult to know *which elements* or *which attributes* should behave like that. Having a list of elements that are non-translatable (or conversely, if there are more non-translatable than translatable elements) would help a lot. This list could be expressed using ITS rules (see http://www.w3.org/TR/its/#basic-concepts-selection-global) relating to "its:translate" (see http://www.w3.org/TR/its/#trans-datacat). This way all users of translation tools (or other language-related applications such as machine-translation engines, etc.) could look up that set of rules and process accordingly.

For the examples given above, an ITS rules file could be as simple as:

      <its:rules xmlns:its="http://www.w3.org/2005/11/its"
                 xmlns:emma="http://www.w3.org/2003/04/emma" version="1.0">
       <its:translateRule selector="//emma:interpretation" translate="no"/>
       <its:translateRule selector="//@emma:tokens" translate="yes"/>
      </its:rules>

RESPONSE:

We would like to thank the ITS team for raising this topic and suggesting how ITS might be used within EMMA. We appreciate the benefits that ITS brings and would like to share some further thoughts on employing ITS within EMMA as guidance for future implementations.

There are many situations where ITS is not appropriate within EMMA. There may be no natural language at all, as in the first example. The string 'hello' is a series of characters being generated chronologically rather than a word from a natural language. This is true for DTMF and often true for gestures. Likewise, natural language may be used only as an interim step before an application-meaningful result is generated. The case of 'today' mapping to '20070531' is an example. Here again, the focus is on the semantic meaning and not the human input.

There are other cases where EMMA interpretations contain natural language content that could be presented to a user. Here some agent must accept the EMMA document and transform it prior to presentation (even if the transformation is simply pulling out content with XPath). Should it be the role of the document or the transforming agent to decide how this transformation occurs? Generally, we would place the burden on the agent for one simple reason: the agent typically applies a consistent set of rules depending on the schema of the content. For other annotations, such as directionality indication, we endorse the use of ITS within the document. Here all three mechanisms described in the ITS specification should be available to EMMA producers: as inline annotations

          <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
              xmlns:its="http://www.w3.org/2005/11/its"
              xmlns="http://example.com/plan_of_action">
              <emma:interpretation id="english" emma:tokens="My name is Inigo Montoya, you killed my father, prepare to die!">
                  <name its:translate="no">Inigo Montoya</name>
                  <objection>murder of father</objection>
                  <intent>revenge</intent>
              </emma:interpretation>
          </emma:emma>

as a rule reference within an EMMA element such as <emma:interpretation>

          <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma" xmlns:its="http://www.w3.org/2005/11/its"
              xmlns:xlink="http://www.w3.org/1999/xlink"
              xmlns="http://example.com/plan_of_action">
              <emma:interpretation id="english" emma:tokens="My name is Inigo Montoya, you killed my father, prepare to die!">
                  <its:rules version="1.0" xlink:href="http://example.com/plan_of_action/its.xml" xlink:type="simple"/>
                  <name>Inigo Montoya</name>
                  <objection>murder of father</objection>
                  <intent>revenge</intent>
              </emma:interpretation>
          </emma:emma>

or via processing instructions.
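
As a sketch of what a transforming agent might do with the inline form, the following Python fragment pulls the application content out of the first plan_of_action document and separates the values marked its:translate="no" from the rest. The function name split_by_translate is hypothetical, and the sketch makes simplifying assumptions: it looks only at direct children of <emma:interpretation>, ignores global rules and inheritance, and treats elements without an its:translate attribute as translatable, which is the ITS default for element content.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ITS_NS = "http://www.w3.org/2005/11/its"

# The inline-annotation example from above, reproduced as a string.
DOC = """\
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:its="http://www.w3.org/2005/11/its"
    xmlns="http://example.com/plan_of_action">
    <emma:interpretation id="english" emma:tokens="My name is Inigo Montoya, you killed my father, prepare to die!">
        <name its:translate="no">Inigo Montoya</name>
        <objection>murder of father</objection>
        <intent>revenge</intent>
    </emma:interpretation>
</emma:emma>
"""


def split_by_translate(interpretation):
    """Hypothetical helper: split payload values by inline its:translate."""
    translatable, fixed = {}, {}
    for child in interpretation:
        if child.tag.startswith(f"{{{EMMA_NS}}}"):
            continue  # skip EMMA's own elements, keep application content
        local_name = child.tag.split("}", 1)[-1]
        flag = child.get(f"{{{ITS_NS}}}translate", "yes")
        target = fixed if flag == "no" else translatable
        target[local_name] = child.text
    return translatable, fixed


root = ET.fromstring(DOC)
interp = root.find(f"{{{EMMA_NS}}}interpretation")
translatable, fixed = split_by_translate(interp)
print(translatable, fixed)
```

The agent, not the document, still decides what to present; the annotation only tells it which values must be passed through unchanged.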

[1] http://lists.w3.org/Archives/Public/www-multimodal/2007May/0010.html
[2] http://www.w3.org/TR/InkML/
[3] http://wordnet.princeton.edu/
[4] http://multiwordnet.itc.it/english/home.php
[5] http://babelfish.altavista.com/
[6] http://lists.w3.org/Archives/Public/public-i18n-its/2006AprJun/0183.html