Re: emma:node anchoring on signal time axis

From: JOHNSTON, MICHAEL J (MICHAEL J) <johnston@research.att.com>
Date: Mon, 13 Aug 2007 16:33:38 -0400
Message-ID: <0C50B346CAD5214EA8B5A7C0914CF2A45CFCAA@njfpsrvexg3.research.att.com>
To: <www-multimodal@w3.org>, "Paolo Martini" <paolo.martini.relex@chello.be>
Dear Paolo Martini, 

 

Thanks again for your detailed comments on the EMMA specification.
Your suggestions are particularly thoughtful and valuable and so to the
extent possible we will incorporate these suggestions into the current
draft of the specification, even though they fall outside the official
last call period.

 

Regarding your comments on the language of the scope statement in Section 1:

 

>>  1. Is "composed inputs" the same of "composite input" defined in the Terminology?
>>  The language is focussed on annotating single inputs from users, which may be
>>  either from a single mode or a composite input combining
>>  information from multiple modes, as opposed to information that might have been
>>  collected over multiple turns of a dialog. The language provides a set of elements
>>  and attributes that are focused on enabling annotations
>>  on user inputs and interpretations of those inputs.

>> 

>>  2. Following the Terminology at 1.2, "interpretation" appears to be signified
>>  and "user input" the signifier. "annotation" seems to be used a bit ambiguously:
>>  it is my understanding that the common significance of "annotation" is the association
>>  - act or product of the act - of content, at any 'meta-' level, to a region of a signal.
>>  "accurately representing annotations on the input interpretations" seems therefore to
>>  refer to content associated to the signified (as indeed do attributes like emma:cost,
>>  emma:process, etc.). Nevertheless, EMMA seems to provide also a way to represent the
>>  interpretation itself - mainly with literals - besides being open to include any
>>  application specific representation, but it is not clear if it is an accessory
>>  function or a foundation one. Together with "Interpretations of user input are said to
>>  be derived from that input, and higher levels interpretations may be derived from lower
>>  level ones. EMMA allows you to reference the user input or interpretation a given
>>  interpretation was derived from", the issue becomes even more evident when comparing
>>  the expressive power of emma:interpretation and emma:arc to represent alternative
>>  interpretations (as advertised in "Lattices provide a compact representation of large
>>  lists of possible recognition results or interpretations"):
>>  - an emma:interpretation has an "id" while an emma:arc needs "from", "to" and the
>>  signified to be identified (i.e. difficult reification of emma:arcs)
>>  - an emma:interpretation has emma:tokens to refer to user input, while an emma:arc
>>  can only refer to a time region of the input signal

 

Thank you for your feedback. We agree that the use of EMMA to annotate
aspects of both the original signal and the interpretation can be made
clearer, as can what we mean by 'composed inputs'. We have incorporated
your suggestions and revised the introductory informative paragraph in
Section 1 as follows:

 

"The language is focused on annotating single inputs from users, 

which may be either from a single mode or a composite input combining
information

from multiple modes, as opposed to information that might have been
collected over 

multiple turns of a dialog. The language provides a set of elements and
attributes 

that are focused on enabling annotations on both user inputs 

and interpretations of those inputs." 

 

Regarding the addition of further lattice examples:

 

>>  I would suggest adding examples lattice representation of two-level analysis, like
>>  a vocal /bo/+?+/ton/ to orthographic "Boston" and alternative "Bolton"
>>  and then a "Boston" and "Bolton" to the semantic interpretations "BOS" and "TZR"
>>  How does the second lattice refer to "Boston" and "Bolton" of the first lattice?
>>  Maybe, the situation could be addressed with emma:interpretation, but then the
>>  whole idea of the lattice as a compact representation fails. And anyway, while
>>  emma:tokens could refer to the orthographic "Boston" and "Bolton", I cannot
>>  find an IDREF attribute to refer to the "id" of an "emma:interpretation".

 

Over the years that the specification has been developed we have not
received any requests or use cases from vendors or developers for the
ability to cross-reference from portions of one lattice to another. We
agree that this capability could potentially be useful, especially for
more complex data annotation use cases. However, for the following three
reasons we have chosen not to incorporate this suggestion into the
current draft:

1. More detailed use cases from multiple vendors are required in order
to motivate the addition of this functionality.

2. A longer, more detailed and comprehensive study of how the EMMA
lattice representation can be extended to enable more complex kinds of
annotation use cases is required before incorporation into the standard.
This work goes beyond the scope of the current EMMA effort and should be
postponed for consideration in future versions of EMMA.

3. Since emma:arc can contain application namespace data and emma:info
is available for application and vendor specific annotations, nothing in
the current specification actively prevents this kind of annotation.
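
To make the third point concrete, here is a minimal sketch in Python of
how a vendor could cross-reference arcs today. It is only an
illustration under stated assumptions: the ex: namespace, the ex:id
attribute, the ex:source-arc element and the ex:airport payload are all
hypothetical and are not defined by EMMA, and the document layout (a
semantic lattice whose earlier stage sits under emma:derivation) is just
one plausible arrangement.

```python
import xml.etree.ElementTree as ET

NS = {
    "emma": "http://www.w3.org/2003/04/emma",
    "ex": "http://www.example.com/vendor",  # hypothetical vendor namespace
}

# Word-level arcs carry a hypothetical ex:id; semantic arcs point back
# to them through a hypothetical ex:source-arc element inside emma:info.
doc = """
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:ex="http://www.example.com/vendor">
  <emma:interpretation id="semantic">
    <emma:derived-from resource="#words"/>
    <emma:lattice initial="1" final="2">
      <emma:arc from="1" to="2">
        <emma:info><ex:source-arc ref="w1"/></emma:info>
        <ex:airport>BOS</ex:airport>
      </emma:arc>
      <emma:arc from="1" to="2">
        <emma:info><ex:source-arc ref="w2"/></emma:info>
        <ex:airport>TZR</ex:airport>
      </emma:arc>
    </emma:lattice>
  </emma:interpretation>
  <emma:derivation>
    <emma:interpretation id="words">
      <emma:lattice initial="1" final="2">
        <emma:arc from="1" to="2" ex:id="w1">Boston</emma:arc>
        <emma:arc from="1" to="2" ex:id="w2">Bolton</emma:arc>
      </emma:lattice>
    </emma:interpretation>
  </emma:derivation>
</emma:emma>
"""

root = ET.fromstring(doc)
EX_ID = "{%s}id" % NS["ex"]

# Index the word-level arcs by their vendor-assigned identifier.
word_arcs = {
    arc.get(EX_ID): arc.text
    for arc in root.iterfind(".//emma:arc", NS)
    if arc.get(EX_ID) is not None
}

# Resolve each semantic arc back to the word arc it refers to.
links = {}
for arc in root.iterfind(".//emma:arc", NS):
    source = arc.find("emma:info/ex:source-arc", NS)
    if source is not None:
        airport = arc.findtext("ex:airport", namespaces=NS)
        links[airport] = word_arcs[source.get("ref")]

print(links)
```

A consumer that does not know the ex: namespace can still read both
lattices; only the cross-references are lost to it.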

 

Regarding the specification of the emma: namespace prefix on attributes
which are specific to EMMA elements, such as emma:arc, emma:derived-from:

 

>>  3. About the attributes: "id","from","to", etc. are indicated without "emma:", but,
>>  lacking other namespace indication, will end up inheriting the NS of the element to
>>  which they belong, that is "emma:". A more consistent indication could help reading
>>  the specification.

 

It is well-supported practice within XML and W3C specifications not to
repeat the namespace prefix on attributes that are specific to an
element within a namespace. In the interests of reducing the verbosity
and increasing the readability of the examples in the EMMA specification,
we consistently omit the EMMA namespace prefix from such attributes.
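
As a rough illustration of how a namespace-aware parser reports this,
the sketch below uses Python's xml.etree.ElementTree on a small EMMA
fragment. Under the Namespaces in XML recommendation an unprefixed
attribute is not placed in any namespace and is interpreted in the
context of the element that carries it, so "from" and "to" come back as
plain names, while prefixed annotations such as emma:medium come back
qualified. The fragment itself is an assumed example, not text from the
specification.

```python
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"

doc = f"""
<emma:emma version="1.0" xmlns:emma="{EMMA_NS}">
  <emma:interpretation id="interp1" emma:medium="acoustic" emma:mode="voice">
    <emma:lattice initial="1" final="2">
      <emma:arc from="1" to="2">Boston</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
"""

root = ET.fromstring(doc)
interp = root.find(f"{{{EMMA_NS}}}interpretation")
arc = root.find(f".//{{{EMMA_NS}}}arc")

# Element-specific attributes are reported without a namespace URI.
arc_attr_names = sorted(arc.attrib)

# Prefixed EMMA annotations are reported as {namespace-uri}local-name.
interp_attr_names = sorted(interp.attrib)

print(arc_attr_names)
print(interp_attr_names)
```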

 

Regarding xlink:href:

 

> 4. By the way, could "ref" be replaced by a common xlink:href ? 

 

A number of different EMMA elements use a "ref" attribute to reference
a URI (emma:grammar, emma:model). In order to capture the specific
semantics and constraints on EMMA markup we use an element-specific
"ref" attribute rather than xlink:href. For example, emma:model requires
either "ref" or an inline specification of the model.
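
The emma:model constraint mentioned above could be checked along the
following lines. This is a sketch under assumptions: the fragment, the
model URI and the ex:city-model element are hypothetical, the helper
name model_is_specified is made up for this example, and the check only
requires that at least one of "ref" or inline content is present, since
this message does not say whether both may appear together.

```python
import xml.etree.ElementTree as ET

NS = {
    "emma": "http://www.w3.org/2003/04/emma",
    "ex": "http://www.example.com/vendor",  # hypothetical vendor namespace
}

# m1 uses ref, m2 uses an inline model, m3 is deliberately incomplete.
doc = """
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:ex="http://www.example.com/vendor">
  <emma:model id="m1" ref="http://www.example.com/models/city.xsd"/>
  <emma:model id="m2"><ex:city-model/></emma:model>
  <emma:model id="m3"/>
</emma:emma>
"""


def model_is_specified(model):
    """Return True if the emma:model has a ref attribute or inline content."""
    has_ref = bool(model.get("ref"))
    has_inline = len(model) > 0  # any child element counts as inline content
    return has_ref or has_inline


root = ET.fromstring(doc)
results = {
    model.get("id"): model_is_specified(model)
    for model in root.iterfind("emma:model", NS)
}
print(results)
```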

 

Regarding the suggested extensions to the emma:arc element:

 

>>  I am sorry to be so picky - though I hope at least relevant - on what could be minor details of
>>  this version, but I am worried that, at the moment of future extensions of the scope, they could
>>  force major changes or awkward solutions to maintain backward compatibility.
>>
>>  I would therefore suggest the following additions:
>>
>>  - Allow optional "id" attribute to emma:arc (and evaluate using it also to replace "node-number").
>>
>>  - Allow optional emma:tokens to emma:arc.
>>
>>  - Allow optional emma:process to emma:arc.
>>
>>  - Add an attribute of type IDREF to reference emma:id and allow it wherever useful.
>>
>>  Add to emma:arc a type- or class-like attribute, with default to an empty string
>>  or null value allowing to consider of the same "type" all the arcs without
>>  the attribute. This could be a solution to a compact representation of the two-level
>>  analysis previously described (ex. type="orthographic" and type="semantic") but it would
>>  allow also experimenting on the already discussed integration with other annotation types
>>  without asking modifications to consumers following the current draft (assuming they ignore,
>>  as they should, attributes they don't understand).

 

Thanks for your detailed feedback. These are excellent suggestions for a
richer and more powerful lattice annotation mechanism. However, given
the lack of detailed use cases and of requests from vendors for this
capability, and the need for a far longer and more detailed
investigation of how to use lattice representations for more complex
annotation tasks, we defer consideration of these additions to a future
version of the specification.

It is important to note that much of what is described here can
currently be captured using annotations within emma:info (which is
permitted within emma:arc) and the ability to place annotations within
the application namespace data. For example, in the case of a word
lattice from speech recognition there is no need for emma:tokens since
the label on the arc will be the token. In the case of a meaning
lattice, EMMA attributes and annotations in other namespaces can appear
directly on the application namespace data. If specific vendors wish to
use a type classification to identify particular sets of arcs, this can
be achieved using annotations in emma:info, on application namespace
markup within emma:arc, or using an attribute in another namespace on
emma:arc itself.
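
A minimal sketch of the last option, again in Python: the arcs carry a
type attribute from a vendor namespace, and a consumer groups them by
it. The ex: namespace and the ex:type attribute are hypothetical, and
how the two levels of analysis should share lattice nodes is left open
here; all four arcs simply run between the same pair of nodes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

EMMA_NS = "http://www.w3.org/2003/04/emma"
EX_NS = "http://www.example.com/vendor"  # hypothetical vendor namespace

doc = f"""
<emma:emma version="1.0" xmlns:emma="{EMMA_NS}" xmlns:ex="{EX_NS}">
  <emma:interpretation id="interp1">
    <emma:lattice initial="1" final="2">
      <emma:arc from="1" to="2" ex:type="orthographic">Boston</emma:arc>
      <emma:arc from="1" to="2" ex:type="orthographic">Bolton</emma:arc>
      <emma:arc from="1" to="2" ex:type="semantic">BOS</emma:arc>
      <emma:arc from="1" to="2" ex:type="semantic">TZR</emma:arc>
    </emma:lattice>
  </emma:interpretation>
</emma:emma>
"""

root = ET.fromstring(doc)
TYPE_ATTR = f"{{{EX_NS}}}type"

# Group arc labels by the vendor type; arcs without the attribute fall
# into the "" bucket, mirroring the empty-string default in the proposal.
arcs_by_type = defaultdict(list)
for arc in root.iter(f"{{{EMMA_NS}}}arc"):
    arcs_by_type[arc.get(TYPE_ATTR, "")].append(arc.text)

print(dict(arcs_by_type))
```

A consumer that ignores attributes it does not understand would see four
alternative arcs, which is the backward-compatible behaviour the
suggestion relies on.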

 

Thanks again for all of your comments. We think that a number of aspects
of the specification have been clarified and improved as a result of
your feedback.

 

We would greatly appreciate it if you could respond within the next
week indicating if this response addresses your concerns so that we
can move forward with the specification.

 

best

Michael Johnston

 

On behalf of W3C Multimodal working group

 
Received on Monday, 13 August 2007 20:34:52 UTC
