Re: emma:node anchoring on signal time axis

Dear Michael and all,
I appreciate the intentionally well-delimited scope of EMMA 1.0 and how 
well the current draft adheres to it.

To further clarify the description of that scope, I would suggest 
rewording and elaborating on this statement from the Introduction:
"The language is focused on annotating the interpretation information 
of single and composed inputs, as opposed to (possibly identical) 
information that might have been collected over the course of a dialog.
The language provides a set of elements and attributes that are focused 
on accurately representing annotations on the input interpretations."

1. Is "composed inputs" the same of "composite input" defined in the 
Terminology?

2. Following the Terminology at 1.2, "interpretation" appears to be the 
signified and "user input" the signifier. "Annotation", however, seems to 
be used somewhat ambiguously: my understanding is that the common meaning 
of "annotation" is the association (the act, or the product of that act) 
of content, at any 'meta-' level, with a region of a signal.
"accurately representing annotations on the input interpretations" 
therefore seems to refer to content associated with the signified (as 
indeed do attributes like emma:cost, emma:process, etc.). Nevertheless, 
EMMA also seems to provide a way to represent the interpretation itself 
(mainly with literals), besides being open to any application-specific 
representation, and it is not clear whether this is an accessory 
function or a foundational one.
Together with "Interpretations of user input are said to be derived 
from that input, and higher levels interpretations may be derived from 
lower level ones. EMMA allows you to reference the user input or 
interpretation a given interpretation was derived from", the issue 
becomes even more evident when comparing the expressive power of 
emma:interpretation and emma:arc to represent alternative 
interpretations (as advertised in "Lattices provide a compact 
representation of large lists of possible recognition results or 
interpretations"):
- an emma:interpretation has an "id" while an emma:arc needs "from", 
"to" and the signified to be identified (i.e. difficult reification of 
emma:arcs)
- an emma:interpretation has emma:tokens to refer to user input, while 
an emma:arc can only refer to a time region of the input signal
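
For concreteness, here is a minimal sketch of the contrast, assuming the 
element and attribute names of the current draft; all values are 
invented for illustration only:

    <emma:interpretation id="int1"
                         emma:tokens="flights to boston"
                         emma:process="http://example.com/asr">
      <dest>Boston</dest>
    </emma:interpretation>

    <emma:lattice initial="1" final="2">
      <!-- this arc can only be identified by its endpoints
           and its content: there is no "id" to point at -->
      <emma:arc from="1" to="2">Boston</emma:arc>
    </emma:lattice>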

I would suggest adding examples of lattice representations of a 
two-level analysis, such as:
a vocal /bo/+?+/ton/ mapped to the orthographic "Boston" and the 
alternative "Bolton", and then "Boston" and "Bolton" mapped to the 
semantic interpretations "BOS" and "TZR".
How does the second lattice refer to the "Boston" and "Bolton" of the 
first lattice? Perhaps the situation could be addressed with 
emma:interpretation, but then the whole idea of the lattice as a 
compact representation is lost. And in any case, while emma:tokens could 
refer to the orthographic "Boston" and "Bolton", I cannot find an IDREF 
attribute to refer to the "id" of an emma:interpretation.
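
To make the question concrete, a rough sketch of the two levels (again 
with invented values; the unanswered reference is marked in the 
comments):

    <!-- first lattice: orthographic alternatives -->
    <emma:lattice initial="1" final="2">
      <emma:arc from="1" to="2">Boston</emma:arc>
      <emma:arc from="1" to="2">Bolton</emma:arc>
    </emma:lattice>

    <!-- second lattice: semantic interpretations -->
    <emma:lattice initial="1" final="2">
      <!-- derived from which arc of the first lattice?
           there is no IDREF attribute to say -->
      <emma:arc from="1" to="2">BOS</emma:arc>
      <emma:arc from="1" to="2">TZR</emma:arc>
    </emma:lattice>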

Furthermore, an emma:interpretation can specify emma:process while 
emma:arc cannot.

3. About the attributes: "id", "from", "to", etc. are indicated without 
the "emma:" prefix and without any other namespace indication; note that 
unprefixed attributes do not inherit the namespace of the element they 
belong to, so strictly speaking they end up in no namespace rather than 
in "emma:". A more consistent indication would help in reading the 
specification.
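
For instance, it is not obvious from the current notation whether the 
two hypothetical forms below are meant to be equivalent:

    <emma:arc from="1" to="2">Boston</emma:arc>
    <emma:arc emma:from="1" emma:to="2">Boston</emma:arc>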

4. By the way, could "ref" be replaced by a standard xlink:href?



I am sorry to be so picky (though I hope at least relevant) about what 
may be minor details of this version, but I am worried that, when the 
scope is extended in the future, they could force major changes or 
awkward workarounds to maintain backward compatibility.

I would therefore suggest the following additions:

- Allow optional "id" attribute to emma:arc (and evaluate using it also 
to replace "node-number").

- Allow an optional emma:tokens attribute on emma:arc.

- Allow an optional emma:process attribute on emma:arc.

- Add an attribute of type IDREF to reference emma:id and allow it 
wherever useful.

- Add a type- or class-like attribute to emma:arc, defaulting to an 
empty string or null value so that all arcs without the attribute can be 
considered to be of the same "type". This could provide a compact 
representation of the two-level analysis described above (e.g. 
type="orthographic" and type="semantic"), and it would also allow 
experimenting with the already-discussed integration with other 
annotation types without requiring modifications from consumers that 
follow the current draft (assuming they ignore, as they should, 
attributes they do not understand). See the sketch after this list.
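
As a sketch only, the proposals above might combine on emma:arc roughly 
as follows. None of the "id", emma:tokens, "type", or "deriv" attributes 
on emma:arc below exist in the current draft; "deriv" in particular is 
just a placeholder name for the proposed IDREF attribute:

    <emma:lattice initial="1" final="2">
      <!-- orthographic level (proposed id and type attributes) -->
      <emma:arc id="orth1" from="1" to="2" type="orthographic">Boston</emma:arc>
      <emma:arc id="orth2" from="1" to="2" type="orthographic">Bolton</emma:arc>
      <!-- semantic level, pointing back to the orthographic arcs
           via the hypothetical "deriv" IDREF attribute -->
      <emma:arc from="1" to="2" type="semantic" emma:tokens="Boston"
                deriv="orth1">BOS</emma:arc>
      <emma:arc from="1" to="2" type="semantic" emma:tokens="Bolton"
                deriv="orth2">TZR</emma:arc>
    </emma:lattice>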

Best regards,

    Paolo Martini



On 3 August 2007, at 19:29, JOHNSTON, MICHAEL J (MICHAEL J) wrote:

>
> Dear Paolo Martini,
>  
> Thank you for your detailed and thoughtful contributions on 
> emma:lattice.
> The EMMA subgroup have discussed your comments in detail and formulated
> the following responses.
>  
> Regarding your first point about the relative time mechanism on 
> emma:node.
> We agree that like the absolute timestamps, the relative time stamp 
> mechanism
> was also not intended to apply to emma:node and will remove relative
> timestamps from emma:node in the specification.
>  
> Regarding the three different time axes you describe (input, model, 
> output),
> the scope of the EMMA specification addresses only the input axis at 
> this
> point in its development. In the longer term we hope to extend EMMA 
> for representation
> of system output as well as user inputs but for EMMA 1.0 we address 
> only input.
> Your comments regarding the output time axis are particularly relevant 
> for output
> representation in EMMA and will provide valuable input for future 
> versions of EMMA.
>  
> Regarding the connections between emma:lattice representations and 
> annotation
> graphs such as ATLAS, again this is very good feedback. The initial 
> intention behind
> emma:lattice is to capture and provide a standard representation for 
> the graph outputs
> that vendors of speech recognition and other modality processing 
> components
> currently provide in proprietary representations. Over the course of 
> this work more use cases
> have come up and there is growing interest in the potential use of 
> EMMA more broadly
> for annotation of speech corpora and other resources.  The initial 
> scope of EMMA is to provide
> a mechanism for communication among the components of interactive 
> systems, such
> as spoken and multimodal dialog systems.  In future versions of EMMA 
> beyond 1.0 we
> hope to provide more support for annotation and corpus use cases, and
> your input on relations with annotation schemes such as ATLAS will be 
> extremely
> valuable for that work.
>  
> We would greatly appreciate it if you could respond within the next 
> two weeks
> indicating whether this response addresses your concerns. Thanks again 
> for
> such detailed feedback.
>  
> best
> Michael Johnston
>  
>  
> On behalf of W3C Multimodal working group
>  
>  

Received on Monday, 6 August 2007 19:50:55 UTC