Re: emma:node anchoring on signal time axis from johnston@research.att.com on 2007-07-04 (www-multimodal@w3.org from July 2007)

From: <johnston@research.att.com>
Date: Wed, 4 Jul 2007 10:41:42 -0400
To: <paolo.martini.relex@chello.be>, <www-multimodal@w3.org>
Message-ID: <0C50B346CAD5214EA8B5A7C0914CF2A4485C36@njfpsrvexg3.research.att.com>
Dear Paolo Martini,

 

Thank you, Paolo, for your thoughtful comments on the EMMA specification.
The W3C Multimodal Working Group has discussed these comments in some
detail and formulated the following response.

 

You are correct that emma:node elements are intended to correspond to
instants. Regarding point 1, we agree that, as it stands, the ability to
place both emma:start and emma:end on emma:node appears to allow a
duration. This is an error in the current draft: we did not intend for
emma:start and emma:end to be used on emma:node. In the next draft of the
EMMA specification and the corresponding schema, we will remove the
emma:start and emma:end attributes from emma:node.

 

With respect to point 2, the primary motivation for the addition of
emma:node was to provide a place for annotations which apply specifically
to nodes rather than to arcs. For example, in some representations of
speech recognition lattices, confidences or weights are placed on nodes in
addition to arcs. For this reason we define both nodes and arcs.

It is critical that we have both timestamps and the 'from' and 'to' node
annotations on arcs, as they serve different purposes. The role of the
'from' and 'to' annotations on arcs is to define the topology of the
graph. The timestamps emma:start and emma:end, on the other hand, are
annotations which describe temporal properties associated not necessarily
with the arc but with the label on the arc. There is in fact no guarantee
that the emma:end on 'flights' in your example will be equal to the
emma:start on 'to'. If they were required to be the same, the transition
point from one arc to the next would have to be assigned to an arbitrary
point in the silence between the two words. Similarly, if there is no
silence between two words in sequence, and they may in fact share a
geminate consonant (for example "well lit" or "gas station"), the word
timings from the recognizer may overlap; that is, the end of the arc for
the word "well" may be later than the beginning of the arc for "lit".
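The gap and overlap cases described above can be sketched in a few lines
of Python. This is only an illustration: the Arc class, the
boundary_offset helper, and all of the timings are hypothetical and are
not taken from the EMMA specification or from any recognizer output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    src: int    # value of the 'from' attribute
    dst: int    # value of the 'to' attribute
    start: int  # emma:start, in milliseconds
    end: int    # emma:end, in milliseconds
    label: str


def boundary_offset(left: Arc, right: Arc) -> int:
    """Return right.start - left.end for two arcs joined at one node.

    A positive value means silence between the two labels; a negative
    value means the two labels overlap in time.
    """
    if left.dst != right.src:
        raise ValueError("arcs are not adjacent in the lattice")
    return right.start - left.end


# Hypothetical timings, invented for illustration only.
flights = Arc(1, 2, 1000, 1450, "flights")
to = Arc(2, 3, 1600, 1800, "to")     # silence before "to"
well = Arc(1, 2, 1000, 1320, "well")
lit = Arc(2, 3, 1280, 1600, "lit")   # "lit" starts before "well" ends

print(boundary_offset(flights, to))  # 150
print(boundary_offset(well, lit))    # -40
```

In both cases the arcs still meet at node 2; only the label timings
differ, which is why the two kinds of annotation are kept separate.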

 

Perhaps an even stronger case for having both time and the 'from' and
'to' annotations is that, in the lattice representation, being at a
particular time point does not guarantee that you are on the same node in
the lattice. For example, imagine a lattice representing two possible
strings:

 

'to boston'

'two blouses' 

 

The lattice representation:

 

<emma:lattice initial="1" final="4">
  <emma:arc from="1" to="2" emma:start="1000" emma:end="2000">to</emma:arc>
  <emma:arc from="1" to="3" emma:start="1000" emma:end="2000">two</emma:arc>
  <emma:arc from="2" to="4" emma:start="2000" emma:end="4000">boston</emma:arc>
  <emma:arc from="3" to="4" emma:start="2000" emma:end="4000">blouses</emma:arc>
</emma:lattice>

 

Note that even though the first two arcs end at the same time point,
they lead to different states (2 vs. 3), encoding which path has been
taken in the graph.
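To make this concrete, here is a minimal Python sketch that reads the
lattice above in two ways: once by following the 'from' and 'to' links,
and once by naively chaining arcs whose times match. The tuple layout and
both function names are assumptions made for this illustration; they are
not part of EMMA.

```python
from collections import defaultdict

# (from, to, start, end, label), copied from the lattice above.
ARCS = [
    (1, 2, 1000, 2000, "to"),
    (1, 3, 1000, 2000, "two"),
    (2, 4, 2000, 4000, "boston"),
    (3, 4, 2000, 4000, "blouses"),
]


def paths_by_topology(arcs, initial, final):
    """Enumerate label sequences by following 'from'/'to' links only."""
    outgoing = defaultdict(list)
    for src, dst, _start, _end, label in arcs:
        outgoing[src].append((dst, label))
    results = []

    def walk(node, words):
        if node == final:
            results.append(" ".join(words))
            return
        for dst, label in outgoing[node]:
            walk(dst, words + [label])

    walk(initial, [])
    return results


def paths_by_time(arcs, t_begin, t_end):
    """Naively chain arcs whenever one ends when the next one starts."""
    results = []

    def walk(t, words):
        if t == t_end:
            results.append(" ".join(words))
            return
        for _src, _dst, start, end, label in arcs:
            if start == t:
                walk(end, words + [label])

    walk(t_begin, [])
    return results


print(paths_by_topology(ARCS, 1, 4))
# ['to boston', 'two blouses']
print(paths_by_time(ARCS, 1000, 4000))
# ['to boston', 'to blouses', 'two boston', 'two blouses']
```

Chaining on time alone admits 'to blouses' and 'two boston', which the
lattice does not contain; only the node identifiers rule them out.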

 

The critical factor here is that the lattice representation does not
necessarily have to correspond to a time sequence. The lattice
representation is used to encode a range of possible interpretations of a
signal. It is often the case that the left-to-right sequence of symbols in
the lattice corresponds to time, but there is no guarantee. For example,
the lattice may represent interpretations of a typed text string rather
than speech. It is also possible that a semantic representation encoded as
a lattice could have time annotations on the first arc which are later
than the time annotations on the final arc. Since lattices represent
abstractions over the signal, we cannot assume that time annotations
define their topology.
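The three situations mentioned here (no timestamps at all, timestamps
that follow the path, and timestamps that run against it) can be
distinguished with a small check. The following Python sketch is purely
illustrative: the follows_time_order helper, the tuple layout, and the
three example paths, including the semantic labels, are hypothetical.

```python
from typing import Optional, Sequence, Tuple

# One arc on a path: (label, start_ms, end_ms); times are None when absent.
PathArc = Tuple[str, Optional[int], Optional[int]]


def follows_time_order(path: Sequence[PathArc]) -> Optional[bool]:
    """Return None if any arc lacks timestamps; otherwise report whether
    each arc starts no earlier than the previous arc on the path."""
    if any(start is None or end is None for _label, start, end in path):
        return None
    starts = [start for _label, start, _end in path]
    return all(a <= b for a, b in zip(starts, starts[1:]))


# Hypothetical paths, listed in lattice (topological) order.
typed = [("to", None, None), ("boston", None, None)]
spoken = [("to", 1000, 2000), ("boston", 2000, 4000)]
semantic = [("destination=boston", 2000, 4000), ("mode=flight", 1000, 2000)]

print(follows_time_order(typed))     # None
print(follows_time_order(spoken))    # True
print(follows_time_order(semantic))  # False
```

All three paths are well-formed as sequences of arcs, so a consumer of
the lattice cannot rely on timestamps to recover the order of the arcs.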

 

In order to clarify this, we will add text to the specification making
clear that lattices represent abstractions of the signal, and that time
annotations may describe labels rather than arcs.

 

We would greatly appreciate it if you would review this response and
respond within three weeks, indicating whether it resolves your concern.
If we do not receive a response within three weeks, we will assume that
this response resolves your concern.

 

 

Best,

Michael Johnston

W3C Multimodal Working Group

 

 

 

 

Dear W3C Multimodal working group,
 
I have only recently approached EMMA, and I have some trouble
understanding the temporal anchoring of an emma:node.
 
I would instinctively expect a node to correspond to what ISO 8601 
calls an "instant", a "point on the time axis".
 
With reference to section 3.4, if I read the document correctly:
 
1. An emma:node can be anchored with absolute or relative timestamps. 
In the absolute mode, the optional emma:start and emma:end attributes 
seem to allow a duration, while in the relative mode, the optional 
emma:offset-to-start (with emma:duration not allowed) seems to force an 
instant status.
If, conceptually, a node is allowed to correspond to a segment of the
signal, I would welcome a comment on the rationale for that. If not, I
would suggest replacing emma:start and emma:end with a single "time
point"-like attribute or, at least, forbidding emma:end, implicitly
adding ambiguity in the semantics of emma:start.
 
2. An emma:arc implicitly asserts the existence of two nodes, but I
would say that the temporal attributes of the arcs, if present, define
those nodes. A node could therefore be defined more than once. To
illustrate, I simplify the example in 3.4.2:
<emma:arc from="1" to="2"
emma:start="1087995961542" emma:end="1087995962042">flights</emma:arc>
<emma:arc from="2" to="3"
emma:start="1087995962042" emma:end="1087995962542">to</emma:arc>
Since node 2 is the same in both arcs, what if emma:end in the "flights"
arc and emma:start in the "to" arc do not have exactly the same value?
Again, if this is conceptually allowed, I would welcome an explanation
of the rationale. Otherwise, I would prefer enforcing a coherent
description directly in the language instead of relying on validity
checks, for example by restricting the "definition" of nodes to the
emma:node element, i.e. forbidding timestamps on arcs.
 
I went through the document and the list archive and I wasn't able to
find answers to these doubts. Nevertheless, I apologize if these points 
have already been addressed.
Thanks for your help and your work,
 
   Paolo Martini
Received on Wednesday, 4 July 2007 14:44:12 UTC