- From: <johnston@research.att.com>
- Date: Wed, 4 Jul 2007 10:41:42 -0400
- To: <paolo.martini.relex@chello.be>, <www-multimodal@w3.org>
- Message-ID: <0C50B346CAD5214EA8B5A7C0914CF2A4485C36@njfpsrvexg3.research.att.com>
Dear Paolo Martini,

Thank you, Paolo, for your thoughtful comments on the EMMA specification. The W3C Multimodal Working Group has discussed these comments in some detail and formulated the following response.

You are correct that emma:node elements are intended to correspond to instants. Regarding point 1, we agree that, as it stands, the ability to place both emma:start and emma:end on emma:node appears to allow a duration. This is an error in the current draft: we did not intend emma:start and emma:end to be used on emma:node. In the next draft of the EMMA specification and the corresponding schema, we will remove the emma:start and emma:end attributes from emma:node.

Regarding point 2, the primary motivation for adding emma:node was to provide a place for annotations that apply specifically to nodes rather than to arcs. For example, in some representations of speech recognition lattices, confidences or weights are placed on nodes in addition to arcs. For this reason we define both nodes and arcs. It is critical that arcs carry both timestamps and the 'from' and 'to' node annotations, because the two serve different purposes. The 'from' and 'to' annotations on arcs define the topology of the graph. The timestamps emma:start and emma:end, on the other hand, describe temporal properties associated not necessarily with the arc but with the label on the arc. There is in fact no guarantee that emma:end on 'flights' in your example will equal emma:start on 'to'. If they were required to be the same, the transition point from one arc to the next would have to be assigned to an arbitrary point in the silence between the two words. Similarly, if there is no silence between two words in sequence (they may, for example, share a geminate consonant, as in "well lit" or "gas station"), the word timings from the recognizer may overlap; that is, the end of the arc for the word "well" may be later than the beginning of "lit".
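To make the distinction concrete, here is a minimal Python sketch, not part of EMMA or any EMMA toolkit, in which each arc carries from/to links plus separate label timestamps. The `Arc` class, the helper names, and the millisecond values for "well lit" are hypothetical; the point is only that the word timings can overlap while the path through the graph is still fully determined by from/to.

```python
from dataclasses import dataclass


@dataclass
class Arc:
    # src/dst mirror the 'from'/'to' attributes: they fix the graph topology.
    src: str
    dst: str
    label: str
    # start_ms/end_ms mirror emma:start/emma:end: they describe the extent of
    # the label in the signal, not the arc itself.
    start_ms: int
    end_ms: int


# Hypothetical timings for "well lit" with a shared geminate /l/:
# the recognizer reports "well" ending after "lit" has begun.
arcs = [
    Arc("1", "2", "well", 1000, 1450),
    Arc("2", "3", "lit", 1400, 1800),
]


def path_labels(arcs, initial, final):
    """Follow from/to links from initial to final (assumes a single path)."""
    by_src = {a.src: a for a in arcs}
    labels, node = [], initial
    while node != final:
        arc = by_src[node]
        labels.append(arc.label)
        node = arc.dst
    return labels


def overlap_ms(a, b):
    """Milliseconds by which a's label extends past the start of b's label."""
    return max(0, a.end_ms - b.start_ms)


print(path_labels(arcs, "1", "3"))   # ['well', 'lit']
print(overlap_ms(arcs[0], arcs[1]))  # 50
```

The word order comes entirely from the from/to links; the 50 ms overlap in the timestamps does not disturb it.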
Perhaps an even stronger case for having both the time annotations and the 'from' and 'to' annotations is that, in the lattice representation, being at a particular time point does not guarantee being at the same node in the lattice. For example, imagine a lattice representing two possible strings, 'to boston' and 'two blouses':

    <emma:lattice initial="1" final="4">
      <emma:arc from="1" to="2" emma:start="1000" emma:end="2000">to</emma:arc>
      <emma:arc from="1" to="3" emma:start="1000" emma:end="2000">two</emma:arc>
      <emma:arc from="2" to="4" emma:start="2000" emma:end="4000">boston</emma:arc>
      <emma:arc from="3" to="4" emma:start="2000" emma:end="4000">blouses</emma:arc>
    </emma:lattice>

Note that even though the first two arcs end at the same time point, they lead to different states (2 vs. 3), encoding which path has been taken through the graph. The critical factor here is that the lattice representation does not necessarily correspond to a time sequence. It is used to encode a range of possible interpretations of a signal. The left-to-right sequence of symbols in the lattice often corresponds to time, but there is no guarantee. For example, the lattice may represent interpretations of a typed text string rather than speech. A semantic representation encoded as a lattice could even have time annotations on the first arc that are later than those on the final arc. Since lattices represent abstractions over the signal, we cannot assume that time annotations define their topology. To clarify this, we will add text to the specification making clear that lattices represent abstractions of the signal and that time annotations may describe labels rather than arcs.

We would greatly appreciate it if you would review this response and reply within three weeks indicating whether it resolves your concern. If we do not receive a reply within three weeks, we will assume that this response resolves your concern.
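As an illustration only, the following Python sketch parses the 'to boston' / 'two blouses' lattice with the standard library's ElementTree and enumerates paths using nothing but the from/to links. It assumes the EMMA 1.0 namespace URI `http://www.w3.org/2003/04/emma` and writes the timestamps as `emma:start`/`emma:end`; the helper names `load_arcs` and `paths` are hypothetical. The last line shows that two arcs ending at the same time reach different nodes.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: EMMA 1.0 namespace URI; timestamps written as emma:start/emma:end.
EMMA = "http://www.w3.org/2003/04/emma"

LATTICE = f"""
<emma:lattice xmlns:emma="{EMMA}" initial="1" final="4">
  <emma:arc from="1" to="2" emma:start="1000" emma:end="2000">to</emma:arc>
  <emma:arc from="1" to="3" emma:start="1000" emma:end="2000">two</emma:arc>
  <emma:arc from="2" to="4" emma:start="2000" emma:end="4000">boston</emma:arc>
  <emma:arc from="3" to="4" emma:start="2000" emma:end="4000">blouses</emma:arc>
</emma:lattice>
"""


def load_arcs(xml_text):
    """Return (initial, final, arcs) with each arc as a plain dict."""
    root = ET.fromstring(xml_text.strip())
    arcs = []
    for arc in root.findall(f"{{{EMMA}}}arc"):
        arcs.append({
            "from": arc.get("from"),
            "to": arc.get("to"),
            # Namespaced attributes are keyed as "{uri}local" in ElementTree.
            "start": int(arc.get(f"{{{EMMA}}}start")),
            "end": int(arc.get(f"{{{EMMA}}}end")),
            "label": arc.text.strip(),
        })
    return root.get("initial"), root.get("final"), arcs


def paths(initial, final, arcs):
    """Enumerate label sequences by following from/to links only."""
    out = defaultdict(list)
    for a in arcs:
        out[a["from"]].append(a)

    def walk(node, labels):
        if node == final:
            yield labels
            return
        for a in out[node]:
            yield from walk(a["to"], labels + [a["label"]])

    return list(walk(initial, []))


initial, final, arcs = load_arcs(LATTICE)
print(paths(initial, final, arcs))
# [['to', 'boston'], ['two', 'blouses']]

# Both first-word arcs end at t=2000, yet they lead to different states.
print(sorted({a["to"] for a in arcs if a["end"] == 2000}))  # ['2', '3']
```

Note that `paths` never reads the timestamps: the topology alone distinguishes the two readings.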
Best,
Michael Johnston
W3C Multimodal Working Group

Dear W3C Multimodal Working Group,

I only recently started working with EMMA, and I have some problems understanding the temporal anchoring of an emma:node. I would instinctively expect a node to correspond to what ISO 8601 calls an "instant", a "point on the time axis". With reference to section 3.4, if I read the document correctly:

1. An emma:node can be anchored with absolute or relative timestamps. In the absolute mode, the optional emma:start and emma:end attributes seem to allow a duration, while in the relative mode, the optional emma:offset-to-start (with emma:duration not allowed) seems to force an instant status. If, conceptually, a node is allowed to correspond to a segment of the signal, I would welcome a comment on the rationale for that. If not, I would suggest replacing emma:start and emma:end with a single "time point"-like attribute or, at least, forbidding emma:end, which would implicitly add ambiguity to the semantics of emma:start.

2. An emma:arc implicitly asserts the existence of two nodes, but I would say that the temporal attributes of the arcs, if present, define those nodes. A node could therefore be defined more than once. I simplify the example in 3.4.2:

    <emma:arc from="1" to="2" emma:start="1087995961542" emma:end="1087995962042">flights</emma:arc>
    <emma:arc from="2" to="3" emma:start="1087995962042" emma:end="1087995962542">to</emma:arc>

Since node 2 is the same, what if emma:end in the "flights" arc and emma:start in the "to" arc do not have exactly the same value? Again, if this is conceptually allowed, I would welcome an explanation of the rationale. Otherwise, I would prefer enforcing a coherent description directly in the language instead of relying on validity checks, for example by restricting the "definition" of nodes to the emma:node element, i.e. forbidding timestamps on arcs.

I went through the document and the list archive and I wasn't able to find answers to these doubts.
Nevertheless, I apologize if these points have already been addressed.

Thanks for your help and your work,
Paolo Martini
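Paolo's second point describes a consistency check on the node implied by adjacent arcs. A minimal Python sketch of such a check follows, using the timestamps from his simplified 3.4.2 example. The function name `node_time_mismatches` and the 30 ms shifted value are hypothetical; under the Working Group's reply, a mismatch reported here would be legitimate EMMA (for example, silence assigned to neither word), so the check is a diagnostic rather than a validity rule.

```python
from collections import defaultdict

# Arcs from the simplified 3.4.2 example: (from, to, emma:start, emma:end, label).
ARCS = [
    ("1", "2", 1087995961542, 1087995962042, "flights"),
    ("2", "3", 1087995962042, 1087995962542, "to"),
]


def node_time_mismatches(arcs):
    """Compare emma:end of arcs entering each node with emma:start of arcs leaving it.

    Returns {node: (incoming_ends, outgoing_starts)} for every node where the
    two sets of implied times disagree.
    """
    ends, starts = defaultdict(set), defaultdict(set)
    for src, dst, start, end, _label in arcs:
        starts[src].add(start)
        ends[dst].add(end)
    mismatches = {}
    # Only nodes with both incoming and outgoing arcs can disagree.
    for node in ends.keys() & starts.keys():
        if ends[node] != starts[node]:
            mismatches[node] = (sorted(ends[node]), sorted(starts[node]))
    return mismatches


print(node_time_mismatches(ARCS))  # {}

# Hypothetical 30 ms gap before "to", e.g. a pause the recognizer leaves unassigned.
shifted = [ARCS[0], ("2", "3", 1087995962072, 1087995962542, "to")]
print(node_time_mismatches(shifted))
# {'2': ([1087995962042], [1087995962072])}
```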
Received on Wednesday, 4 July 2007 14:44:12 UTC