- From: Deborah Dahl <dahl@conversational-technologies.com>
- Date: Thu, 14 Jun 2012 09:13:04 -0400
- To: "'Jerry Carter'" <jerry@jerrycarter.org>, <public-speech-api@w3.org>
- Message-ID: <00ba01cd4a2f$72a162f0$57e428d0$@conversational-technologies.com>
Thanks for bringing up this very common architecture that combines local signal processing with cloud-based processing. We should definitely keep this architecture in mind as we discuss use cases. To make sure I understand your suggestion, I think one instantiation of the process you're suggesting might be:

1. Speech is captured locally and transmitted to the cloud-based recognizer.
2. The UA builds part of the EMMA with locally known information such as the timestamps, a reference to the emma:process, emma:source, emma:grammar, and possibly a reference to the emma:signal, if it knows that.
3. The speech recognizer comes back with its own EMMA representing the recognition results.
4. The local EMMA and the speech recognizer's EMMA are combined by the UA as a derivation to create the EMMA that's made available as part of the speech result through the API.

Is this roughly what you had in mind? I agree with you that it would be very convenient if the UA did this processing, but I don't think anything prevents the application from doing it if there are other reasons for the UA not to modify the speech recognition result.

Milan and Satish, could you elaborate on what you had in mind when you raised concerns about the UA modifying the speech recognizer's EMMA?
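To make these four steps concrete, here is a rough sketch of what the combined EMMA in step 4 might look like, with the recognizer's original interpretation preserved under emma:derivation and the UA's derived interpretation adding the locally known annotations. All URIs, IDs, timestamps, and token values below are invented placeholders, not output from any real engine:

    <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
      <!-- grammar the UA knows it requested (placeholder URI) -->
      <emma:grammar id="gram1" ref="http://example.com/flight-query.grxml"/>
      <emma:derivation>
        <!-- stage 1: the recognizer's EMMA, passed through unmodified -->
        <emma:interpretation id="cloud1"
            emma:medium="acoustic" emma:mode="voice"
            emma:confidence="0.82"
            emma:tokens="flights to boston">
          <destination>Boston</destination>
        </emma:interpretation>
      </emma:derivation>
      <!-- stage 2: the UA's derived interpretation, adding the locally
           known timestamps, process, source, signal, and grammar -->
      <emma:interpretation id="ua1"
          emma:start="1339679580000" emma:end="1339679582500"
          emma:process="http://example.com/ua/emma-combination"
          emma:source="http://example.com/device/builtin-microphone"
          emma:signal="http://example.com/audio/utterance1.wav"
          emma:grammar-ref="gram1"
          emma:medium="acoustic" emma:mode="voice"
          emma:confidence="0.82"
          emma:tokens="flights to boston">
        <emma:derived-from resource="#cloud1" composite="false"/>
        <destination>Boston</destination>
      </emma:interpretation>
    </emma:emma>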
From: Jerry Carter [mailto:jerry@jerrycarter.org]
Sent: Wednesday, June 13, 2012 10:31 PM
To: public-speech-api@w3.org; Deborah Dahl
Subject: Review of EMMA usage in the Speech API (first editor's draft)

The current language is fairly minimal:

    emma <http://www.w3.org/TR/emma/>
    EMMA 1.0 representation of this result. The contents of this result could vary across UAs and recognition engines, but all implementations must expose a valid XML document complete with EMMA namespace. UA implementations for recognizers that supply EMMA must pass that EMMA structure directly.

I have mixed feelings about whether EMMA is appropriate for this specification. Arguing against, the EMMA specification is fairly large and rather complex, which may adversely impact the usability of the Speech API for many web application developers. Arguing in favor, EMMA provides a nice framework for representing complex semantic results and their derivations through multiple engines. I have read the arguments on the list and am encouraged that the consensus has favored the inclusion of EMMA. At the same time, I hope that future drafts of the Speech API or of supporting documents will help clarify how user results are represented in EMMA. I see that Milan has offered a few possibilities for future consideration, but I do not believe these are sufficient.

The second sentence is troublesome. I do not see any reason that the UA would need to pass EMMA results directly. In fact, doing so runs counter to the original intent of the EMMA specification. As my co-editor explained in an earlier post [1]:

    I'm not sure why a web developer would care whether the EMMA they get from the UA is exactly what the speech recognizer supplied. On the other hand, I can think of useful things that the UA could add to the EMMA, for example, something in the <info> tag about the UA that the request originated from, that the recognizer wouldn't necessarily know about. In that case you might actually want modified EMMA.

One recurring implementation strategy that I have seen for mobile devices is to combine local signal processing resources with cloud-based ones. Here the result of a recognition would necessarily combine information from the two different resources, and it would be inappropriate to return the EMMA result from a single resource. Much better from the perspective of EMMA would be to build a composite result with separate derivation chains. [Debbie, I know you later said that a direct result would be okay [2], but you may have been thinking of a simpler architecture.]

Thanks for the discussion to date and for the first draft.

-=- Jerry

[1] http://lists.w3.org/Archives/Public/public-speech-api/2012Jun/0056.html
[2] http://lists.w3.org/Archives/Public/public-speech-api/2012Jun/0059.html
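For concreteness, here is one rough sketch of the kind of composite result Jerry describes, with the local signal-processing stage and the cloud recognition stage chained inside emma:derivation so that the final interpretation traces back to both resources rather than reproducing either one's EMMA directly. Every URI, ID, and value below is an invented placeholder, and a real result could organize the derivation chains differently:

    <emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
      <emma:derivation>
        <!-- stage 1: local signal processing (e.g., endpointing or noise
             reduction); carries only signal-level annotations, no semantics -->
        <emma:interpretation id="local1"
            emma:process="http://example.com/local/endpointer"
            emma:signal="http://example.com/audio/processed-utterance1.wav"
            emma:medium="acoustic" emma:mode="voice"
            emma:uninterpreted="true"/>
        <!-- stage 2: cloud recognition over the locally processed signal -->
        <emma:interpretation id="cloud1"
            emma:process="http://example.com/cloud/asr"
            emma:medium="acoustic" emma:mode="voice"
            emma:confidence="0.82"
            emma:tokens="flights to boston">
          <emma:derived-from resource="#local1" composite="false"/>
          <destination>Boston</destination>
        </emma:interpretation>
      </emma:derivation>
      <!-- final result exposed through the API, derived from the full chain -->
      <emma:interpretation id="result1"
          emma:medium="acoustic" emma:mode="voice"
          emma:confidence="0.82"
          emma:tokens="flights to boston">
        <emma:derived-from resource="#cloud1" composite="false"/>
        <destination>Boston</destination>
      </emma:interpretation>
    </emma:emma>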