- From: Olli Pettay <Olli.Pettay@helsinki.fi>
- Date: Tue, 08 Mar 2011 22:03:09 +0200
- To: Robert Brown <Robert.Brown@microsoft.com>
- CC: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
On 03/01/2011 07:11 PM, Robert Brown wrote:
> Hi Everyone,
>
> Our proposal is posted here:
> http://lists.w3.org/Archives/Public/www-archive/2011Mar/att-0001/microsoft-api-draft-final.html
>
> It proposes some extensions to existing APIs, as well as some
> speech-specific objects, and some speech-specific HTML.
>
> Cheers,
>
> /Rob
>

Some comments.

Chapter 6.
Have you investigated whether HTML <device> could be used instead of the Capture API? There is a possible practical/political problem with using the Capture API: it is a draft from the DAP WG, which the major browser vendors, except Opera, have left. (I don't recall all the reasons why that happened last year.)

It might be more flexible to use WebSockets than XHR. I doubt send(in Stream) will ever be accepted into XHR. XHR is becoming a rather complicated API even without streaming, and WebSocket is all about streaming. The proposed change to XHR+multipart isn't what is implemented today in some browsers.

In general, I'd prefer a solution that is simpler for web developers than Capture API + XHR. Also, this approach would require major changes to other APIs. And the approach doesn't allow local speech engines.

Chapter 7.
I like GrammarCollection. (Minor nit: for consistency with other web APIs, it should have length, not count.)
Nit: SetInputDevice -> setInputDevice
Is it possible to use SpeechRecognizer without the Capture API?
Why is the recognizer set in the SpeechRecognizer constructor, while the capture device needs a separate method?
Nit: event listener attributes should be "attribute Function onfoo", not "onfoo()".

Not very surprisingly, in general I like the SpeechRecognizer approach. The use of XHR could be removed, and the Capture API and remote speech services could be supported in a v2 (assuming the Capture API is even close to stable).

Chapter 7.4
I think I prefer Björn's approach of reusing <audio> for tts, especially because SpeechSynthesizer looks a lot like the API for <audio>. Or alternatively <tts> from 8.2.

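To make the Chapter 7 naming nits concrete, here is a minimal TypeScript sketch of what they would look like from a script author's point of view. It is only an illustration under stated assumptions: the class bodies, the `add`/`item` methods, the `onresult` name, the string-typed device and the `dispatchResult` helper are all hypothetical and are not taken from the proposal; only `GrammarCollection`, `SpeechRecognizer`, `length` and `setInputDevice` come from the comments above.

```typescript
// Hypothetical shapes for illustration only; not the proposal's actual API.
type RecoListener = ((event: { type: string }) => void) | null;

class GrammarCollection {
  private items: string[] = [];

  // `length` rather than `count`, matching NodeList, Array and similar web APIs.
  get length(): number {
    return this.items.length;
  }

  // Assumed helper for adding a grammar by URI.
  add(grammarUri: string): void {
    this.items.push(grammarUri);
  }

  // Assumed indexed getter; returns null when the index is out of range.
  item(index: number): string | null {
    return index >= 0 && index < this.items.length ? this.items[index] : null;
  }
}

class SpeechRecognizer {
  grammars: GrammarCollection = new GrammarCollection();

  // The capture device is modelled as a plain string here (an assumption).
  inputDevice: string | null = null;

  // The listener is an attribute holding a function, not a method `onresult()`.
  onresult: RecoListener = null;

  // lowerCamelCase method name, as suggested: setInputDevice, not SetInputDevice.
  setInputDevice(device: string): void {
    this.inputDevice = device;
  }

  // Assumed helper that simulates the engine delivering a result event.
  dispatchResult(): void {
    if (this.onresult !== null) {
      this.onresult({ type: "result" });
    }
  }
}
```

With shapes like these, a page would write `reco.onresult = function (e) { ... }` and read `reco.grammars.length`, which is the same pattern authors already know from XHR and DOM collections.
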
Chapter 8
Assuming <reco> doesn't have any visual representation, I think I could prefer that approach, especially if the API is simplified a bit (maybe remove .capture and SetSpeechService from v1). In a way it is very close to SpeechRequest, which takes aBoundElement as a constructor parameter. <reco>'s "bound element" would be the parent element. There is a parsing problem, though: IIRC the HTML5 parser never creates a child element for an <input> element. So the 'for' attribute approach would work better.

Chapter 8.2
HTMLTTSElement could perhaps extend HTMLAudioElement, or at least HTMLMediaElement.

So far, SpeechRequest/<reco> for recognition and <tts> (either MS's or Google's) for tts look the most promising to me.

I haven't yet read the Tropo document properly, but it doesn't "feel" very webby, and it uses terms like Event in a different way than web specs do. But I'll comment on it more later.

-Olli
Received on Tuesday, 8 March 2011 20:03:43 UTC