- From: Jim Barnett <Jim.Barnett@genesyslab.com>
- Date: Fri, 20 Apr 2012 12:36:27 -0700
- To: <public-speech-api@w3.org>
- Message-ID: <E17CAD772E76C742B645BD4DC602CD810616E7F7@NAHALD.us.int.genesyslab.com>
A couple of quick comments on remote resources: the current proposal is to add a URL property to ASR/TTS resources, identifying the location of the remote resources. If we pursue this path, we will have to define the protocol the browser uses to connect to the resources, to send/receive audio, to get results, etc. It will certainly be easier if we can reuse an existing protocol.

The obvious one that comes to mind is the one that the rtcWeb group is defining (it's a joint effort of the IETF and the W3C). rtcWeb is intended to allow browser-to-browser voice, video, and data communications. There's a lot of interest in this group, including participation from major browser vendors, so there's a good chance that at some point we will see this capability built into the popular web browsers. Overall, it will provide a superset of what we need (as far as I know, we don't need video), so it would make sense for us to reuse it, rather than asking browser vendors to support an additional protocol. (The fact that we will have media servers at one end of the call, rather than a second browser, is not a problem: as long as the media server speaks the appropriate protocol, the user's browser will never know the difference.)

There is a slight complication, though. In the current draft of the IETF spec, the browser does not have the capability to set up a call on its own; the call must be set up in JavaScript (this is to allow flexibility in complex situations). However, as part of call setup, a PeerConnection object is created, which will contain one or more media streams. (See the draft API at http://www.w3.org/TR/2012/WD-webrtc-20120209/ . For the call setup protocol, see http://datatracker.ietf.org/doc/draft-ietf-rtcweb-jsep/ . Be aware that these are both working drafts.) So I think it would make sense for our API to allow the developer to provide a PeerConnection object when creating an ASR or TTS resource.
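To make the idea concrete, here is a rough JavaScript sketch of how a speech-resource constructor might accept an optional PeerConnection and check it for the expected media. Everything here is hypothetical: the class name, the option names, and the `audioStreams`/`dataChannels` fields are placeholders standing in for whatever shape the rtcWeb API settles on, not names from any draft.

```javascript
// Hypothetical sketch only: none of these names come from a draft spec.
// A plain object stands in for a PeerConnection, with illustrative
// `audioStreams` and `dataChannels` arrays.
class RemoteSpeechResource {
  constructor(options = {}) {
    const conn = options.peerConnection;
    if (!conn) {
      // No PeerConnection supplied: fall back to the browser's local defaults.
      this.transport = "default";
      this.error = null;
      return;
    }
    // The browser must use the supplied connection, but it can first check
    // that the connection carries exactly one audio stream and a data channel
    // (for TTS the data channel carries the text to play; for ASR it
    // returns results).
    if (conn.audioStreams.length !== 1) {
      this.transport = null;
      this.error = "PeerConnection must carry exactly one audio stream";
    } else if (conn.dataChannels.length < 1) {
      this.transport = null;
      this.error = "PeerConnection lacks a data channel";
    } else {
      this.transport = "peerconnection";
      this.error = null;
    }
  }
}
```

Here an invalid connection simply sets an `error` field; a real browser would presumably signal an error and reject subsequent attempts to use the resource.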
If such an object is provided, the browser must use it to communicate with the remote resources; otherwise it will use its local defaults. The ASR and TTS resources would each receive their own PeerConnection (or one could receive a PeerConnection while the other uses the browser defaults). Each PeerConnection should contain an audio stream and a data channel (for TTS, the data channel is used to pass the text to play to the resource; for ASR, the data channel is used to return results).

There will be a number of error cases to consider (what if the PeerConnection lacks a data channel, or has two audio channels, etc.). I would think that in most of these cases the browser should signal an error and reject any subsequent attempt to use the relevant resource.

On the whole, the fact that rtcWeb requires the JS author to set up the call will make programming remote resources more complex, but also much more flexible, than if we counted on the browser to do the job. But I don't think it adds any complexity to our job of standard definition. We just specify that the optional parameter is a PeerConnection rather than a URI, and the rest of the spec doesn't have to change much (we'd have to handle various error cases in the URI-based version of the API as well). Anyhow, this subject will need more discussion, but I'd like to get started on it soon.

- Jim
Received on Friday, 20 April 2012 19:37:01 UTC