RE: Speech API breakdown of work items

Alright, Debbie has claimed 4 and 5 and has the inside track on 13 and 15.  On Tuesday I'll assign sections to the people on the To line (other than Raj, Dan, and Debbie, who already have assigned work) if they haven't already volunteered.  We have no call this coming week (because of the MMI and VB meetings), but we do have a Speech API call the following week.  So the due date for the design decisions and requirements that Raj is on the hook for, and also for a first draft of these other sections, is before that week's call.

-----Original Message-----
From: Deborah Dahl [mailto:dahl@conversational-technologies.com] 
Sent: Wednesday, June 15, 2011 11:41 PM
To: Michael Bodell; 'Bjorn Bringert'; 'Dan Burnett'; Olli.Pettay@gmail.com; 'Charles Hemphill'; dd5826@att.com; 'Raj (Openstream)'
Cc: public-xg-htmlspeech@w3.org
Subject: RE: Speech API breakdown of work items

I see that one advantage of being (I think) 1-10 hours ahead of everyone else (I'm in Kenya) is that I may be getting the first crack at the list, although I may have to work fast to beat Olli :-).
I think I would be interested in working on 4 and 5, and possibly 13 and 15. Also, I think it would be easier, for me at least, to first propose the text that describes the encapsulated semantics, get rough agreement on that, and then do the IDL. I would be happy to work with anyone else who is interested in those topics as well.
I wasn't sure where the actual process of initiating recognition and synthesis fits into this list. Some property setting could be done at initiation, such as including the grammar as a parameter of the initiation request; in that sense initiation might fit into 4. But some properties could also be set up ahead of time in separate API calls, which would make initiation a separate API call of its own.
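[Editor's sketch of the two initiation styles being contrasted above. Every name here is hypothetical; nothing in this thread fixes an actual API surface.]

```javascript
// Style A: properties such as the grammar travel as parameters of the
// initiation request itself, so initiation folds into item 4.
class RecognizerA {
  start(options) {
    // The grammar arrives with the initiation call.
    return { started: true, grammar: options.grammar };
  }
}

// Style B: properties are set up ahead of time in separate API calls,
// which makes initiation its own, parameterless call.
class RecognizerB {
  setGrammar(grammar) { this.grammar = grammar; }
  start() { return { started: true, grammar: this.grammar }; }
}

const a = new RecognizerA();
const resultA = a.start({ grammar: "builtin:dictation" });

const b = new RecognizerB();
b.setGrammar("builtin:dictation");
const resultB = b.start();
```

Both styles end up with the same configured recognizer; the difference is only whether configuration is bundled into initiation (item 4) or split out as separate calls.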

From: public-xg-htmlspeech-request@w3.org
[mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Michael Bodell
Sent: Wednesday, June 15, 2011 7:32 PM
To: Bjorn Bringert (bringert@google.com); Dan Burnett (dburnett@voxeo.com); Deborah Dahl (dahl@conversational-technologies.com); Olli.Pettay@gmail.com; Michael Bodell; Charles Hemphill; dd5826@att.com; Raj (Openstream)
(raj@openstream.com)
Cc: public-xg-htmlspeech@w3.org
Subject: Speech API breakdown of work items

So from our last call, Debbie suggested we break down the API work items so that folks can volunteer for the parts they want to work on (we'll need everyone to do two of these each, on average).  As we discussed on the call, some of this might need to wait for the requirements and design decision work that is ongoing, but much of it can be covered with the discussions we've already had on the calls, over email, and at the face-to-face.  Here is my breakdown (in no special order) of the basic things we have to do, and the couple of people who have volunteered so far.  Everything except the requirements and design decisions needs both the IDL API outline *and* text describing the details of the semantics being encapsulated.  If the people on the To line could reply with which sections they want to take first, that would help (and anyone else in the group who wants to jump in is also welcome).  Also, if anyone thinks of a major section I've omitted, please reply and add it.

1. Go through the design decisions and requirements and flag and organize those that relate to the API, similar to what Marc did for Protocol.  Raj volunteered for this task on the last call.
2. The markup associated with the recognition element and any associated properties and API (i.e., the element, the for label, and the implied behavior).  This is still controversial, but should be orthogonal to the rest of the API.
3. The API hooks relating to "setting up", "preparing", or "checking the capabilities" of a request (both recognition and TTS).  Dan Druta volunteered for this task on the last call.
4. The API hooks for specifying grammars and also other recognition properties (both what these properties are, and how to specify them).  We covered some of this at the F2F.
5. The API hooks for getting speech results back (both the EMMA XML and text representations that appeared in a couple of proposals and that Bjorn outlined, and also the continuous results we talked about at the F2F; this possibly covers feedback functionality as well).
6. The recognition events that are raised and the associated handlers and data (including any semantics about time stamps and other related information we covered at the F2F).
7. The API hooks related to the protocol for both speech and synthesis (both what speech service to use, and also anything else the protocol team identifies as a need).  This might have to wait until the protocol is further along (and might also be something someone on the protocol team wants to take).
8. The API hooks related to hooking up with the capture system.
9. The API hooks associated with actually doing the recognition.  (This may, or may not, be different from a combination of 3 and 4 above.)
10. The API hooks related to actually doing a synthesis transaction.
11. The synthesis events that are raised and the associated handlers and data (same caveat about timing as with 6).
12. The API hooks for controlling synthesis, if any (pause, resume, play, etc.).
13. The API to do text based recognition.  We covered this some at the F2F.
14. The API to do a combination of bargeable synthesis and recognition. This was a little controversial, but we discussed it at the F2F.
15. The API hooks to do continuous recognition (both open microphone as well as dictation).  This was covered some at the F2F and may just be part of 3, 4, and 9 above.
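[Editor's sketch of the result-delivery shape that items 5 and 15 describe: a final result carrying both a plain-text transcript and an EMMA XML serialization, plus intermediate results for the continuous case. Event and field names are invented for illustration; the EMMA namespace URI is from the EMMA 1.0 Recommendation, but nothing else here is agreed API.]

```javascript
// A mock recognizer with a minimal event registry, used only to
// illustrate the result shapes discussed in items 5 and 15.
class MockRecognition {
  constructor() { this.handlers = {}; }
  on(event, fn) { this.handlers[event] = fn; }
  emit(event, data) { if (this.handlers[event]) this.handlers[event](data); }

  // Simulate a continuous session: partial hypotheses, then a final
  // result exposing both text and an EMMA XML serialization.
  run() {
    this.emit("interimresult", { transcript: "flights to" });
    this.emit("interimresult", { transcript: "flights to boston" });
    this.emit("result", {
      transcript: "flights to boston",
      emmaXML:
        '<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">' +
        '<emma:interpretation emma:confidence="0.9">flights to boston' +
        "</emma:interpretation></emma:emma>",
    });
  }
}

const reco = new MockRecognition();
const partials = [];
let finalResult = null;
reco.on("interimresult", (r) => partials.push(r.transcript));
reco.on("result", (r) => { finalResult = r; });
reco.run();
```

The point of the sketch is the dual representation: a web page that only wants the text reads `transcript`, while one that wants confidences and interpretations parses the EMMA document, and continuous recognition just means the interim event fires repeatedly before the final one.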

Received on Saturday, 18 June 2011 23:15:24 UTC