RE: Speech API breakdown of work items

I see that one advantage of being (I think) 1-10 hours ahead of everyone
else (I’m in Kenya) is that I may be getting the first crack at the list,
although I may have to work fast to beat Olli :-).

I think I would be interested in working on 4 and 5, and possibly 13 and
15. Also, I think it would be easier, for me at least, to first propose
the text that describes the encapsulated semantics, get rough agreement on
that, and then do the IDL. I would be happy to work with anyone else who
is interested in those topics as well.

I wasn’t sure where the actual process of initiating recognition and
synthesis fits into this list. Some property setting could be done at
initiation, for example by including the grammar as a parameter of an
initiation request, so in that sense initiation might fit into 4; but some
properties could also be set up ahead of time in separate API calls, which
would make initiation a separate API call.
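
To make the two alternatives concrete, here is a minimal IDL sketch of the
two styles; every interface and member name in it is a hypothetical
placeholder on my part, not a proposal:

  // Rough sketch only; all names are hypothetical placeholders.

  // Alternative 1: properties supplied as parameters of the initiation
  // request itself (initiation then folds into 4).
  interface SpeechRecognizer {
      void start(DOMString grammarURI, DOMString language);
  };

  // Alternative 2: properties set ahead of time in separate API calls,
  // with initiation as its own bare call.
  interface SpeechRecognizer {
      attribute DOMString grammarURI;
      attribute DOMString language;
      void start();
  };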

From: public-xg-htmlspeech-request@w3.org
[mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Michael Bodell
Sent: Wednesday, June 15, 2011 7:32 PM
To: Bjorn Bringert (bringert@google.com); Dan Burnett (dburnett@voxeo.com);
Deborah Dahl (dahl@conversational-technologies.com); Olli.Pettay@gmail.com;
Michael Bodell; Charles Hemphill; dd5826@att.com; Raj (Openstream)
(raj@openstream.com)
Cc: public-xg-htmlspeech@w3.org
Subject: Speech API breakdown of work items

So, from our last call, Debbie suggested we break down the API work items
so that folks can volunteer for the parts they want to work on (we’ll need
everyone to do 2 of these each, on average).  As we discussed on the call,
some of this might need to wait for the requirements and design decision
work that is ongoing, but much of it could be covered by the existing
discussions that we’ve had on the calls, over email, and at the
face-to-face.  Here is my breakdown (not in any special order) of the
basic things that we have to do, and the couple of people who have
volunteered so far.  Everything except the requirements and design
decisions item needs both the IDL API outline *and* text describing the
details of the semantics that are encapsulated.  If the people on the To:
line could reply with which sections they want to take first, that would
help (and anyone else in the group who wants to jump in would also be
welcome).  Also, if anyone thinks of a major section I’ve omitted, please
reply and add it.

1. Go through the requirements and design decisions, and flag and organize
those that relate to the API.  Similar to what Marc did for Protocol.  Raj
volunteered for this task on the last call.
2. The markup associated with the recognition element and any associated
properties and API (i.e., the element, the for label, and the implied
behavior).  This is still controversial, but should be orthogonal to the
rest of the API.
3. The API hooks relating to “setting up” or “preparing” or “checking the
capabilities” of a request (both recognition and TTS).  Dan Druta
volunteered for this task on the last call.
4. The API hooks for specifying grammars and also other recognition
properties (both what these properties are, and how to specify them).  We
covered some of this at the F2F.
5. The API hooks for getting speech results back (both the EMMA XML and
text representations that were in a couple of proposals and that Bjorn
outlined, and also the continuous results that we talked about at the F2F
– this possibly also covers feedback functionality).  A rough sketch of
what some of these items might look like appears after the list.
6. The recognition events that are raised and the associated handlers and
data (including any semantics about time stamps and other related
information we covered at the F2F).
7. The API hooks related to the protocol for both speech and synthesis (both
what speech service to use, and also anything else the protocol team
identifies as a need).  This might have to wait until the protocol is
further along (and might also be something someone on the protocol team
wants to take).
8. The API hooks related to hooking up with the capture system.
9. The API hooks associated with actually doing the recognition.  (This
may or may not be different from a combination of 3 and 4 above.)
10. The API hooks related to actually doing a synthesis transaction.
11. The synthesis events that are raised and the associated handlers and
data (same caveat about timing as with 6).
12. The API hooks for controlling synthesis, if any (pause, resume, play,
etc.).
13. The API to do text-based recognition.  We covered this some at the F2F.
14. The API to do a combination of bargeable synthesis and recognition. 
This was a little controversial, but we discussed it at the F2F.
15. The API hooks to do continuous recognition (both open microphone as well
as dictation).  This was covered some at the F2F and may just be part of 3,
4, and 9 above.
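
Just to illustrate the shape of what some of these would need to specify,
here is a minimal sketch (not a proposal – every interface and member name
below is a hypothetical placeholder):

  // Rough sketch only; all names are hypothetical placeholders.

  // Results (5): both the EMMA XML document and a plain-text form.
  interface SpeechRecognitionResult {
      readonly attribute Document emma;        // EMMA XML result
      readonly attribute DOMString utterance;  // text representation
  };

  // Recognition events and handlers (6); in continuous recognition (15)
  // onresult would fire repeatedly as results arrive.
  interface SpeechRecognizer {
      attribute Function onresult;
      attribute Function onerror;
  };

  // Synthesis transaction and control hooks (10, 11, 12).
  interface SpeechSynthesizer {
      void play();
      void pause();
      void resume();
      attribute Function ondone;  // synthesis events (11)
  };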

Received on Thursday, 16 June 2011 06:42:03 UTC