RE: [agenda] 8 September 2011

From: Deborah Dahl <dahl@conversational-technologies.com>
Date: Thu, 15 Sep 2011 10:53:23 -0400
To: "'Satish S'" <satish@google.com>, <olli@pettay.fi>
Cc: "'Michael Bodell'" <mbodell@microsoft.com>, <public-xg-htmlspeech@w3.org>
Message-ID: <079001cc73b7$389881f0$a9c985d0$@conversational-technologies.com>
I don't know if this proposal is simpler, but there are aspects of it that
might be convenient and help in debugging. 

1.       Setting up the SpeechInputRequest seems to involve about the same
amount of complexity in both proposals. Let's say that the developer has a
set of five grammars that they want to use in the application. In the
current proposal, the developer would call the "addGrammar" method five
times to add them to the SpeechInputRequest. In Satish's proposal, the
developer would create five Grammar objects and add them all at once when
the SpeechInputRequest is created, which seems like roughly the same amount
of complexity. In Satish's proposal there is an additional Grammar object
that developers have to learn about and manage, but it's not a very complex

2.       However, later on, if the developer wants to add or remove one or
more grammars, in the current proposal you would call addGrammar() or
disableGrammar(), possibly multiple times, if the change involved multiple
grammars. In Satish's proposal, if I understand it, I think you would just
set the grammars attribute of the SpeechInputRequest with the desired set of
grammars.  I think this would be easier to debug because the full set of
active grammars would be explicitly set, as opposed to having to trace back
through the sequence of "addGrammar" and "disableGrammar" calls to figure
out if something was active that shouldn't have been and vice versa. On the
other hand it might be annoying to have to respecify the full list of N
grammars every time you just want to change one of them. 

3.       If the developer has a single grammar that they want to use in
different SpeechInputRequests, it would be convenient to be able to reuse
the same Grammar object several times for different SpeechInputRequests.

4.       I think if the developer wants a modal grammar, they would just
make it the only grammar in the sequence of active grammars.

5.       One question I have, though, is what happens if the actual grammar
at the src URI changes after the Grammar object is created? If the developer
wants to ensure that they're using a completely up-to-date grammar, do they
always have to remember to create a new Grammar object right before
they add it to the SpeechInputRequest?
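
To make the comparison in points 1 to 3 concrete, here is a minimal TypeScript sketch of the two styles. It is an illustration only, not text from either proposal: the class names IncrementalRequest and DeclarativeRequest, the activeGrammars() helper, and the example.org grammar URIs are assumptions. Only addGrammar, disableGrammar, src, weight, and the grammars attribute come from the thread.

```typescript
// Hypothetical model of the two grammar-management styles discussed above.
// Only addGrammar, disableGrammar, src, weight and grammars come from the
// thread; every other name here is an assumption made for illustration.

interface SpeechInputGrammar {
  src: string;
  weight: number;
}

// Style 1 (current proposal): the active set is the result of a call history.
class IncrementalRequest {
  private active: SpeechInputGrammar[] = [];

  addGrammar(src: string, weight: number = 1.0): void {
    this.disableGrammar(src); // avoid duplicates when the same src is added twice
    this.active.push({ src, weight });
  }

  disableGrammar(src: string): void {
    this.active = this.active.filter((g) => g.src !== src);
  }

  activeGrammars(): string[] {
    return this.active.map((g) => g.src);
  }
}

// Style 2 (Satish's proposal): the active set is whatever was assigned last.
class DeclarativeRequest {
  grammars: SpeechInputGrammar[] = [];

  activeGrammars(): string[] {
    return this.grammars.map((g) => g.src);
  }
}

const city: SpeechInputGrammar = { src: "http://example.org/city.grxml", weight: 1.0 };
const date: SpeechInputGrammar = { src: "http://example.org/date.grxml", weight: 0.5 };

// Style 1: to know what is active you replay three calls.
const incremental = new IncrementalRequest();
incremental.addGrammar(city.src, city.weight);
incremental.addGrammar(date.src, date.weight);
incremental.disableGrammar(city.src);

// Style 2: the last assignment states the full active set.
const declarative = new DeclarativeRequest();
declarative.grammars = [city, date];
declarative.grammars = [date];

// Point 3: the same Grammar object can be shared by another request.
const second = new DeclarativeRequest();
second.grammars = [date];
```

Both requests end up with only the date grammar active. In the second style that fact can be read off the final assignment, which is the debugging advantage described in point 2; the cost is restating the whole list for a one-grammar change.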

From: public-xg-htmlspeech-request@w3.org
[mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Satish S
Sent: Tuesday, September 13, 2011 6:58 AM
To: olli@pettay.fi
Cc: Michael Bodell; public-xg-htmlspeech@w3.org
Subject: Re: [agenda] 8 September 2011


I also notice we started discussing Section 7 and covered some parts like
grammars. Currently we have 4 methods for manipulating the list of grammars
which seems complicated.


A simpler way would be to define a Grammar interface and have a sequence of
them as an attribute of SpeechInputRequest:


  interface SpeechInputGrammar {
    attribute DOMString src;
    attribute float weight;
    attribute boolean modal;  // Does this make sense, or can the webapp just
                              // remove the others in the grammar list below?
  };

  interface SpeechInputRequest {
    attribute sequence<SpeechInputGrammar> grammars;
  };
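
On the question attached to modal: one possible reading is that a modal grammar, while present, makes every other grammar in the list inactive. The thread leaves the semantics open, so that reading is an assumption. Under it, the flag gives the same result as the web app assigning a one-element list, as the sketch below suggests. The helper effectiveGrammars and the grammar URIs are hypothetical.

```typescript
// TypeScript rendering of the proposed interface; the modal semantics
// implemented below are an assumption, not something the thread settles.
interface SpeechInputGrammar {
  src: string;
  weight: number;
  modal: boolean;
}

// Hypothetical helper: if any grammar is modal, only the modal ones stay
// active; otherwise the whole list is active.
function effectiveGrammars(grammars: SpeechInputGrammar[]): SpeechInputGrammar[] {
  const modalOnly = grammars.filter((g) => g.modal);
  return modalOnly.length > 0 ? modalOnly : grammars;
}

const menuGrammar: SpeechInputGrammar = { src: "http://example.org/menu.grxml", weight: 1.0, modal: false };
const helpGrammar: SpeechInputGrammar = { src: "http://example.org/help.grxml", weight: 1.0, modal: true };

// With the flag: the list keeps both entries, but only help is active.
const withFlag = effectiveGrammars([menuGrammar, helpGrammar]);

// Without the flag: the web app gets the same effect by listing help alone.
const withoutFlag = effectiveGrammars([
  { src: helpGrammar.src, weight: helpGrammar.weight, modal: false },
]);
```

If the two results are always equivalent in this way, the attribute is a convenience rather than a new capability, which is the trade-off the question raises.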




On Tue, Sep 13, 2011 at 11:48 AM, Satish S <satish@google.com> wrote:

Sorry I couldn't attend the call last week as I was on leave. I see in the
minutes that Olli's first point was discussed briefly, about automatic
binding to various types of html elements. But it doesn't look like we had a
satisfactory conclusion.


I am wondering if we really need automatic binding to existing elements or
can <reco> just be a standalone new UI element. The main reason I support a
<reco> element is for user initiated recognition (without automatically
throwing up security prompts on page load). This doesn't require automatic
binding and if the <reco> element was just aimed at getting user consent,
start recognition and return results to the JS event handler that would
support the use case of user initiated recognition.


On Thu, Sep 8, 2011 at 11:03 AM, Olli Pettay <Olli.Pettay@helsinki.fi> wrote:

Few comments.

"Some elements are categorized as recoable elements. These are elements that
can be associated with a reco element:

   input (if the type attribute is not in the Hidden state)

This is not enough, and not precise enough.
How should we handle contentEditable?
I'm also pretty sure we don't want to set the
*value* of <input type="checkbox"> but the state etc.
Also, why not set the value of <input type="hidden"> ?
(These are the kinds of problems that make the API inconsistent, and
 they are why I wouldn't have the automatic value binding to HTML elements.)

"The reco element's exact default presentation and behavior, in particular
what its activation behavior"
We still need to figure out some permission API.
The user must give permission in some way, and the web app probably needs
to know about the user's decision so that, if the user decides never to
give permission, the web app can hide the UI related to speech handling.

" might be and what implicit grammars might be defined, if any, is
unspecified and user agent specific. The activation behavior of a reco
element for events targeted at interactive content descendants of a reco
element, and any descendants of those interactive content descendants, MUST
be to do nothing. When a reco element with a reco control is activated and
gets a reco result, the default action of the recognition event SHOULD be to
set the value of the reco control to the top n-best interpretation of the
recognition (in the case of single recognition) or an appended latest top
n-best interpretation (in the case of dictation mode with multiple inputs)."

I don't understand the "SHOULD" part. If we want to support automatic
value binding, a UA implementing the API must set the value of the reco
control, if there is one.
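
The default action in the quoted draft text can be sketched roughly as follows. This is only an illustration under stated assumptions: RecoControl, RecoMode, and applyDefaultAction are hypothetical names, the n-best list is modelled as plain strings ordered best first, and the single-space separator used in dictation mode is a guess, since the draft does not say how appended results are joined.

```typescript
// Hypothetical stand-ins for the draft's concepts; none of these names
// appear in the draft itself.
type RecoMode = "single" | "dictation";

interface RecoControl {
  value: string;
}

// nbest is assumed to be ordered best first; only the top entry is used.
function applyDefaultAction(control: RecoControl, nbest: string[], mode: RecoMode): void {
  if (nbest.length === 0) {
    return; // nothing recognized: leave the control untouched
  }
  const top = nbest[0];
  if (mode === "single") {
    // Single recognition: the top interpretation replaces the value.
    control.value = top;
  } else {
    // Dictation: each new top interpretation is appended to the value.
    // The space separator is an assumption.
    control.value = control.value === "" ? top : control.value + " " + top;
  }
}

const field: RecoControl = { value: "" };
applyDefaultAction(field, ["boston", "austin"], "single");

const note: RecoControl = { value: "" };
applyDefaultAction(note, ["dear team"], "dictation");
applyDefaultAction(note, ["see you monday", "see you sunday"], "dictation");
```

The sketch only works for controls whose state is a text value, which is the concern raised earlier in this message about checkboxes and contentEditable.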

On 09/08/2011 11:48 AM, Michael Bodell wrote:

A number of folks may be out, but it will be good to get through the
rest of the API document on the call.  I've attached a new version of
the file that incorporates most of the information from the last Web
API call.  Last time we only just started on Section 6.  We should
start there and finish the document and then continue the discussion
on results formats.  As a reminder the plan is that we get more
concrete and knock out holes in the Web API in this document, then
rationalize it with the protocol work, and then fold both in to the
final group report.  We are mostly still on that first step.

Here is my summary of what has changed from the last document, based
on the minutes of the last meeting:


changes to section 3: remove bind and unbind

changes to section 3: add a method to
createSpeech[Input|Output]Request on the SpeechRequest interface

changes to section 3: change the type enum to be a bitfield so TTS is
1, ASR is 2, and don't need TTSASR if you have TTS | ASR

issue to section 3: remove the state

possible issue to 3: we can have multiple of these, should state
that, they go away with garbage collection, only issue is how long
does the service stay open after a query, do we need some explicit
close/reattach (bind/unbind) or do we just not care...

changes to section 4: need Query to be more specific if this is on

changes to section 4: merge filter and options (the criteria is in
the option, probably as a flat list)

issues to section 4: query needs to be async

changes to section 4: add successCallBack and failureCallBack to the
specific speechQuery function

changes to section 5: Need better definition of "recoable elements",
probably listing all such elements

issue on section 5: Need a way to get at the SpeechRequest associated
with the reco element

issue on section 5: have a SpeechInputRequest attribute of reco that
is the tied request... this could have the default UA service or a
service based on URI attribute. From scripting if you get a new SIR
you can set the attribute to associate the new SIR with this reco

issue on section 6: same kind of idea with section 5, with a
SpeechOutputRequest instead of SpeechInputRequest

issue for section 6 (and generally): Link to the definitions in HTML 5
(for HTMLMediaElement, but also for "potentially playing", etc.)

Notes about: issue on section 3 or 4: possibly need a way to check if
you have permission to do recognition, method on Service (or on

issues to section 4: need a function to return the service based on
the URI for the service

changes to section 4: filter on required languages and required
service type and possibly other things...

changes to section 4: possible filter on audio codecs as well

issues to section 4: need a way to do authorization, possibly as a
name/password options in query options, possibly as authorization for
an authorization header, possibly just as proprietary stuff on the

Not done/no changes: issue to section 4: maybe have constructor and
set things?

changes to section 5: possibly remove form attribute or control...
possibly not since this really does tie it to the element properly

section 5: htmlFor is definitely fine, later discussion sounds like
control is fine too

issue to section 5: need some sentence about security model, but that
probably ties to request object and not the reco element

possible issue for section 6: think about CSS audio last call and if
it affects section 6
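
As an illustration of the section 3 bitfield change and the section 4 filtering items in the summary above, a minimal TypeScript sketch might look like this. The values TTS = 1 and ASR = 2 come from the summary; ServiceInfo, supports, filterServices, and the example.org service URIs are assumptions made for the example and are not part of the API document.

```typescript
// Flag values as given in the summary: TTS is 1, ASR is 2.
// A service offering both is TTS | ASR, so no separate TTSASR value is needed.
const TTS = 1;
const ASR = 2;

// Hypothetical description of a speech service; not from the API document.
interface ServiceInfo {
  uri: string;
  type: number; // bitwise OR of TTS and/or ASR
  languages: string[];
}

// True when the service offers every capability in the required mask.
function supports(service: ServiceInfo, required: number): boolean {
  return (service.type & required) === required;
}

// Hypothetical filter on required service type and required languages.
function filterServices(
  services: ServiceInfo[],
  requiredType: number,
  requiredLanguages: string[]
): ServiceInfo[] {
  return services.filter(
    (s) =>
      supports(s, requiredType) &&
      requiredLanguages.every((lang) => s.languages.indexOf(lang) !== -1)
  );
}

const services: ServiceInfo[] = [
  { uri: "http://example.org/tts", type: TTS, languages: ["en-US"] },
  { uri: "http://example.org/asr", type: ASR, languages: ["en-US", "fi-FI"] },
  { uri: "http://example.org/both", type: TTS | ASR, languages: ["en-US"] },
];

// Services that can do recognition in en-US: the ASR-only and combined ones.
const recognizers = filterServices(services, ASR, ["en-US"]);

// Services that can do both synthesis and recognition: only the combined one.
const fullDuplex = filterServices(services, TTS | ASR, ["en-US"]);
```

The mask test `(type & required) === required` is what makes the combined value unnecessary: a request for TTS | ASR matches only services that set both bits.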

________________________________________
From: [public-xg-htmlspeech-request@w3.org] on behalf of Young, Milan [Milan.Young@nuance.com]
Sent: Wednesday, September 07, 2011 4:03 PM
To: Dan Burnett; public-xg-htmlspeech@w3.org
Subject: RE: [agenda] 8 September 2011

I also need to send my regrets for this week.

-----Original Message-----
From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Dan Burnett
Sent: Wednesday, September 07, 2011 11:45 AM
To: public-xg-htmlspeech@w3.org
Subject: [agenda] 8 September 2011

We will have a teleconference this week as planned.


1. Web API discussion

Since neither Bjorn nor I expects to be able to attend this week's
call, we will not be going through any more issue topics in the draft
report this week.

I propose that we have the following meetings be focused on the
listed topics:

15 September:  Web API discussion for 60 minutes, then Protocol
discussion or outstanding topics, whichever is most needed.
22 September:  Protocol wrap up for 30 minutes, then Web API discussion
29 September:  Web API wrap up

-- dan

==============
==Telecon info ==
==============

Date:  Thursday, 8 September 2011
Time:  Noon (New York), 1800 (Central Europe), 0100 (Tokyo)
Duration:  90 minutes

US telephone number:  +1.617.761.6200
France telephone number: +
UK telephone number: +44.203.318.0479

Conference code:  48657# (HTMLS#)

Info on using Zakim:  http://www.w3.org/2002/01/UsingZakim
Irc channel:  #htmlspeech

===================
= Recent minute-takers =
===================
1 September:  Glen Shires
4 August:  Robert Brown
28 July:  Milan Young
7 July:  Dan Burnett
30 June:  Debbie Dahl
16 June:  Patrick Ehlen
9 June:  Raj Tumuluri
2 June:  Michael Johnston
19 May:  Michael Bodell
12 May:  Dan Druta
5 May:  Charles Hemphill
28 April:  Robert Brown
21 April:  Olli Pettay
14 April:  Milan Young
7 April:  Debbie Dahl
17 March: Dan Burnett
17 February: Bjorn Bringert
16 December: Robert Brown
9 December: Dan Druta
2 December: Raj Tumuluri
18 November: Milan Young, Dan Burnett
11 November: Debbie


Received on Thursday, 15 September 2011 14:54:26 UTC