
[minutes] 27 October 2011

From: Dan Burnett <dburnett@voxeo.com>
Date: Thu, 27 Oct 2011 13:35:40 -0400
Message-Id: <BB01D751-4BC9-4D71-91DB-17D29BE030DB@voxeo.com>
To: public-xg-htmlspeech@w3.org
Group,

The minutes from today's call are available at http://www.w3.org/2011/10/27-htmlspeech-minutes.html

For convenience, a text version is embedded below.

Thanks to Debbie Dahl for taking the minutes.

-- dan

**********************************************************************************
              HTML Speech Incubator Group Teleconference

27 Oct 2011

   [2]Agenda

      [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0058.html

   See also: [3]IRC log

      [3] http://www.w3.org/2011/10/27-htmlspeech-irc

Attendees

   Present
          Dan_Burnett, Olli_Pettay, Milan_Young, Debbie_Dahl,
          Michael_Bodell, Dan_Druta, Charles_Hemphill, Glen_Shires,
          Robert_Brown, Michael_Johnston

   Regrets
   Chair
          Dan_Burnett

   Scribe
          ddahl

Contents

     * [4]Topics
         1. [5]protocol questions
         2. [6]f2f planning
         3. [7]questions on the protocol
         4. [8]EMMA with JSON payload
         5. [9]<reco>
     * [10]Summary of Action Items
     _________________________________________________________


protocol questions

   <burn> Document is
   [12]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct
   /0033.html

     [12] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0033.html

   dan: let's postpone this

f2f planning

   dan: who will be at f2f?

   olli: can we call in?

   dan: didn't ask for a phone, usually hard to talk to someone on the
   phone at f2f

   <mbodell> I know Robert will be there and so will Avery (a person
   from Microsoft who is starting to track these issues)

   dan: will look into one-way audio stream
   ... topics that people want to discuss?

   glen: won't that be close to our last chance?

   dan: we need to be done by the end of November

   glen: we won't have much time after f2f, only 1-2 calls

   michael: we should not expect to have any substantive discussions
   after f2f
   ... only editorial

   dan: after f2f, anything that we don't agree on, we have to stop
   work on
   ... editorial work can be substantial, too

   glen: should go into f2f with prioritized list of issues that we
   want to resolve

   michael: try to raise open issues over email; can people also write
   up code examples? also want to make sure that we're handling the use
   cases

   dan: get people to sign up for sample code, even if not coming to
   f2f

   glen: what sample code should we be completing? is there a list?

   <mbodell>
   [13]http://www.w3.org/2005/Incubator/htmlspeech/live/requirements.ht
   ml#section-use-cases

     [13] http://www.w3.org/2005/Incubator/htmlspeech/live/requirements.html#section-use-cases

   michael: we don't have a list, but can work through use cases to
   generate a list

   glen: we should have priorities and examples ahead of time

   dan: we know what needs to happen, now we need to get people to sign
   up to do things.

   glen: will sign up to do some sample code

   michael: about seven people here who will be at f2f

   registrants --
   [14]http://www.w3.org/2002/09/wbs/35125/TPAC2011/registrants#HTMLSpe
   ech

     [14] http://www.w3.org/2002/09/wbs/35125/TPAC2011/registrants#HTMLSpeech

   glen: need quality use cases

   danD: from a developer's perspective, I would like to see some real
   examples that allow me to accomplish a particular task
   ... e.g. voice search, set up a service that isn't a default service
   ... for example, a speech recognition service

   glen: specifying a speech service is a good idea for an example.
   ... some use cases span the gamut, that might require a huge
   JavaScript effort

   danD: to show developers that this is real, we need to address
   immediate needs. we might not have the resources to fully accomplish
   this, but we should have a few examples

   glen: I'm willing to take a crack at many of these

   michael: some are pretty extensive, everyone should prepare samples
   for using the protocol and for using the WebAPI.
   ... it doesn't hurt if there is some duplication, but would like to
   have coverage of many use cases

   dan: do we need to make this more precise?

   glen: no suggestions for making this more precise

   michael: people should check with others to see whether they also
   plan to do the same ones

   olli: will try something for permission handling

   milan: would like to do something about continuous dictation in the
   protocol
   ... would do the full stack

   glen: will focus on the WebAPI, not the protocol

   debbie: will do use case 5, Domain Specific Grammars Filling
   Multiple Input Fields

   glen: what is the protocol aspect of that?

   michael: the author doesn't have to get into that but we have to
   specify what goes into the protocol to accomplish the use case.

   danD: could give a summarized description of what the connection is
   between the WebAPI and the protocol, could go back to the
   architecture and describe the bits and pieces we've put together
   over the past year. I can describe the architecture visually and in
   words.

   michaelJ: will try to do something around multimodal interaction

   dan: if you can't send sample before f2f, there probably won't be a
   chance to discuss it.
   ... this is an important deadline

   charles: will review and provide feedback on other contributions,
   will take a look at TTS but can't promise

   <MJ>
   [15]http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech
   .html#use-cases

     [15] http://www.w3.org/2005/Incubator/htmlspeech/live/NOTE-htmlspeech.html#use-cases

   robert: driving directions, u15, rerecognition
   ... will look at 3.3.3

   dan: could we find something for Bjorn and Satish?

   <robert> i'll look at 3.3.3, 3.3.7 and 3.3.15. can't promise quality

   glen: will encourage them to do what they can

   michael: can do a quick example on speech translation, both API and
   protocol

   dan: might do an example of interpret from text, but may not get
   that done
   ... will primarily work on compiling the report together

   michaelJ: did we end up having the ability to put the grammar
   inline?

   michael: not currently, but we talked about using a data scheme in
   the URI

   charles: we should have an example showing that
   ... can volunteer to provide that

   michael: please send any substantive issues to the list in advance
   of the meeting.

questions on the protocol

   <mbodell> For the data scheme if people need reminders on how it
   works the wikipedia page at
   [16]http://en.wikipedia.org/wiki/Data_URI_scheme describes it

     [16] http://en.wikipedia.org/wiki/Data_URI_scheme
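
   To make the inline-grammar idea concrete, here is a hedged sketch of
   carrying an SRGS grammar in a data: URI. The media type, the sample
   grammar, and how the URI would be attached to a <reco> element are
   assumptions; the group had discussed, but not specified, this
   approach.

```javascript
// Sketch: embed an SRGS grammar in a data: URI instead of referencing
// an external grammar file. The media type is an assumption.
const grammar =
  '<grammar xmlns="http://www.w3.org/2001/06/grammar" root="cmd">' +
  '<rule id="cmd"><one-of><item>yes</item><item>no</item></one-of></rule>' +
  '</grammar>';

// Percent-encode the grammar so it travels safely inside a URI.
const grammarUri = 'data:application/srgs+xml,' + encodeURIComponent(grammar);

// A speech service receiving the URI can recover the grammar text.
const decoded = decodeURIComponent(grammarUri.split(',')[1]);
console.log(decoded === grammar); // true
```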

   robert: the first question is whether we would ever allow
   unencrypted transmission
   ... I think TLS encryption should be optional

   olli: if there's a proxy, the proxy must not be able to read the
   transmission

   michael: the user should know whether the speech is happening over a
   secure channel or not
   ... I don't know if TLS needs to be required for that

   robert: if the page was fetched over TLS, would expect speech to be
   handled over TLS

   dan: the security of the speech should be at least as strong as the
   security of the page
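
   Dan's rule of thumb can be sketched in a few lines: a page fetched
   over TLS should not downgrade its speech traffic to an unencrypted
   channel. The function name, endpoint path, and WebSocket schemes are
   illustrative assumptions (the group's protocol proposal ran over
   WebSockets).

```javascript
// Sketch of the "at least as secure as the page" policy: an https:
// page implies a TLS (wss:) speech channel. Names are hypothetical.
function speechServiceUri(pageProtocol, host) {
  // Never downgrade: a TLS page implies a TLS speech connection.
  const scheme = pageProtocol === 'https:' ? 'wss:' : 'ws:';
  return scheme + '//' + host + '/speech';
}

console.log(speechServiceUri('https:', 'example.com')); // "wss://example.com/speech"
```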

   michael: the page should tell you what's secure and what's not

   olli: this is a new kind of data, speech is more private

   robert: what do current services use?

   michael: Bing uses both

   glen: I don't know about Google Voice Search

   <smaug> what...

   robert: we should say that browsers have a strict policy about this,
   but it's not clear that we should disallow unencrypted transmission

   dan: in MRCP it was useful to talk about the idea of a controlled
   environment
   ... e.g. if the components are located on the same machine with no
   external network

   robert: there are probably trivial applications where I'm not saying
   anything that's personally identifiable.

   dan: could conceivably capture enough of your voice to train a TTS

   michael: this is just about informed user consent

   <burn> got dropped. was saying this is indeed different from mrcp
   where the user is not involved

   michael: people are putting their voices up on YouTube all the time

   charles: people can restrict who can see their YouTube videos
   ... people might assume that a commercial service is secure

   robert: there are a lot of policy issues that depend on what country
   you're in, for example

   glen: if you're jumping from one speech engine to another with
   different policies, it gets complicated, because the user might not
   know about it

   robert: is there a strong case for disallowing unencrypted
   transmission?

   glen: we had a discussion on how the user authorizes what speech
   engines are used

   olli: there could be a proxy that recognizes you, or infers things
   like your gender, from your voice

   dan: don't see any reason to disallow unencrypted speech

   michael: could discuss what happens when you're loaded securely and
   then JavaScript tries to do something insecure

   olli: is there any reason to allow unencrypted speech?

   dan: we never know how our technologies will be used; you can't
   assume that there's always a person at the client, or that the
   client and server are on different networks
   ... there could be significant performance implications from
   encryption

   olli: there could be an additional spec for more controlled
   environments

   dan: wouldn't have a problem with always encrypting

   robert: there has to be a consent UI to even send your voice to a
   service

   olli: what happens to the data between the client and the service,
   there could be any number of proxies in between.

   robert: the concern is about man-in-the-middle attacks.
   ... your server could disallow non-TLS connections

   olli: but spec needs to be interoperable

   robert: TLS is required in UAs because of man-in-the-middle
   attacks, but could be optional in the other cases. we could say that
   between the browser and the server TLS is required.

   michael: we should try to be consistent with other APIs

   olli: but this is a different kind of data. we could look at RTC,
   for example.

   dan: there is a requirement for support of TLS, but I don't know if
   that's mandatory.

   <mbodell> There is text in html for fetching at
   [17]http://dev.w3.org/html5/spec/fetching-resources.html#fetch and
   it talks about various things (including same origin, and possibly
   CORS) but I don't see where it says things need to be secure, even
   when on a secure page

     [17] http://dev.w3.org/html5/spec/fetching-resources.html#fetch

   olli: in that case the UA can decide, but our situation is different

   dan: will look for RTC info offline

   robert: voice data is sensitive, and people don't realize that just
   because they're talking to their browser they might be vulnerable
   ... however, other services might not be affected
   ... once you've given the data to a service, it can do whatever it
   likes with it

   danD: it could use dedicated media transport and might not need TLS

   dan: it's outside our scope.

   danD: as a user, you trust the service that you're using

EMMA with JSON payload

   robert: in EMMA you can return pretty much whatever you like, JSON
   seemed like a good example, but we had decided not to use JSON

   michael: it's ok to pull it out, the new examples will give a better
   sense of what you can do

   michaelJ: I'm fine with that; you can do that with 1.0, and in EMMA
   1.1 you can specify the type of payload.
   ... you can put all kinds of information in EMMA, for example,
   emotional state
   ... the use case I'm most interested in is "send info". what does
   the EMMA coming back look like?
   ... if you want something outside of the API, you can go into the
   EMMA to get it.

   robert: will pull the example.
   ... posted an update last week, won't plan to do another draft
   ... comment if you have suggestions

<reco>

   dan: are we close to having a consensus?

   michael: I think glen and I are close, not sure about everyone else

   dan: let's summarize what it means to be close to an agreement

   <mbodell>
   [18]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct
   /0060.html

     [18] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0060.html

   <mbodell>
   [19]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct
   /0048.html

     [19] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0048.html

   michael: made changes to the WebAPI document, sent around, topic of
   binding might be too dense for now

   <mbodell>
   [20]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct
   /0055.html

     [20] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/0055.html

   michael: if people have questions we could probably take a look at
   those.

   <mbodell> Those links are examples from me, Glen, and charles
   respectively

   dan: won't have a chance to pull things together until Sunday, so
   Robert's final updates to the protocol can be sent up until then

   robert: will do a quick update including today's discussion

   michaelJ: do we have any JS examples for the current API spec?

   michael: we have some simple examples for the markup, but not API
   ... could try to write up a quick example that we could start from,
   will add an API example to section 1 today or tomorrow.

   olli: needs to reread binding stuff
Received on Thursday, 27 October 2011 17:36:14 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Thursday, 27 October 2011 17:36:18 GMT