[minutes] draft minutes for 16 December telecon from Dan Burnett on 2010-12-17 (public-xg-htmlspeech@w3.org from December 2010)

From: Dan Burnett <dburnett@voxeo.com>
Date: Fri, 17 Dec 2010 07:24:23 -0500
To: public-xg-htmlspeech@w3.org
Message-Id: <822B4664-6367-4020-905F-4815E94F6396@voxeo.com>
Group,

The draft minutes are available for review at
http://www.w3.org/2010/12/16-htmlspeech-minutes.html

For convenience, a text version is included below.  Please send any  
comments or corrections by email.

-- dan


                                - DRAFT -

               HTML Speech Incubator Group Teleconference

16 Dec 2010

    [2]Agenda

       [2] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0144.html

    See also: [3]IRC log

       [3] http://www.w3.org/2010/12/16-htmlspeech-irc

Attendees

    Present
           Michael_Bodell, Olli_Pettay, Milan_Young, Bjorn_Bringert,
           Dan_Burnett, Debbie_Dahl, Robert_Brown, Marc_Schroeder,
           +1.732.507.aabb

    Regrets
    Chair
           Dan Burnett

    Scribe
           Robert_Brown

Contents

      * [4]Topics
          1. [5]last week's minutes
          2. [6]comments on the newest version of the requirements draft
          3. [7]require encryption
             http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0099.html
          4. [8]require best practices
             http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0107.html
          5. [9]require support for text interpretation
             http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0122.html
          6. [10]re-recognition
             http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0133.html
          7. [11]concept of session
             http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0130.html
          8. [12]modify FPR30 to remove "UA"
             http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0111.html
          9. [13]cancelling requests.
             http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0134.html
         10. [14]discussion about API, device tag, etc
             http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0142.html
      * [15]Summary of Action Items
      _________________________________________________________

    <burn> trackbot, start telcon

    <trackbot> Date: 16 December 2010

    <burn> Scribe: Robert_Brown

    <burn> ScribeNick: Robert

last week's minutes

    Dan: (no comments) last week's minutes approved

comments on the newest version of the requirements draft

    Dan: no comments

require encryption
[16]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0099.html

      [16] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0099.html

    michael: not much mail on this, Bjorn agreed in mail, no other mail
    comments. seems reasonable

    <mbodell_> proposed req: Web application must be able to encrypt
    communications to remote speech service

    Dan: asked for objections, no objections voiced

require best practices
[17]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0107.html

      [17] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0107.html

    Milan: not sure we're aligned on the emphasis behind this
    requirement. maybe should put it on hold. some people are
    prioritising schedule ahead of features.
    ... put it on hold and see how the other issues we discuss this week
    play out

    Bjorn: has anybody had experience where this sort of requirement is
    needed? it seems redundant

    <bringert> I got disconnected

    Dan: sometimes, to prevent certain architectures from being avoided

    Milan: intended to avoid the sessions/sockets issue. but let's get on
    with discussing the other topics and get back to this one

require support for text interpretation
[18]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0122.html

      [18] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0122.html

    Bjorn: i wouldn't consider it high priority, but okay keeping it for
    now

    Dan: this is certainly in scope

    Bjorn: it's already possible and doesn't need a new requirement.
    just use an xmlhttp request.

    Dan: there may be some benefit to having a unified approach

    Bjorn: agreed there's a benefit but not high priority

    Dan: looks like we have consensus on keeping it

    <mbodell_> proposed req: Web applications must be able to request NL
    interpretation based only on text input (no audio sent).
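
    To illustrate Bjorn's point that text-only interpretation is
    already possible with an ordinary HTTP request, here is a minimal
    sketch. The endpoint URL, the JSON field names ("text", "grammar")
    and the helper name are assumptions made for illustration; the
    group did not specify any of them.

```typescript
// Hypothetical request shape; none of these fields were defined by the group.
interface InterpretationRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a text-only NL interpretation request (no audio is attached).
function buildInterpretationRequest(
  serviceUrl: string, // assumed speech service endpoint
  text: string,       // typed or previously recognized text
  grammarUri: string  // assumed reference to the grammar to interpret against
): InterpretationRequest {
  return {
    url: serviceUrl,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text: text, grammar: grammarUri }),
  };
}

// Example values are placeholders, not real services.
const req = buildInterpretationRequest(
  "https://speech.example.com/interpret",
  "flights to Boston",
  "https://app.example.com/flights.grxml"
);
```

    In a browser, a request like this could be sent with
    XMLHttpRequest (open, setRequestHeader, send). A unified speech
    API, as Dan suggests, would let the same call path serve both
    audio and text input.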

re-recognition
[19]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0133.html

      [19] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0133.html

    Michael: a fair bit of discussion in mail, but it seems people are
    okay keeping this

    Bjorn: okay to have as a requirement, lower priority, if I was
    making the proposal I wouldn't add it because of the added
    complexity

    <mbodell_> proposed req: Web applications must be able to request
    recognition based on previously sent audio.

    Michael: no objections? [resounding silence...]

    dan: consensus

concept of session
[20]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0130.html

      [20] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0130.html

    Michael: discussion on whether we need it and whether cookies
    support it?

    Milan: not thrilled, but okay to call this one good enough
    ... cookie gets 90% of use cases

    Bjorn: do you want to add a requirement like existing mechanisms
    should be used to manage sessions or something like that

    Milan: how about the way it's worded now?

    Bjorn: text in original email is okay with me

    Olli: okay with me too

    <Milan> Robert nervous about definition of word session

    <burn> robert: wants to confirm meaning of "session". different from
    what we do in web apps?

    <burn> robert: is there any use case?

    <burn> bjorn: yes. could consider a speech API that does not pass on
    cookies that are set

    <burn> milan: e.g. a native agent proposal. user agent would be
    required to tack on cookies

    <burn> robert: can live with this. details will become apparent with
    the proposals

    <burn> bjorn: IETF specs use the notion of "stateful session" when
    discussing cookies

    <mbodell_> proposed req: Web application and speech services must
    have a means of binding session information to communications.

    michael: sounds like we have consensus
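
    The cookie-based binding discussed above can be sketched as
    follows. This models what a user agent or native agent might do
    when it forwards a request to a speech service, per Milan's remark
    that the agent would "tack on" cookies. The function name and the
    choice to URI-encode cookie values are assumptions, not anything
    the group agreed on.

```typescript
// Return a copy of the request headers with session cookies attached.
// Models an agent forwarding cookies to a speech service; page scripts
// in browsers generally cannot set the Cookie header themselves.
function bindSession(
  headers: Record<string, string>,
  cookies: Record<string, string>
): Record<string, string> {
  // Encoding values with encodeURIComponent is a common convention,
  // assumed here for illustration.
  const pairs = Object.keys(cookies).map(
    (name) => `${name}=${encodeURIComponent(cookies[name])}`
  );
  const bound: Record<string, string> = { ...headers };
  if (pairs.length > 0) {
    bound["Cookie"] = pairs.join("; ");
  }
  return bound;
}

// Placeholder header and cookie values.
const bound = bindSession(
  { Accept: "application/json" },
  { sid: "abc123", lang: "en-US" }
);
```

    Because the session travels in a header, the speech service can
    correlate requests without any speech-specific session concept,
    which is the "existing mechanisms" approach Bjorn describes.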

modify FPR30 to remove "UA"
[21]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0111.html

      [21] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0111.html

    Bjorn: okay with Milan's restatement in mail

    Michael: concerned that this breaks our privacy requirements

    Milan: but that's broken (paraphrase)

    Michael: if I'm the only one who's nervous I'm okay taking Milan's
    text

    Bjorn: if those mechanisms don't satisfy privacy requirements, we
    can look at improving them.

    Marc: is it part of our specification to make a position on who does
    it?

    Bjorn: xmlhttp talks about web app but implies UA requirements

    Michael: objections?

    Dan: nervous but won't object. in prioritisation we may need to be
    more precise

    <mbodell_> proposed change: fpr30 becomes Web applications must be
    allowed at least one form of communication with a particular speech
    service that is supported in all UAs.

    <marc> my question was about confirming that at this stage we are
    not taking any decision how the communication between the web app
    and the speech service is realised, whether the UA plays a
    standardised role or not.

    Dan: agreed, move on

    <marc> confirmed that this decision is *not* taken at this stage.

    <marc> the new requirement is better because it makes this less
    explicit.

cancelling requests.
[22]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0134.html

      [22] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0134.html

    Bjorn: besides efficiency, are there any reasons to add the
    requirement?

    Michael: existing requirements relate to this (barge-in)

    Milan: it's efficiency. but if you were going to do real barge-in in
    most of your transactions, it would be an issue

    Bjorn: if the client wants to stop sending audio, it can send a
    marker saying it's done

    Milan: that's what I'm asking for

    Bjorn: sender cancelling is easy with HTTP. receiver cancelling is
    difficult

    Milan: how would end of speech be indicated

    Bjorn: some sort of end-of-audio packet, which handles the sender
    cancelling
    ... why do we need this?

    Milan: the user agent may not be able to detect when done

    Bjorn: would server or client do that?

    Milan: the client

    Bjorn: should split into two discussions: 1 client aborting
    recognition (fine and required and trivial); 2 client aborting
    synthesis
    ... implied by FPR17

    Michael: that says the user can abort it

    Bjorn: need a separate requirement that web application should be
    able to cancel audio capture

    Marc: we used the term "abort" intentionally, with privacy concerns
    in mind

    Bjorn: duplicate FPR17, replacing user with web app

    <mbodell_> proposed new req: While capture is happening, there must
    be a way for the web application to abort the capture and
    recognition process.

    Bjorn: fine with what Michael typed
    ... [no other objections] let's move on to synthesis
    ... client wants to abort playing of long synthesized speech. if
    there's no way for the client to signal the server, the only option
    is to tear down the connection
    ... this may have latency implications to establish a new connection

    Milan: there's a lot of work that goes into establishing a TCP
    socket. Email triage is a good example. App reads a few sentences of
    a message then the user interrupts
    ... it would be awkward if the mail app just read the first sentence

    Bjorn: or the app could read a sentence at a time until it decides
    to move to the next message

    Milan: not asking for interruption (existing requirement), but to
    cancel it all the way to the server

    Bjorn: reluctant to add a requirement of going all the way to the
    server
    ... propose "web application must be able to abort TTS output"

    Milan: but Bjorn already has to do this for reco, why not TTS?

    Bjorn: reco is required, and the sender aborts by sending up a
    token. this is different, because the receiver is aborting

    Milan: but with reco, the server is sending back ack's while the
    client is speaking, so there is a bi-directional mechanism

    Bjorn: are you saying a bidirectional communication is already
    required?

    Milan: we have the requirement that speech has begun and streaming

    Bjorn: speech detection is done on the client

    Milan: nervous about detection in the client
    ... FPR21 apps should be notified when capture starts
    ... until we have reco, we can't say that speech has begun, and we
    can't do hotword from the client

    Bjorn: notify -that- speech has begun, not -when- it has begun

    <Milan> Yep

    Milan: this is part of the problem of not having detailed
    descriptions on this. I brought this up back in the F2F meeting, but
    didn't catch the nuance of the word "that"

    Bjorn: no assumption that detection runs on the client, but also no
    exclusion of this

    Milan: but if it runs on the server, then you need bi-directional
    communication
    ... and if so, it doesn't seem to be a stretch to say we need this
    for synthesis

    Bjorn: i agree with the analysis, but probably wouldn't propose an
    API for this

    Michael: we should agree on whether or not it's a requirement, then
    prioritise in the next stage

    <mbodell_> proposed req: Web application must be able to
    programmatically abort TTS output.

    Bjorn: can we agree that it's a requirement for the web app to abort
    TTS, without any specific requirement on how this affects the server?

    Milan: sounds fine

    Michael: (silence) sounds like we have consensus

    Bjorn: so the other requirement is that when the client aborts TTS,
    it should not need to tear down the connection

    Marc: is this about functionality or efficiency? if it's about
    efficiency, the discussion should occur later, when we discuss
    implementation

    Milan: but it's so fundamental it would be crippling not to have
    this

    Bjorn: how about "aborting TTS should be efficient"?

    Milan: okay

    <mbodell_> proposed req: Aborting the TTS output should be
    efficient.

    Michael: sounds like we have consensus

    Bjorn: "TTS output" rather than "synthesis"
    ... one is the effect on the user experience, the other is the
    effect on efficiency
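
    The two kinds of cancellation discussed above (the sender ending
    audio for recognition, and the receiver aborting TTS without
    tearing down the connection) can be sketched as control messages
    on a shared channel. The message names, the Channel interface and
    the SpeechConnection class are all hypothetical; the group defined
    no wire format, and this is only one possible design.

```typescript
// Minimal transport abstraction; a real one might wrap a socket.
interface Channel {
  send(message: string): void;
  close(): void;
}

// Hypothetical control messages, invented for this sketch.
type ControlMessage =
  | { type: "end-of-audio"; requestId: number } // sender is done (recognition)
  | { type: "abort-tts"; requestId: number };   // receiver wants output stopped

class SpeechConnection {
  private channel: Channel;
  private nextId = 1;
  private active: { [id: number]: boolean } = {};

  constructor(channel: Channel) {
    this.channel = channel;
  }

  // Register a new recognition or synthesis request on the connection.
  startRequest(): number {
    const id = this.nextId++;
    this.active[id] = true;
    return id;
  }

  // Sender-side cancel: tell the service that no more audio is coming.
  endAudio(requestId: number): boolean {
    return this.finish({ type: "end-of-audio", requestId: requestId });
  }

  // Receiver-side cancel: ask the service to stop synthesizing.
  // The channel is deliberately left open for later requests.
  abortTts(requestId: number): boolean {
    return this.finish({ type: "abort-tts", requestId: requestId });
  }

  private finish(message: ControlMessage): boolean {
    if (!this.active[message.requestId]) {
      return false; // unknown or already finished request
    }
    delete this.active[message.requestId];
    this.channel.send(JSON.stringify(message));
    return true;
  }
}

// Usage with an in-memory channel that records what was sent.
const sent: string[] = [];
let closeCount = 0;
const conn = new SpeechConnection({
  send: (m: string) => { sent.push(m); },
  close: () => { closeCount++; },
});
const ttsId = conn.startRequest();
const firstAbort = conn.abortTts(ttsId);  // sends one abort message
const secondAbort = conn.abortTts(ttsId); // no-op: already aborted
```

    In Milan's email triage example, the app would call abortTts when
    the user interrupts and then start the next message on the same
    connection, avoiding the cost of establishing a new one.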

discussion about API, device tag, etc
[23]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0142.html

      [23] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2010Dec/0142.html

    Michael: is there a set of requirements out of that discussion?

    Bjorn: no it's a proposal

    Milan: it shows a lot of promise and if we started early we could
    get done sooner

    Bjorn: there's some serious politics going on there

    Michael: WHATWG doesn't really represent all browser manufacturers

    Milan: could the audio working group handle this?

    Michael: they're more about mixing and analysis, rather than capture
    ... IE wouldn't tackle this area until it's under some w3c group

    Milan: it would be in our group's interest to get some sort of audio
    capture API into HTML

    [scribe correction: the preceding statement should have been
    attributed to Bjorn, not Milan]

    Michael: UI is geared around web cam capture

    Milan: people have been working on audio capture since 2005, and we
    only started this year

    Michael: but the use cases are different

    Bjorn: is there an audio chat scenario?
    ... could we specify an API required for speech without it being
    general purpose?

    Michael: we should propose what we need and explain why we need it

    Bjorn: if we don't have a general API for app-specified network
    recognition, we can still have reco with the default recognizer

    Olli: would it be easiest to co-author it with the whatwg and then
    propose that the HTML wg pick it up

    Bjorn: that's my preference

    Marc: if the browser captured audio according to the requirements
    for speech recognition, then we wouldn't need any specific device
    API

    Michael: an alternative is to finish discussing requirements, then
    look at proposals, for which there may be a spectrum of approaches

    Bjorn: there's no reason to exclude a particular approach at this
    point

    Milan: concerned that the device API has promise and if we don't work
    together it won't happen

    Marc: we're expected to look at the pros and cons of various options
    and maybe make a decision, or if not, at least recommend options

    Dan: people can propose more requirements later on, but we should
    move on to prioritization
    ... begin prioritization in January, but between now and then,
    review the requirements and talk about those you don't feel are
    clear enough for you to prioritize

    Michael: please send description text where you think it's missing

    Milan: would prefer that the chairs propose a description and
    participants riff on that

    Dan: prioritization is a function that will naturally work out
    issues at the next level of detail
    ... So the first thing people should do is review the requirements,
    and if you can't prioritize, start a conversation

    Michael: I will send out another update soon, and you'll have a
    couple of weeks to review as Dan suggests

    Milan: it'll be chaos. 50 requirements. 6 groups here

    Dan: if this turns out to not work, we'll change strategies
    ... but I think we'll probably have a very small number of threads
    ... Plan to have calls at the same timeslot in January, in case we
    need them

    Marc: Michael, could you restructure the list of requirements by
    topic?

    Michael: will move section 3 to an appendix, and can potentially
    reorder section 4. I'll make an attempt
    ... I'll see what factors out

    Great work everybody!
Received on Friday, 17 December 2010 12:25:00 UTC