
[f2f] minutes 3 November 2011

From: Dan Burnett <dburnett@voxeo.com>
Date: Fri, 4 Nov 2011 14:13:28 -0400
Message-Id: <D51AEA87-CE44-42C9-B2C5-ACF57CE45A34@voxeo.com>
To: public-xg-htmlspeech@w3.org
Group,

The minutes from our first day are at http://www.w3.org/2011/11/03-htmlspeech-minutes.html.

For convenience, I have pasted a text version below.

-- dan

*****************************

             HTML Speech Incubator Group Teleconference

03 Nov 2011

   See also: [2]IRC log

      [2] http://www.w3.org/2011/11/03-htmlspeech-irc

Attendees

   Present
          DanB, Michael, Glen, Matt, Robert, Patrick, Avery, Nagesh,
          Debbie, Bertha, Milan, Rahul, DanD

   Regrets

   Chair
          Daniel_Burnett, Michael_Bodell

   Scribe
          ddahl_, ddahl

Contents

     * [3]Topics
         1. [4]Review recently sent examples
         2. [5]Robert's example
         3. [6]speech-enabled email
         4. [7]Milan's example of protocol
         5. [8]michael johnston's multimodal use case
         6. [9]Charles Hemphill's example
         7. [10]Michael Bodell's example 8, translation
         8. [11]Debbie's example
         9. [12]another example from Charles Hemphill
        10. [13]issues
        11. [14]Protocol Issues
        12. [15]Web API Issues
        13. [16]Issue 6
        14. [17]Issue 7
        15. [18]Issue 8
        16. [19]Issue 9
        17. [20]Issue 10
        18. [21]Issue 11
        19. [22]Issue 12
        20. [23]Issue 13
        21. [24]Issue 14
        22. [25]Issue 15
        23. [26]Issue 16
        24. [27]Issue 17
        25. [28]Issue 18
        26. [29]Issue 19
        27. [30]Issue 20
        28. [31]Issue 21
        29. [32]Issue 22
        30. [33]Issue 23
     * [34]Summary of Action Items
     _________________________________________________________

   <smaug> hi

   <smaug> well, who am I then o_O

   <smaug> pong

   <burn> trackbot, start telcon

   <trackbot> Date: 03 November 2011

   <Milan> ScribeNick: Milan

Review recently sent examples

   <DanD>
   [35]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct
   /att-0064/speechwepapi_1_.html#introduction

     [35] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#introduction

   <mbodell> [36]http://bantha.org/~mbodell/speechxg/example1.html

     [36] http://bantha.org/~mbodell/speechxg/example1.html

   Michael: Speech Web Search Markup only

   Robert: Found addGrammarFrom() is awkward
   ... really a hint

   Glen: True that input has no grammar

   Michael: It's a builtin grammar

   Robert: What about deriveGrammarFrom?

   Glen: It's an append grammar

   DanD: Option might be a better example

   Michael: Text is a grammar

   Robert: Assume q is an object from which a grammar can be derived

   <smaug> Nit, <button name="mic" onclick="speechClick()"> is a submit
   button, so when you click it, the form is submitted. type="button"
   would fix the problem

   DanB: addDerivedGrammar

   Debbie: Figure out semantics first

   Robert: AddDerivedGrammarFromID

   Glen: Also rename q to 'inputField'
   ... Also from text input type to date or something more constrained
   ... Need to specify the lack of grammars
   ... Is this dictation?

   Robert: improve example by defaulting to UTF-8

   <glen> Section 5.1: when no grammar specified, defaults to
   builtin:dictation

   Robert: Base 64 encoding is ugly
   ... to the point where it is unusable

   Michael: Worried about directly inserting XML due to 8th bit

   DanB: Are there already common protocols for inserting strings
   derived from URLs into local variables?

   Glen: Should only be a W3C standard; implementation is orthogonal

   Robert: AddFromString() would be nice

   Glen: addStringGrammar() and addElementGrammar()

   Avery: Prefer longer name because it's truer to form

   <smaug> Couldn't you just prepend "data:application/srgs+xml," to
   the serialized XML. But anyway, using data urls is kind of hackish,
   IMO.
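   smaug's suggestion above can be sketched as follows; the grammar string
   is a made-up example and this is illustrative, not the draft spec:

```javascript
// Wrap a serialized SRGS grammar in a data URL, as smaug suggests.
// encodeURIComponent keeps the result 7-bit ASCII, which also addresses
// the "8th bit" concern about inserting raw XML directly.
function grammarToDataUrl(srgsXml) {
  return "data:application/srgs+xml," + encodeURIComponent(srgsXml);
}

const url = grammarToDataUrl(
  '<grammar version="1.0"><rule id="yes">yes</rule></grammar>'
);
```

   As smaug notes, data URLs work but are hackish; they merely avoid
   defining a dedicated string-based API.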

   Robert: Too many dots to get the interpretation

   Milan: Propose addGrammarFromURI()

   Robert: Newing up a speech grammar is a better approach

   Michael: Let's just raise issues now rather than solve them

   Debbie: Example is complex, and gets mixed up with argument that JS
   is complex

   * laptop?

   Michael: Next example from Bjorn

   Robert: The example lacks a grammar

   <smaug> s/onclick="startSpeech"/onclick="startSpeech(event)"/

   <DanD>
   [37]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov
   /att-0008/web-speech-sample-code.html

     [37] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0008/web-speech-sample-code.html

   Robert: Need to define what happens when lacking a grammar

   Avery: Is there a policy against comments in the examples?

   Michael: Planning on adding examples to an appendix

   Avery: It's a decent example, as long as it is clear that this
   instance lacks a grammar

   Robert: Example shows default behavior

   Rahul: Could also delete button as a means of shortening the example

   <glen> per Avery's suggestion: add a comment "since no grammar is
   specified and no element is bound, uses default grammar
   builtin:dictation"

   Rahul: Two different ways to perform same array access

   Glen: Should make it consistent in example

   <mbodell> In Bjorn's second example need sir.maxNBest = 2;

   <glen> use same notation: s/q.value =
   event.result.item(0).interpretation;/q.value =
   event.result[0].interpretation;/

   Robert: Intent is to get a text transcript of the user's input
   ... why are we accessing the interpretation instead of tokens?

   Milan: Need to bring this up in protocol team

   <all agreed> to use "utterance" in place of interpretation

   Milan: Last two comments should apply here as well
   ... Should we have company-specific references?

   Michael: Prefer example.org

   Robert: Is there speech recognition in turn by turn?

   Michael: Speech recognition is just destination capture

   <smaug> Again, s/onclick="startSpeech"/onclick="startSpeech(event)"/

   Robert: Prefer that speaking the next instruction cancel the last
   instruction

   Glen: Thought the purpose of example was to show interplay between
   speech and tts?

   Michael: TTS play resumes where last left off

   Glen: Way to stop prior play is a good feature
   ... we should change this example

   <glen> change example to show how to stop, by persisting the tts
   object and calling stop before adding .text and .play

   Michael: Ollie example next

   <mbodell>
   [38]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov
   /att-0009/htmlspeech_permission_example.html

     [38] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0009/htmlspeech_permission_example.html

   Michael: First example is just removing unauthorized elements?
   ... but second example doesn't allow speech input to start

   Ollie: Yes

   Michael: Can you transition from not authorized to authorized?

   Ollie: Should be possible, but example doesn't do that
   ... but could also just reload the page

   * Going on break now

   <inserted> scribe:ddahl_

   <scribe> scribe:ddahl

Robert's example

   robert: two recognitions in a row, you want to pick your cities
   based on what state you're in.

   <Avery> Actually I think it's based on what state is specified in
   the first reco, not necessarily what state you're in. A minor nit.

   robert: it really should say "interpretation.state", not just
   "interpretation"
   ... used push instead of adding things to the array of speech
   grammars
   ... a bug on result, should be city, also, sr.onMatch should be
   sr.onResult
   ... second example is rereco
   ... gives grammars to speechInputRequest, then classifies, then does
   rereco with a specific grammar

   glenn: this seems to be a strange use of "interpretation"

   robert: there is a huge universe of grammars

   rahul: this is identifying one grammar as different from the others

   robert: using the attribute "modal" to activate and deactivate
   grammars
   ... would change the example to get interpretation.classification
   ... strange to have multiple "modals" as true, think modal might be
   a bad idea

speech-enabled email

   michael: one interesting thing is that you might get notifications
   that you would want to speak to, but without clicking

   robert: was mostly thinking about things like "reply", but you could
   also imagine saying "read it to me" after notification
   ... made up a method to cancel TTS

   michael: you could just delete the element

   robert: what if you set up the element with stuff in it?

   glenn: destroy should not be the only way to cancel

Milan's example of protocol

   milan: will augment with API calls that trigger protocols
   ... need a result index of some kind
   ... then recognizer decides to change its mind and reorders results
   ... strange to get a "complete" result in the middle of a long
   dictation
   ... result index 0 is the first fragment, then halfway through the
   second fragment, the recognizer says the first one is done
   ... different from MRCP, because in MRCP that means it's the end of
   it
   ... then retracts a result, not sure how to represent this, maybe an
   "IN_PROGRESS" message with no payload
   ... we will put this in the larger document as an example of the
   protocol
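   The sequence Milan describes might look roughly like this. The message
   name and the Result-Index and Result-Status headers echo the
   discussion; the request-id placeholder, status line layout, and EMMA
   payloads are illustrative guesses, not the draft protocol:

```text
INTERMEDIATE-RESULT <request-id> IN-PROGRESS
Result-Index: 0
Content-Type: application/emma+xml

<emma:emma> ...first fragment, still subject to revision... </emma:emma>

INTERMEDIATE-RESULT <request-id> IN-PROGRESS
Result-Index: 1
Content-Type: application/emma+xml

<emma:emma> ...second fragment... </emma:emma>

INTERMEDIATE-RESULT <request-id> IN-PROGRESS
Result-Index: 0
Result-Status: final

INTERMEDIATE-RESULT <request-id> IN-PROGRESS
Result-Index: 1
(no Content-Type and no body: retracts the earlier result at this index)
```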

michael johnston's multimodal use case

   <smaug> Could you please paste links to the example here

   michael: "I want to go from here to there" is the use case

   <smaug> ( would be then easier to read minutes later )

   <mbodell> Michael's example:
   [39]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov
   /att-0020/multimodal_example.html

     [39] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0020/multimodal_example.html

   <mbodell> You can walk through the examples from:
   [40]http://bantha.org/~mbodell/speechxg/f2f.html which links to
   [41]http://bantha.org/~mbodell/speechxg/examples.html which then
   walks through the examples

     [40] http://bantha.org/~mbodell/speechxg/f2f.html
     [41] http://bantha.org/~mbodell/speechxg/examples.html

   glenn: it would be good to have a "state" attribute
   ... the "nomatch" state is more of a result, not a state
   ... we may need more than one attribute to get results of speech
   processing

   michael: this also has the EMMA so that you can see the mapping from
   EMMA
   ... this example makes use of a remote speech service

   <glen>
   [42]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov
   /att-0020/multimodal_example.html

     [42] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0020/multimodal_example.html

   michael: the EMMA shows the combined speech and gui input

   robert: this should be a wss: , that is, a web socket protocol, but
   what should we do if someone uses http?

   michael: you could get the command right but not the person if you
   didn't do the "clickInfo"

Charles Hemphill's example

   <glen>
   [43]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov
   /0024.html

     [43] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/0024.html

   danD: we should start with the simplest example

Michael Bodell's example 8, translation

   <glen> [44]http://bantha.org/~mbodell/translate.html

     [44] http://bantha.org/~mbodell/translate.html

   <glen> view-source:[45]http://bantha.org/~mbodell/translate.html

     [45] http://bantha.org/~mbodell/translate.html

   michael: different example of translation
   ... there's from and to languages, you choose, and then click on
   microphone to talk
   ... there's a progress bar that gets updated
   ... we're grabbing our language from the selector, we're using a
   dictation grammar for whatever language we're using
   ... where are we doing capture?

   glen: wouldn't that be the microphone?

   michael: not necessarily, there could be other things like media
   streams

   glen: is capture necessary or does it just provide more features?

   michael: we didn't have any examples of capture from other places,
   like from Web RTC
   ... right now there's no standard for accessing microphone

   glen: would like to see default example where we don't have to
   explicitly do capture

   michael: all examples assume that there's magic for capturing audio

   glen: can't we make it so that the magic is what happens by default?

   dan: there are many security and privacy issues
   ... different permissions for getting access to media but also to do
   something to the media

   michael: this is also raised in some of our issues, we only have a
   two sentence note now
   ... can TTS work on Web Sockets?

   robert: yes

   michael: on audio start, etc. are in our spec. another issue is that
   payload of start, stop events isn't defined

   robert: do we have VU meter events?

   michael: no

   dan: that came up in Web RTC, they don't have that, but they could
   create it

   michael: we do have speech-x events for custom extensions

   robert: most speech apps have one

   michael: is that part of the UA or the app?

Debbie's example

   multi-slot filling

   <mbodell> Debbie's:
   [46]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov
   /att-0031/Multi-slotSpeech1.html

     [46] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/att-0031/Multi-slotSpeech1.html

   debbie: in this example you have to pull out the slot values from
   the EMMA

   robert: is this the same as saying "interpretation.booking"?

   debbie: not sure
   ... we don't know what's in "interpretation"

   robert: we could get rid of "interpretation"

   michael: it could be a useful pointer into the EMMA
   ... that is available in VXML

   <mbodell> Issue: we should make sure it is clear what the
   interpretation points to

   <trackbot> Created ISSUE-1 - We should make sure it is clear what
   the interpretation points to ; please complete additional details at
   [47]http://www.w3.org/2005/Incubator/htmlspeech/track/issues/1/edit
   .

     [47] http://www.w3.org/2005/Incubator/htmlspeech/track/issues/1/edit

   michael: should do an if to make sure that you really got a value

   debbie: could add the EMMA
   ... would there be value in some kind of convenience syntax so that
   you don't need the full DOM generality to manipulate the EMMA
   result?

   <mbodell> Charles' example:
   [48]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov
   /0033.html

     [48] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/0033.html

another example from Charles Hemphill

   michael: the same example as before but with an external grammar

   avery: what's the advantage of having "reco" element as a child
   under "input"

   michael: there are two different ways to do the same thing, with
   "reco" under as a "child" under <input> you don't need an id

   <smaug> <input> element can't have child elements

   actually, input is a child of reco in the proposal

   <smaug> My comments to example 3
   [49]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov
   /0034.html

     [49] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Nov/0034.html

   michael: another example with a real inline grammar so that you
   don't have to do data uri
   ... we would have to define a "grammar" tag

   robert: we would have to define for browsers how to interpret SRGS

   avery: like putting script in page vs an external reference

   <smaug> Milan: remember, we're talking about HTML here, not XML

   <smaug> (I assume that was Milan)

   milan: could we say "as long as this is valid XML ignore it and pass
   it to us"?

   robert: why wrap the whole thing with the grammar element?

   michael: if there's an SRGS 1.1, you wouldn't know what version it
   was, for example
   ... would like to have inline grammar, if any, be full SRGS with
   <grammar> element
   ... that is the end of the examples
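   A rough markup sketch of the inline-grammar idea Michael describes;
   the <reco> element and for attribute are from the proposal, while the
   SRGS body and id values are made-up examples:

```html
<!-- Sketch only, not the draft spec: full SRGS inline, wrapped in its
     own <grammar> root so the version is explicit, as discussed above. -->
<reco for="state">
  <grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
           root="state">
    <rule id="state">
      <one-of><item>Texas</item><item>Ohio</item></one-of>
    </rule>
  </grammar>
</reco>
<input type="text" id="state">
```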

   <Milan> * Good point Ollie

   <glen> scribenick: glen

issues

   burnett: if can't agree, depends on importance. If important,
   capture different opinions in doc.
   ... (not required to resolve everything in incubator group)

   <mbodell> First issue to discuss:
   [50]http://bantha.org/~mbodell/speechxg/issuep1.html

     [50] http://bantha.org/~mbodell/speechxg/issuep1.html

   1. What Content-Type do we want to use on an empty message? Use case
   was nulling out previous candidate recognition.

   milan: do we have to specify? can it be assumed?
   ... empty means no payload?

   robert: protocol doesn't require a body
   ... in which case I don't think it needs a content type. Example
   getParams

   michael: what about interim results?
   ... if no content and no content type then nulls out corresponding
   result. Example: an interim result gets replaced with no result
   (e.g. if a <cough> is initially recognized as some text)

Protocol Issues

   2. I am skeptical about changing established MRCP event/method
   names. I sort of agree that LISTEN is better than RECOGNIZE, but do
   not think the reasons are good enough to warrant ensuing churn.

   Robert: Microsoft doesn't care whether it's similar to MRCP, rather that it's
   compatible with our web sockets protocol

   burnett: web sockets is just a transport
   ... violates many types of protocol design
   ... if standards track, IETF is a logical place

   robert: so naming doesn't matter much at this point.

   all: agree

   burnett: some talk of using SIP to setup, would have to separate
   signaling and data...which is one thing wrong with this.

   robert: this is more to illustrate a point that it can be done

   burnett: companies could implement today, and may not be completely
   interoperable (as is often the case on first implementations)

   michael: we agree, not to change names right now. Names will likely
   be re-evaluated in a standards track.
   ... minor syntax issues can be called out as a note in the doc.

   burnett: when gets into a standards group, they look at requirements
   and take ideas into consideration, but they consider MANY other
   factors, e.g. security, that drive

   3. We need a way to index the recognition results. I suggest using a
   Result-Index header

   all: agree to add. if a one-shot recognition, it's only [0] and
   still optional

   4. It was awkward to use a RECOGNITION-COMPLETE message presumably
   with a COMPLETE status during continuous speech. Instead, I used
   INTERMEDIATE-RESULT with a new Result-Status header set to final.

   robert: just rename RECOGNITION-COMPLETE as RECOGNITION-RESULT
   ... it's an intermediate, unless it's a final response type.

   burnett: MRCP has separate status code and completion code

   Milan: we need a complete flag, not sure it was defined. We haven't
   stated which status codes correspond to which messages.

   burnett: in MRCP, status is about communication (like 200 OK). In
   MRCP, the completion code indicates what happened (e.g. successful
   reco)

   robert: so status indicates "sending more", so status should be
   in-progress for continuous reco case.
   ... need request state?

   burnett: request has been made, has it been completed yet? status is
   success, illegal method, illegal value, unsupported header

   robert: reco result, 200 OK, in progress

   5. Perhaps Source-Time should also be required on final results

   all: yes, everything's fine, more to come

   Milan: by time have final result, should know start time.

   all: agree, require only reco result

   Milan: could be reco result with type = pending

   michael: pending implies have already started

   robert: in progress more accurate

   all: agree to leave as is

   6. Wanted to confirm that channel identification is being handled by
   the WebSocket container

   robert: handled by web socket
   ... if two separate recos, then two web sockets and two audio
   streams. (Can have 2 grammars active in one reco)

   milan: continuous hotword case

   robert: that's continuous reco
   ... start session with hotword and command-control grammar, all is
   continuous results

   michael: hard if change over time
   ... because have to pause to change
   ... so not continuous

   robert: don't want to transmit audio twice, but with two sessions,
   you must

   avery: does the emma result specify which grammar?

   michael: yes

   7. I noticed that Completion-Cause was missing from Robert's spec
   example in section 4.2.

   robert: accidental omission, need to add

Web API Issues

   1. To get the reco result I think i have to write
   "e.result.item(0).interpretation". This is a lot of dots and an
   index just to get the top result.

   robert: I want to write e.interpretation -- because most of the time
   that's what I want (but still could use the verbose way as well)

   <mbodell> Here is the link to where the event is defined:
   [51]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct
   /att-0064/speechwepapi_1_.html#speechinputresult

     [51] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#speechinputresult

   milan: e.result.interpretation

   michael: can already use e.result[0].interpretation

   glen: we should change utterance to match

   all: e.interpretation and e.utterance
   ... agreed
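   The agreed shortcut can be sketched with a mock event object; this is
   illustrative, not the draft spec:

```javascript
// Mock sketch of the agreement above: e.interpretation and e.utterance
// mirror the top result's fields, alongside the verbose e.result[0] form.
class MockSpeechInputResultEvent {
  constructor(results) {
    this.result = results; // array-like of { interpretation, utterance }
  }
  get interpretation() { return this.result[0].interpretation; }
  get utterance() { return this.result[0].utterance; }
}

const e = new MockSpeechInputResultEvent([
  { interpretation: "flight to Boston", utterance: "uh flight to Boston" },
]);
// e.interpretation is shorthand for e.result[0].interpretation
```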

   2. "utterance" has a couple of different meanings in the doc. It's
   alternatively the recording of what the person said, or the
   transcript returned by the recognizer.

   michael: transcript? text? tokens?

   text and token are over used and confusing

   robert: but it is text, so not overloading the concept
   ... (unlike token)

   burnett: transcript, closest to what's actually happening, laymen
   get it

   glen: text is not descriptive: interpretation is text, whereas
   transcript vs interpretation is clear

   all: agree: rename utterance to transcript

   5. The "modal" attribute on SpeechGrammar is unnecessarily
   restrictive

   Discussion: There are cases where I'll want to have multiple
   grammars active, but not all, and not just one. Developers would be
   better off with a boolean enabled attribute on each grammar. Would
   be useful to clarify the behavior when there is more than 1 grammar
   with this set to true (only the first in the list is active?) Is
   this even useful at all? What is the case for having grammars which
   aren't active in the reco? Can we change the state of the modal/u

   robert: less lines of code if just set one to true

   milan: alternatively, could add/remove from grammars array

   glen: sending all at once allows caching
   ... of grammars
   ... what about continuous case, can grammars change on the fly

   michael: we decided to simplify by re-calling .start to change
   grammars or anything else

   milan: should have a separate way to preload

   burnett: voicexml has defineGrammar

   milan: grammar set object on the SpeechInputRequest
   ... I'm proposing sets of grammars

   robert: I'd like it flatter, get rid of enabled/disabled -- just
   delete -- and don't allow preload

   michael: already have .open that allows preloading

   <mbodell> See web api:
   [52]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct
   /att-0064/speechwepapi_1_.html#dfn-open

     [52] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#dfn-open

   <scribe> scribenick: glen

   burnett: nervous about this, we discussed this for a long time and
   considered many edge-cases

   robert: alternative: get rid of modal and enable, and just use a
   bunch of grammars

   avery: if .open has already been called, .start doesn't call it
   (.start only calls .open if hasn't been opened yet)

   burnett: wondering if there are performance advantages in reco
   engine if can call enable/disable as opposed to calling .open
   multiple times?

   <smaug> start might need to re-call open if authorizationState has
   changed from not-authorized to authorized

   milan: MRCP didn't solve this, why should we?
   ... all good MRCP clients do what you're saying automatically, they
   automatically check for the deltas

   robert: this runs at web-scale, distributed
   ... big difference between telephony and web

   michael: options: eliminate modal, keep and define what happens if
   multiple set to true

   avery: easier to add later than remove

   agree: eliminate .modal
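   With .modal gone, Milan's alternative of adjusting the grammars array
   can be sketched with mock objects (illustrative, not the draft API):

```javascript
// Activate a subset of grammars by rebuilding the grammars array before
// each call to start(), instead of toggling a .modal flag per grammar.
const allGrammars = ["USstates.grxml", "majorUScities.grxml", "help.grxml"];

const sr = {
  grammars: [],
  start() { /* would begin recognition with the current grammars */ },
};

// Activate everything except the cities grammar for this turn.
sr.grammars = allGrammars.filter(g => g !== "majorUScities.grxml");
sr.start();
```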

   5. The interpretation attribute is completely opaque. That may be
   necessary given that SISR can pretty much return anything. But it'll
   need some examples to show how to use it.

   burnett: there was support for a flat array of interpretations
   ... I didn't like that, Nuance and their customers didn't like it,

   debbie: use emma to define layout

   michael: different reco engines may use emma in different ways
   ... fundamentally, .interpretation points to somewhere in emma,
   which simplifies (and a corresponding .transcript)

   all: agree, specify which part of emma holds the interpretation

   michael: mapped to a DOM object, emma literal or node
   ... like debbie's slot-filling example
   ... I will send text for this

   5. The array of SpeechGrammar objects is too cumbersome

   <smaug> something happened to the audio. it is all just noise

   <smaug> though, getting late here

   Discussion: Robert: The array of SpeechGrammar objects is too
   cumbersome. In most cases I'd like to write something simple like:
   mySR.speechGrammars.push("USstates.grxml","majorUScities.grxml","maj
   orInternationalCities.grxml"); But I can't. I have to new up a separate
   object for each one then add it to the array, even when I don't care
   about the other attributes. Better to just make it an array of URI
   strings, and add functions for the edge cases. e.g. void ena

   void setWeight(in DOMString grammarUri, in float weight); And yeah,
   I remember arguing the opposite on the phone call. But that's before
   I tried writing sample code. Glen: "The uri of a grammar associated
   with this reco. If unset, this defaults to the default builtin uri."
   Presumably using the grammar attribute overwrites the default
   grammar, so if a developer wishes to add a grammar that supplements
   the default grammar, then this alternative should work: re

   would add clarity. Michael: If you view source on the web api
   document you'll see the grammar functions and descriptions are there
   commented out as I anticipated, and agree, with this comment. We
   should have both functions and array/collections and this makes the
   things that Robert and Glen describe much easier/better.

   michael: grammar spec after ? are hints, before builtin: are
   required and errors if not supported
   ...example: builtin:contacts may recognize names in smartphone
   ... require builtin:generic

   burnett: builtin:generic means I'll take anything you got: if it's
   just a date grammar, I'll take it.

   <mbodell> We are talking about
   [53]http://bantha.org/~mbodell/speechxg/issuew5.html but really more
   about what happens with no grammar

     [53] http://bantha.org/~mbodell/speechxg/issuew5.html

   milan: builtin:generic could respond with failure, builtin:dictation
   could also respond with failure

   robert: builtin:generic should be builtin:default
   ... and none specified is builtin:default

   burnett: what if want to use both default and another grammar

   glen: then add builtin:default and builtin:foo

   michael: default is not user default, but service or ua default

   milan: want a way to record without a grammar

   michael: we define builtin:default, encourage vendors to implement,
   and state when none specified, it's on by default. (and when other
   grammars are specified, it can also be added.)
   ... I like .addGrammar(url, weight) as a simplification from
   creating object and then setting it

   robert: .addGrammarFromUrl(url, weight)
   ... .addGrammarFromElement(element, weight)
   .addGrammarFromString(string, weight)
   ... better yet: .addUrlGrammar .addElementGrammar .addStringGrammar
   ... but advantage for objects to be alphabetical order, grouped
   together in docs

   glen: .addGrammarUrl .addGrammarElement .addGrammarString
   ... remove is a JavaScript array operation

   michael: also .addCustomParameter(name, value)

   all: agree: .addGrammarUrl .addGrammarElement .addGrammarString
   .addCustomParameter

   <smaug> I think this is enough for me. I'll read the minutes
   tomorrow and send comments

   <smaug> It is midnight here

   <smaug> dark? it has been dark here for the last 6 hours

   <rahul> scribenick: rahul

Issue 6

   <mbodell> Link to the current issue:
   [54]http://bantha.org/~mbodell/speechxg/issuew6.html

     [54] http://bantha.org/~mbodell/speechxg/issuew6.html

   <glen> 6. The names are a bit long.

   <glen> Discussion: e.g. "new SpeechInputRequest()" vs "new
   SpeechIn()" . e.g. "mySR.speechGrammars.push("foo")" vs
   "mySR.grammars.push("foo")" . e.g. "resultEMMAXML" vs "EMMAXML" or
   just "EMMA" (call the other one "EMMAText" ) e.g. "inputWaveformURI"
   vs "inputURI"

   Milan: how about SpeechRequest instead of SpeechInputRequest?

   Robert: SpeechRecognizer?

   Milan: AudioSynthesizer?

   Glen: SpeechReco?

   <Milan> Milan: AudioSynth

   Resolution: We will use SpeechReco instead of SpeechInputRequest

   <matt> [55]Parkinson's Law of Triviality

     [55] http://en.wikipedia.org/wiki/Bikeshedding

   <scribe> ACTION: Editing team to update to SpeechReco [recorded in
   [56]http://www.w3.org/2011/11/03-htmlspeech-minutes.html#action01]

   <trackbot> Sorry, couldn't find user - Editing

Issue 7

   7. SpeechInputRequest.outputToElement() should be an attribute,
   perhaps 'forElement'

   <matt> [57]Issue 7

     [57] http://bantha.org/~mbodell/speechxg/issuew7.html

   Resolution: Replace outputToElement() function with the
   outputElement attribute
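   Sketched with a mock object (not the draft spec), the resolution turns
   a function call into a plain attribute assignment:

```javascript
// The outputToElement() call becomes an attribute, per the resolution.
const input = { value: "" }; // stands in for a DOM <input> element
const sr = { outputElement: null };

// Before: sr.outputToElement(input);
// After, per the resolution:
sr.outputElement = input;
```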

Issue 8

   <inserted> [58]Issue 8

     [58] http://bantha.org/~mbodell/speechxg/issuew8.html

   8. SpeechInputResult has a getter "item(index)".
   SpeechInputResultEvent has an array "SpeechInputResult[] results".

   Discussion: Can we change both to be collections similar to
   [59]http://www.w3.org/TR/FileAPI/#dfn-filelist (accessible via []
   operator and optionally with a .item() method)?

     [59] http://www.w3.org/TR/FileAPI/#dfn-filelist

   Resolution: Accepted
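   A FileList-style collection of the kind the resolution points to can
   be sketched like this (illustrative, not the draft spec):

```javascript
// Collection indexable with [] and also exposing an .item() method,
// mirroring the FileList behavior referenced in the discussion.
function makeResultCollection(items) {
  const col = {};
  items.forEach((it, i) => { col[i] = it; });
  col.length = items.length;
  // Out-of-range item() returns null, as FileList.item() does.
  col.item = (i) => (i >= 0 && i < items.length ? col[i] : null);
  return col;
}

const results = makeResultCollection(["alt one", "alt two"]);
// results[0] and results.item(0) give the same element.
```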

Issue 9

   <matt> [60]Issue 9

     [60] http://bantha.org/~mbodell/speechxg/issuew9.html

   9. The <reco> element should probably be a void element with no
   content on its own

   Discussion: Satish:
   [61]http://dev.w3.org/html5/spec/Overview.html#void-elements. I just
   noticed this in the for attribute's description, missed it in
   earlier reads: "If the for attribute is not specified, but the reco
   element has a recoable element descendant, then the first such
   descendant in tree order is the reco element's reco control." Is
   there a benefit to doing this over requiring the 'for' attribute to
   be set and making reco a void element? Charles: I a

     [61] http://dev.w3.org/html5/spec/Overview.html#void-elements.

   <glen> resolution: can specify with either descendent or with for=
   attribute

   Resolution: Agreed to leave it as-is using either the for or the
   descendant pattern

Issue 10

   <inserted> [62]Issue 10

     [62] http://bantha.org/~mbodell/speechxg/issuew10.html

   10. TTS is hard


   Discussion: Bjorn: I can't see any easy way to do programmatic TTS.
   The TTS element is at least missing the attributes @text and @lang.
   Without those, it's pretty hard to do the very simple use case of
   generating a string and speaking it. It's possible, but you need to
   build a whole SSML document. For use cases, see the samples I sent
   earlier today. Dominic: For TTS, I don't understand where the
   content to be spoken is supposed to go if it's not specified in

   Michael: @lang is not missing since it could be inherited
   ... there is no @text there

   Glen: content within <tts></tts> will show up within older browsers

   <mbodell> Discussion is <tts src="data:text/plain,Hello, world"/>
   versus <tts value="Hello, world"/> versus something else. Note in JS
   we could define a function so it is pretty similar, but from Markup
   a little harder to get the function creating the data uri (probably
   still possible)

   <glen> 72<tts value="fahrenheit">F</tts>

   <glen> michael: tts as a markup may render visually a control (play,
   stop, etc)

   <glen> ...other dom can interact

   <glen> glen: most uses of tts need dynamic control -- that is
   require javascript

   <glen> michael: because tts inherits from media-element, it requires
   a src attribute

   <glen> glen: <img alt="text">

   <glen> michael: <tts> is not used as an alternative fallback

   Dan: usecase for <tts> element is to facilitate easy generation as
   part of markup rather than generating script


   Michael: the @lang inherited from the <media> element should be
   passed as a parameter to the synthesizer

   Resolution: Add a @text attribute to <tts>.
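   A markup sketch of the resolution (the attribute name @text is from the
   resolution; the SSML data URI is illustrative only):

```html
<!-- With the resolved @text attribute, a simple string suffices: -->
<tts text="Hello, world"></tts>

<!-- Without @text, the same utterance needs a full SSML document via src: -->
<tts src="data:application/ssml+xml,<speak>Hello, world</speak>"></tts>
```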

Issue 11

   -> [65]http://bantha.org/~mbodell/speechxg/issuew11.html Issue 11

     [65] http://bantha.org/~mbodell/speechxg/issuew11.html

   11. How does binding to button work

   Discussion: Satish: "When the recoable element is a button then if
   the button is not disabled, then the result of a speech recognition
   is to activate the button." "For button controls (submit, image,
   reset, button) the act of recognition just activates the input."
   "For type checkbox, the input should be set to a checkedness of
   true. For type radiobutton, the input should be set to a checkedness
   of true, and all other inputs in the radio button group must b

   Michael: propose to have an issue note that this needs further
   thought

   Robert: define what we can, and for others say there is no binding

   Resolution: Add issue note that more work to be done on bindings

Issue 12

   12. What about meter, progress, and output elements?

   Discussion: Satish: The meter, progress and output elements all seem
   to be aimed at displaying results and not for taking user input. Is
   there a reason why these are included as recoable elements? Michael:
   This is specified at Reco Bindings. A person could want to be able
   to speak and have it change a progress bar or meter or output
   element. The primary reason is matching what is done with label.
   These are all labelable elements and thus ended up as recoable
   Glen: suggest we not talk about bindings to these

   Dan: we need to decide which ones to leave out, I agree since these
   are not even <input> elements

   Resolution: Remove these from the recoable elements and bindings

Issue 13

   13. grammars and parameters should be collections

   Discussion: Satish: Similar to issue 8, SpeechInputRequest
   attributes 'grammars' and 'parameters' should probably be turned
   into a collection as well

   Resolution: Accepted

Issue 14

   14. rename language to lang

   Discussion: SpeechInputRequest.language should probably be changed
   to 'lang' to match lang attributes.

   Resolution: Accepted

Issue 15

   15. rename interimResults to interimResultsInterval

   Discussion: SpeechInputResult.interimResults should probably be
   renamed to interimResultsInterval to indicate its usage similar to
   how other attributes have 'Timeout' in their names

   Resolution: Turn into boolean property, name does not change

Issue 16

   16. drop enum prefixes

   Discussion: SPEECH_AUTHORIZATION_ prefix could be dropped for the
   enums and just have 'UNKNOWN', 'AUTHORIZED' & 'NOT_AUTHORIZED'
   (similar to XHR States). Same for SPEECH_INPUT_ERR_* and other such
   enums.

   Resolution: Accepted (given Satish's input and expertise)
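   A sketch of the resolved naming: constants scoped to their interface, so
   unprefixed names cannot clash (cf. XHR's readyState constants). The
   values and the second interface are invented for illustration; plain
   objects stand in for the WebIDL interfaces:

```javascript
// Unprefixed authorization constants, scoped to the interface that uses them.
const SpeechInputRequest = {
  UNKNOWN: 0,
  AUTHORIZED: 1,
  NOT_AUTHORIZED: 2,
};

// A different interface may reuse the same names without conflict:
const SpeechInputError = {
  UNKNOWN: 0,
  ABORTED: 1,
  NETWORK: 2,
};

console.log(SpeechInputRequest.AUTHORIZED); // 1
console.log(SpeechInputError.NETWORK);      // 2
```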

Issue 17

   17. A way to uncheck automatically by speech?

   Discussion: Glen: "For type checkbox, the input should be set to a
   checkedness of true." It would be nice to have a way to allow user
   to say something to set it to false, but I can't think of a good
   convention for this other than adding an attribute or grammar.
   Perhaps this could/should only be possible via scripting. (I don't
   like the idea of toggling the checkbox because some users may not be
   able to easily observe what state the checkbox is currently in.)

   Resolution: See resolution to issue 11

   <smaug> mbodell: I'm kind of online

   <smaug> what enum conflicts?

   <smaug> if the const is in an interface, then no

Issue 18

   <inserted> [66]Issue 18

     [66] http://bantha.org/~mbodell/speechxg/issuew18.html

   18. Binding hints versus requirements

   Discussion: Glen: "For date and time types ... type of color ...
   type of range the assignment is only allowed if it is a valid ..."
   On our call we discussed how these grammars are hints, and in
   particular how pattern may be difficult to implement. We discussed
   that showing an output response, even an invalid one, may be more
   valuable than no response. Michael: We can do hints for patterns on
   text, and for numbers out of range, but for other types HTML5 is jus

   Resolution: See resolution to issue 11

   <glen> satish provides this example of two sets of enums, with no
   prefixes.

   <glen> [67]https://developer.mozilla.org/en/DOM/HTMLMediaElement

     [67] https://developer.mozilla.org/en/DOM/HTMLMediaElement

Issue 19

   19. Do reco and TTS need to be on a server as opposed to client
   side?

   Discussion: Dominic: The spec for both reco and TTS now allow the
   user to specify a service URL. Could you clarify what the value
   would be if the developer wishes to use a local (client-side)
   engine, if available? Some of the spec seems to assume a network
   speech implementation, but client-side reco and TTS are very much
   possible and quite desirable for applications that require extremely
   low latency, like accessibility in particular. Is there any
   possibility

   <matt> [68]Issue 19

     [68] http://bantha.org/~mbodell/speechxg/issuew19.html

   <glen> satish continues: HTMLMediaElement.LOADED so no clashes

   <glen> (above refers to issue 16)

   Resolution: The service does not need to be remote, UAs may define
   URIs to local engines. We should add clarifying text specifying
   this. Also, the serviceURI does not need to be remote. We will
   clarify this as well.
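   A sketch of what the resolution allows (both URIs and the stand-in
   object are invented for illustration; the draft does not define any
   local URI scheme):

```javascript
// Stand-in for a SpeechInputRequest; only serviceURI matters here.
const request = { serviceURI: null };

// Remote, author-specified service:
request.serviceURI = "wss://speech.example.com/reco";

// Per the resolution, a UA may also define URIs for local engines
// ("x-local" is a hypothetical scheme, not from the draft):
request.serviceURI = "x-local://default";

console.log(request.serviceURI); // "x-local://default"
```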

Issue 20

   20. Set lastMark?

   Discussion: Dominic: An earlier draft had the ability to set
   lastMark, but now it looks like it's read-only, is that correct?
   That actually may be easier to implement, because many speech
   engines don't support seeking to the middle of a speech stream
   without first synthesizing the whole thing. Michael: Actually the
   speech xg version has never supported setting a lastMark. You can
   control playback using the normal media controls (setting
   currentTime, seekabl

   Resolution: Leave as-is right now. Add issue note about also making
   it writeable.

Issue 21

   21. More frequent callbacks?

   Discussion: Dominic: When I posted the initial version of the TTS
   extension API on the chromium-extensions list, the primary feature
   request I got from developers was the ability to get sentence, word,
   and even phoneme-level callbacks, so that got added to the API
   before we launched it. Having callbacks at ssml markers is great,
   but many applications require synchronizing closely with the speech,
   and it seems really cumbersome and wasteful to have to add an s

   Resolution: Leave as-is. Suggest as enhancement to SSML.

Issue 22

   22. How do we fit with capture/input/MediaStream?

   Discussion: Michael: Our spec has "attribute MediaStream input;"
   but we have nearly no explanation of it and our examples don't show
   how to use it. Can we do better?

   <mbodell> spec link:
   [69]http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct
   /att-0064/speechwepapi_1_.html#dfn-input

     [69] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Oct/att-0064/speechwepapi_1_.html#dfn-input

   Resolution: This XG probably can't do better. We should have an
   issue note and include the assumption that media stream input
   somehow happens. This seems to be of interest to numerous groups
   (Audio, DAP, Web RTC, HTML Speech XG ...), Debbie will follow up as
   part of the HCG.

   <scribe> ACTION: ddahl2 to set up follow-up via HCG [recorded in
   [70]http://www.w3.org/2011/11/03-htmlspeech-minutes.html#action02]

   <trackbot> Created ACTION-4 - Set up follow-up via HCG [on Deborah
   Dahl - due 2011-11-11].
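   An assumption-laden sketch of how the "attribute MediaStream input"
   wiring might look; since the XG left capture to other groups, every
   name below is a stand-in (the capture function mimics the shape of the
   then-current getUserMedia proposal):

```javascript
// Stand-in for a capture API: delivers a fake MediaStream to the callback.
function getUserMediaStub(constraints, onSuccess) {
  onSuccess({ kind: "audio-stream", constraints }); // fake MediaStream
}

// Stand-in for a SpeechInputRequest with the draft's input attribute.
const request = {
  input: null,
  started: false,
  start() { this.started = true; },
};

getUserMediaStub({ audio: true }, function (stream) {
  request.input = stream; // route the captured audio into recognition
  request.start();
});

console.log(request.started); // true
```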

Issue 23

   23. How do speechend and related events do timing?

   <matt> [71]Issue 23

     [71] http://bantha.org/~mbodell/speechxg/issuew23.html

   Discussion: Michael: Our spec is missing explanations around the
   timing and how the information is reflected.

   <scribe> Meeting: HTML Speech Incubator Group - 2011 TPAC F2F, Day 1

   Resolution: define the data to reflect the source-time back into the
   events. Do it on all events that accept time (including result and
   speech-x). Note this timing is always relative to the "stream-time"
   and real time may be faster or slower than that.

   <kaz> [ Thursday meeting adjourned ]
Received on Friday, 4 November 2011 18:31:39 UTC
