Re: html speech call tomorrow - speech API work + proposal for HTML content bindings from Olli Pettay on 2011-06-30 (public-xg-htmlspeech@w3.org from June 2011)

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Thu, 30 Jun 2011 19:05:50 +0300
To: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>
CC: Michael Bodell <mbodell@microsoft.com>, "Raj (Openstream) (raj@openstream.com)" <raj@openstream.com>, "Deborah Dahl (dahl@conversational-technologies.com)" <dahl@conversational-technologies.com>, "Dan Burnett (dburnett@voxeo.com)" <dburnett@voxeo.com>, "Bjorn Bringert (bringert@google.com)" <bringert@google.com>, "Charles Hemphill <charles@everspeech.com> (charles@everspeech.com)" <charles@everspeech.com>
Message-ID: <4E0C9EDE.8000701@helsinki.fi>
On 06/30/2011 01:40 PM, Olli Pettay wrote:
> On 06/30/2011 02:26 AM, Michael Bodell wrote:
>> We will be going over the various API proposals on the HTML Speech call
>> tomorrow. Kudos to Dan D for getting his submission in on time. Everyone
>> else who had things due earlier today (the cc list of this mail)
>> should be sending them in ASAP.
>>
>> The plan for the call is to start with Raj’s work on the design
>> decisions and requirements that are relevant for the API. After that we
>> can move to the other proposals including the one from Dan and the one
>> I’m submitting below. Raj’s is the most important to get done with
>> first, which is why we are starting with that.
>>
>> Here is my proposal for the HTML bindings. For these we might extend the
>> content attributes and interface with the other proposals (for
>> specifying grammars, speech servers, events, etc.). In any case these
>> elements/interfaces could be created in JS or be present in the HTML
>> document, or both.
>
>
> I don't see anything in the proposal about permissions.
> In which cases is the recognizer activated?
>
> At least for now, I'd still prefer a mechanism similar to what is
> proposed for microphone capture.
> In our case it could be
> Speech.getRequest(successCallback, errorCallback);
>
> That could be extended to support the "for" attribute so that a speech
> request could be associated with some part of the UI.
> Speech.getRequestFor(element, successCallback, errorCallback)
>
> Then the parameter passed to successCallback would be the speech request
> object, which can then be activated.
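A rough sketch of the callback shape described above, assuming a synchronous permission check. Speech.getRequest() and Speech.getRequestFor() are the names from the message; SpeechRequest, makeSpeech() and the askUser() prompt hook are hypothetical stand-ins, and a real user agent would prompt asynchronously. The point is that the page only receives an activatable request object after permission has been granted.

```javascript
// Hypothetical sketch: SpeechRequest, makeSpeech and askUser are illustrative
// names, not a shipped browser API.
class SpeechRequest {
  constructor(element) {
    this.element = element; // UI element the request is tied to, or null
    this.active = false;
  }
  start() {
    // Activation is possible only because permission was granted before
    // the request object was handed to the page.
    this.active = true;
  }
}

function makeSpeech(askUser) {
  // askUser(element) stands in for the user agent's permission prompt;
  // it returns true if the user allows speech input.
  function getRequestFor(element, successCallback, errorCallback) {
    if (askUser(element)) {
      successCallback(new SpeechRequest(element));
    } else {
      errorCallback(new Error("PERMISSION_DENIED"));
    }
  }
  return {
    // getRequest() is getRequestFor() with no associated UI element.
    getRequest(successCallback, errorCallback) {
      getRequestFor(null, successCallback, errorCallback);
    },
    getRequestFor,
  };
}

// Usage: a user agent that always grants permission.
const Speech = makeSpeech(() => true);
let granted = null;
Speech.getRequestFor(
  { id: "q" },
  (req) => { granted = req; req.start(); },
  (err) => { throw err; }
);
```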



And once the capturing APIs are stable, we could hook them up to the
recognizer using Streams.
Then, I think, there would be no need for getRequest();
there could be just
new SpeechInputRequest(stream, ...<other parameters>)
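A minimal sketch of the stream-based alternative, assuming that holding a captured stream already means the user granted microphone access, so the constructor needs no permission step of its own. SpeechInputRequest is the name from the message; the stream object and the lang/grammars options are illustrative assumptions, since the "other parameters" were left open.

```javascript
// Hypothetical sketch: option names and the stream shape are assumptions;
// "stream" is any object standing in for a captured audio stream.
class SpeechInputRequest {
  constructor(stream, options = {}) {
    // typeof null is "object", so null is rejected explicitly.
    if (stream === null || typeof stream !== "object") {
      throw new TypeError("SpeechInputRequest needs a captured audio stream");
    }
    // No separate permission prompt: access was granted when the page
    // obtained the stream from the capture API.
    this.stream = stream;
    // Illustrative "other parameters"; the message leaves these open.
    this.lang = options.lang ?? "";
    this.grammars = options.grammars ?? [];
  }
}

// Usage with a placeholder object in place of a real captured stream.
const micStream = { id: "mic" };
const request = new SpeechInputRequest(micStream, { lang: "en-US" });
```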



(And sorry, I'm late with my item 8)


-Olli


>
>
> -Olli
>
>>
>> ******
>>
>> The reco element
>>
>> Categories
>>
>> Flow content.
>>
>> Phrasing content.
>>
>> Interactive content.
>>
>> Form-associated element.
>>
>> Contexts in which this element can be used:
>>
>> Where phrasing content is expected.
>>
>> Content model:
>>
>> Phrasing content, but with no descendant recoable elements unless it is
>> the element's reco control, and no descendant reco elements.
>>
>> Content attributes:
>>
>> Global attributes
>>
>> form
>>
>> for
>>
>> DOM interface:
>>
>> [NamedConstructor=Reco(),
>>  NamedConstructor=Reco(in DOMString for)]
>> interface HTMLRecoElement : HTMLElement {
>>   readonly attribute HTMLFormElement? form;
>>            attribute DOMString htmlFor;
>>   readonly attribute HTMLElement? control;
>> };
>>
>> The reco element represents a speech input in a user interface. The
>> speech input can be associated with a specific form control, known as
>> the reco element's reco control, either by using the for attribute, or
>> by putting the form control inside the reco element itself.
>>
>> Except where otherwise specified by the following rules, a reco element
>> has no reco control.
>>
>> The for attribute may be specified to indicate a form control with which
>> a speech input is to be associated. If the attribute is specified, the
>> attribute's value must be the ID of a recoable element in the same
>> Document as the reco element. If the attribute is specified and there is
>> an element in the Document whose ID is equal to the value of the for
>> attribute, and the first such element is a recoable element, then that
>> element is the reco element's reco control.
>>
>> If the for attribute is not specified, but the reco element has a
>> recoable element descendant, then the first such descendant in tree
>> order is the reco element's reco control.
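The two rules above for finding the reco control can be sketched as follows. This is a standalone model rather than DOM code: nodes are plain objects, and because the proposal does not list which elements are recoable, treating input and textarea as recoable is an assumption made only for illustration. A for attribute that matches nothing, or whose first match is not recoable, yields no reco control; it does not fall back to a recoable descendant.

```javascript
// Minimal DOM stand-in: { tag, id, attrs, children }.
// ASSUMPTION: the proposal does not define which elements are "recoable";
// input and textarea are used here for illustration only.
const RECOABLE_TAGS = new Set(["input", "textarea"]);
const isRecoable = (node) => RECOABLE_TAGS.has(node.tag);

// Pre-order (tree-order) walk of a subtree, excluding its root.
function* descendants(node) {
  for (const child of node.children ?? []) {
    yield child;
    yield* descendants(child);
  }
}

// Returns the reco element's reco control, or null if there is none.
function recoControl(reco, documentRoot) {
  const forValue = reco.attrs?.for;
  if (forValue !== undefined) {
    // Only the first element in the document with a matching ID counts...
    for (const el of descendants(documentRoot)) {
      if (el.id === forValue) {
        // ...and it is the reco control only if it is recoable.
        return isRecoable(el) ? el : null;
      }
    }
    return null;
  }
  // No for attribute: the first recoable descendant in tree order.
  for (const el of descendants(reco)) {
    if (isRecoable(el)) return el;
  }
  return null;
}
```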
>>
>> The reco element's exact default presentation and behavior, in
>> particular what its activation behavior might be and what implicit
>> grammars might be defined, if anything, should match the platform's reco
>> behavior. The activation behavior of a reco element for events targeted
>> at interactive content descendants of a reco element, and any
>> descendants of those interactive content descendants, must be to do
>> nothing. When a reco element with a reco control is activated and gets a
>> reco result, the default action of the recognition event should be to
>> set the value of the reco control to the top n-best interpretation of
>> the recognition (in the case of single recognition), or to append the
>> latest top n-best interpretation to the control's existing value (in
>> the case of dictation mode with multiple inputs).
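A sketch of the proposed default action, assuming the recognition result arrives as an array of interpretation strings ordered best-first and that dictation mode is signalled by a boolean. Both representations, the hypothetical function name applyRecoResult, and joining appended dictation text with a single space are assumptions, not part of the proposal.

```javascript
// Hypothetical sketch of the default action for a recognition result.
// ASSUMPTIONS: nbest is ordered best-first; dictation is a boolean flag;
// appended dictation text is separated by one space.
function applyRecoResult(control, nbest, dictation) {
  if (nbest.length === 0) return; // nothing recognized: leave value alone
  const top = nbest[0];
  if (!dictation) {
    // Single recognition: replace the control's value.
    control.value = top;
  } else {
    // Dictation: append the latest top interpretation.
    control.value = control.value ? control.value + " " + top : top;
  }
}
```

Because this is a default action, a page that wants different behavior would presumably cancel the recognition event and update the control itself.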
>>
>> reco . control: Returns the form control that is associated with this
>> element.
>>
>> The form attribute is used to explicitly associate the reco element with
>> its form owner.
>>
>> The htmlFor IDL attribute must reflect the for content attribute.
>>
>> The control IDL attribute must return the reco element's reco control,
>> if any, or null if there isn't one.
>>
>> control . recos: Returns a NodeList of all the reco elements that the
>> form control is associated with.
>>
>> Recoable elements have a NodeList object associated with them that
>> represents the list of reco elements, in tree order, whose reco control
>> is the element in question. The recos IDL attribute of recoable elements,
>> on getting, must return that NodeList object.
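The recos list can be modelled as a tree-order scan for reco elements whose control is the element in question. In this sketch nodes are plain objects, recosOf() is a hypothetical name, controlOf() is a caller-supplied function assumed to implement the reco-control rules, and the result is a snapshot array rather than the live NodeList the text describes.

```javascript
// Hypothetical sketch: nodes are plain { tag, children } objects.
// Pre-order (tree-order) walk of a subtree, excluding its root.
function* treeOrder(node) {
  for (const child of node.children ?? []) {
    yield child;
    yield* treeOrder(child);
  }
}

// Snapshot of control.recos: every reco element, in tree order, whose reco
// control (as computed by the caller-supplied controlOf) is `control`.
function recosOf(control, documentRoot, controlOf) {
  const result = [];
  for (const node of treeOrder(documentRoot)) {
    if (node.tag === "reco" && controlOf(node) === control) result.push(node);
  }
  return result;
}
```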
>>
>> The form IDL attribute is part of the element's forms API.
>>
>> Two constructors are provided for creating HTMLRecoElement objects (in
>> addition to the factory methods from DOM Core such as createElement()):
>> Reco() and Reco(for). When invoked as constructors, these must return a
>> new HTMLRecoElement object (a new reco element). If the for argument is
>> present, the object created must have its for content attribute set to
>> the provided value. The element's document must be the active document
>> of the browsing context of the Window object on which the interface
>> object of the invoked constructor is found.
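The constructor rules, together with the rule that htmlFor reflects the for content attribute, can be illustrated with a stand-in class. RecoSketch is a hypothetical name, the Map holding content attributes is an assumption, and the rule about which Document owns the new element is not modelled. Returning the empty string for a missing attribute follows the usual HTML reflection convention for DOMString attributes.

```javascript
// Hypothetical stand-in for HTMLRecoElement outside any real DOM.
// ASSUMPTION: content attributes are kept in a plain Map.
class RecoSketch {
  constructor(forValue) {
    this.attributes = new Map();
    // Reco(for): set the for content attribute when the argument is present.
    if (forValue !== undefined) this.attributes.set("for", String(forValue));
  }
  // htmlFor reflects the for content attribute.
  get htmlFor() {
    return this.attributes.get("for") ?? "";
  }
  set htmlFor(value) {
    this.attributes.set("for", String(value));
  }
}
```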
>>
>> *********
>>
>> I’m not sure if there’s any need for a TTS element, or if that can stay
>> JS-only. If we need a TTS element it might be something like the
>> following (again, we might expand the content attributes for the other
>> aspects that the group is working on, like event handlers, remote
>> services, etc.):
>>
>> *********
>>
>> The tts element
>>
>> Categories
>>
>> Flow content.
>>
>> Phrasing content.
>>
>> Embedded content.
>>
>> If the element has a controls attribute: Interactive content.
>>
>> Contexts in which this element can be used:
>>
>> Where embedded content is expected.
>>
>> Content model:
>>
>> If the element has a src attribute: zero or more track elements, then
>> transparent, but with no media element descendants.
>>
>> If the element does not have a src attribute: one or more source
>> elements, then zero or more track elements, then transparent, but with
>> no media element descendants.
>>
>> Content attributes:
>>
>> Global attributes
>>
>> src
>>
>> crossorigin
>>
>> preload
>>
>> autoplay
>>
>> mediagroup
>>
>> loop
>>
>> muted
>>
>> controls
>>
>> DOM interface:
>>
>> [NamedConstructor=TTS(),
>>  NamedConstructor=TTS(in DOMString src)]
>> interface HTMLTTSElement : HTMLMediaElement {};
>>
>> A TTS element represents a synthesized audio stream.
>>
>> Content may be provided inside the TTS element. User agents should not
>> show this content to the user; it is intended for older Web browsers
>> which do not support TTS.
>>
>> In particular, this content is not intended to address accessibility
>> concerns. To make TTS content accessible to those with physical or
>> cognitive disabilities, authors are expected to provide alternative
>> media streams and/or to embed accessibility aids (such as
>> transcriptions) into their media streams.
>>
>> The TTS element is a media element whose media data is ostensibly
>> synthesized audio data.
>>
>> The src, crossorigin, preload, autoplay, mediagroup, loop, muted, and
>> controls attributes are the attributes common to all media elements.
>>
>> When a TTS element is potentially playing, it must have its TTS data
>> played synchronized with the current playback position, at the element's
>> effective media volume.
>>
>> When a TTS element is not potentially playing, TTS must not play for the
>> element.
>>
>> tts = new TTS( [ url ] )
>>
>> Returns a new TTS element, with the src attribute set to the value
>> passed in the argument, if applicable.
>>
>> Two constructors are provided for creating HTMLTTSElement objects (in
>> addition to the factory methods from DOM Core such as createElement()):
>> TTS() and TTS(src). When invoked as constructors, these must return a
>> new HTMLTTSElement object (a new tts element). The element must have its
>> preload attribute set to the literal value "auto". If the src argument
>> is present, the object created must have its src content attribute set
>> to the provided value, and the user agent must invoke the object's
>> resource selection algorithm before returning. The element's document
>> must be the active document of the browsing context of the Window object
>> on which the interface object of the invoked constructor is found.
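The TTS constructor steps can be sketched outside any real DOM. TTSSketch is a hypothetical name and selectResource() is a hypothetical hook standing in for the media element resource selection algorithm, which is not modelled; the sketch only shows the ordering the text requires: preload is always "auto", and resource selection runs before the constructor returns, only when src is given.

```javascript
// Hypothetical stand-in for the TTS() / TTS(src) constructor steps.
// ASSUMPTION: selectResource is a hook for the resource selection algorithm.
class TTSSketch {
  constructor(src, selectResource = () => {}) {
    // Every new tts element gets preload="auto".
    this.attributes = new Map([["preload", "auto"]]);
    if (src !== undefined) {
      this.attributes.set("src", String(src));
      // Resource selection is invoked before the constructor returns.
      selectResource(this);
    }
  }
}
```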
>>
>
>
>
Received on Thursday, 30 June 2011 16:06:54 UTC