Re: html speech call tomorrow - speech API work + proposal for HTML content bindings from Olli Pettay on 2011-06-30 (public-xg-htmlspeech@w3.org from June 2011)

From: Olli Pettay <Olli.Pettay@helsinki.fi>
Date: Thu, 30 Jun 2011 13:40:49 +0300
To: Michael Bodell <mbodell@microsoft.com>
CC: "public-xg-htmlspeech@w3.org" <public-xg-htmlspeech@w3.org>, "Raj (Openstream) (raj@openstream.com)" <raj@openstream.com>, "Deborah Dahl (dahl@conversational-technologies.com)" <dahl@conversational-technologies.com>, "Dan Burnett (dburnett@voxeo.com)" <dburnett@voxeo.com>, "Bjorn Bringert (bringert@google.com)" <bringert@google.com>, "Charles Hemphill <charles@everspeech.com> (charles@everspeech.com)" <charles@everspeech.com>
Message-ID: <4E0C52B1.4040509@helsinki.fi>

On 06/30/2011 02:26 AM, Michael Bodell wrote:
> We will be going over the various API proposals on the HTML Speech call
> tomorrow. Kudos to Dan D for getting his submission in on time. Everyone
> else who had things due earlier today (the cc list of this mail)
> should be sending them in ASAP.
>
> The plan for the call is to start with Raj’s work on the design
> decisions and requirements that are relevant for the API. After that we
> can move to the other proposals including the one from Dan and the one
> I’m submitting below. Raj’s is the most important to get done with
> first, which is why we are starting with that.
>
> Here is my proposal for the HTML bindings. For these we might extend the
> content attributes and interface with the other proposals (for
> specifying grammars, speech servers, events, etc.). In any case these
> element/interfaces could be created in JS or be present in the HTML
> document, or both.


I don't see anything in the proposal about permissions.
In which case is the recognizer activated?

At least at the moment I'd still prefer a mechanism similar to what is
proposed for microphone capture.
In our case it could be:
Speech.getRequest(successCallback, errorCallback);

That could be extended to support the "for" attribute so that a speech 
request could be associated with some part of the UI.
Speech.getRequestFor(element, successCallback, errorCallback)

Then the parameter for successCallback would be the speechrequest object 
which can be activated.
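
A rough sketch of how such a flow could look, in TypeScript. Everything
here is hypothetical: makeSpeech, the PermissionPolicy stand-in for the
user agent's prompt, and the use of an element id string in place of a
real element are assumptions for illustration only, not proposal text.

```typescript
// Hypothetical model of the callback-based request flow.
interface SpeechRequest {
  readonly target: string | null; // id of the associated UI element, if any
  active: boolean;
  activate(): void;
}

type SuccessCallback = (request: SpeechRequest) => void;
type ErrorCallback = (reason: string) => void;

// Stand-in for the user agent's permission decision (assumed synchronous).
type PermissionPolicy = (target: string | null) => boolean;

function makeSpeech(askUser: PermissionPolicy) {
  function request(
    target: string | null,
    ok: SuccessCallback,
    fail: ErrorCallback
  ): void {
    if (!askUser(target)) {
      fail("permission denied");
      return;
    }
    // The request object only exists once permission has been granted.
    const req: SpeechRequest = {
      target,
      active: false,
      activate() {
        req.active = true;
      },
    };
    ok(req);
  }
  return {
    getRequest: (ok: SuccessCallback, fail: ErrorCallback): void =>
      request(null, ok, fail),
    getRequestFor: (
      elementId: string,
      ok: SuccessCallback,
      fail: ErrorCallback
    ): void => request(elementId, ok, fail),
  };
}

// Usage: the policy below refuses requests tied to the "blocked" element.
const Speech = makeSpeech((target) => target !== "blocked");
const log: string[] = [];
Speech.getRequestFor(
  "search",
  (req) => {
    req.activate();
    log.push(req.target + ":" + req.active);
  },
  (reason) => log.push(reason)
);
Speech.getRequestFor(
  "blocked",
  () => log.push("unexpected"),
  (reason) => log.push(reason)
);
```

With this shape, page script never holds a request object unless the
user agent has already agreed to hand one out.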


-Olli

>
> ******
>
> The reco element
>
> Categories
>
> Flow content.
>
> Phrasing content.
>
> Interactive content.
>
> Form-associated element.
>
> Contexts in which this element can be used:
>
> Where phrasing content is expected.
>
> Content model:
>
> Phrasing content, but with no descendant recoable elements unless it is
> the element's reco control, and no descendant reco elements.
>
> Content attributes:
>
> Global attributes
>
> form
>
> for
>
> DOM interface:
>
> [NamedConstructor=Reco(),
>  NamedConstructor=Reco(in DOMString for)]
> interface HTMLRecoElement : HTMLElement {
>   readonly attribute HTMLFormElement? form;
>   attribute DOMString htmlFor;
>   readonly attribute HTMLElement? control;
> };
>
> The reco element represents a speech input in a user interface. The
> speech input can be associated with a specific form control, known as
> the reco element's reco control, either using the for attribute, or by
> putting the form control inside the reco element itself.
>
> Except where otherwise specified by the following rules, a reco element
> has no reco control.
>
> The for attribute may be specified to indicate a form control with which
> a speech input is to be associated. If the attribute is specified, the
> attribute's value must be the ID of a recoable element in the same
> Document as the reco element. If the attribute is specified and there is
> an element in the Document whose ID is equal to the value of the for
> attribute, and the first such element is a recoable element, then that
> element is the reco element's reco control.
>
> If the for attribute is not specified, but the reco element has a
> recoable element descendant, then the first such descendant in tree
> order is the reco element's reco control.
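
The lookup rules above could be sketched roughly as follows, in
TypeScript. The El type is a minimal stand-in for a DOM tree, and the
RECOABLE list is an assumption: the proposal does not enumerate which
elements are recoable.

```typescript
// Minimal stand-in for a DOM node; hypothetical, not the real DOM API.
interface El {
  tag: string;
  id?: string;
  forAttr?: string; // value of the "for" content attribute, if specified
  children: El[];
}

// Assumption: an illustrative set of recoable elements.
const RECOABLE: string[] = ["input", "textarea", "select"];
const isRecoable = (e: El): boolean => RECOABLE.indexOf(e.tag) !== -1;

// Descendants in tree order (depth-first, pre-order).
function descendants(root: El): El[] {
  const out: El[] = [];
  for (const child of root.children) {
    out.push(child);
    for (const d of descendants(child)) out.push(d);
  }
  return out;
}

function recoControl(doc: El, reco: El): El | null {
  if (reco.forAttr !== undefined) {
    // Only the first element with a matching ID counts, and only if it
    // is recoable. There is no fallback to descendants in this branch.
    for (const e of descendants(doc)) {
      if (e.id === reco.forAttr) return isRecoable(e) ? e : null;
    }
    return null;
  }
  // No "for" attribute: first recoable descendant in tree order.
  for (const e of descendants(reco)) {
    if (isRecoable(e)) return e;
  }
  return null;
}

// Example tree.
const field: El = { tag: "input", id: "q", children: [] };
const inner: El = { tag: "textarea", children: [] };
const other: El = { tag: "textarea", children: [] };
const recoFor: El = { tag: "reco", forAttr: "q", children: [] };
const recoNested: El = {
  tag: "reco",
  children: [{ tag: "span", children: [inner] }],
};
const recoBad: El = { tag: "reco", forAttr: "missing", children: [other] };
const doc: El = {
  tag: "body",
  children: [field, recoFor, recoNested, recoBad],
};
```

Note that recoBad ends up with no reco control even though it contains
a textarea, because a specified for attribute takes precedence.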
>
> The reco element's exact default presentation and behavior, in
> particular what its activation behavior might be and what implicit
> grammars might be defined, if anything, should match the platform's reco
> behavior. The activation behavior of a reco element for events targeted
> at interactive content descendants of a reco element, and any
> descendants of those interactive content descendants, must be to do
> nothing. When a reco element with a reco control is activated and gets a
> reco result, the default action of the recognition event should be to
> set the value of the reco control to the top n-best interpretation of
> the recognition (in the case of single recognition) or an appended
> latest top n-best interpretation (in the case of dictation mode with
> multiple inputs).
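
One possible reading of that default action, sketched in TypeScript.
The RecoResult and FormControl shapes, the Mode names, and the
single-space separator used when appending are all assumptions; the
proposal does not define a result object or say how dictation results
are joined.

```typescript
// Hypothetical shapes for illustration.
interface RecoResult {
  nBest: string[]; // interpretations, best first
}
interface FormControl {
  value: string;
}
type Mode = "single" | "dictation";

function applyDefaultAction(
  control: FormControl,
  result: RecoResult,
  mode: Mode
): void {
  const best = result.nBest[0];
  // Assumption: an empty n-best list leaves the control untouched.
  if (best === undefined) return;
  if (mode === "single") {
    // Single recognition replaces the current value.
    control.value = best;
  } else {
    // Dictation appends the latest top interpretation.
    control.value = control.value === "" ? best : control.value + " " + best;
  }
}

// Usage: one single-shot result followed by one dictation result.
const box: FormControl = { value: "" };
applyDefaultAction(box, { nBest: ["call home", "hall comb"] }, "single");
const afterSingle = box.value;
applyDefaultAction(box, { nBest: ["right now"] }, "dictation");
const afterDictation = box.value;
```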
>
> reco . control: Returns the form control that is associated with this
> element.
>
> The form attribute is used to explicitly associate the reco element with
> its form owner.
>
> The htmlFor IDL attribute must reflect the for content attribute.
>
> The control IDL attribute must return the reco element's reco control,
> if any, or null if there isn't one.
>
> control . recos: Returns a NodeList of all the reco elements that the
> form control is associated with.
>
> Recoable elements have a NodeList object associated with them that
> represents the list of reco elements, in tree order, whose reco control
> is the element in question. The recos IDL attribute of recoable elements,
> on getting, must return that NodeList object.
>
> The form IDL attribute is part of the element's forms API.
>
> Two constructors are provided for creating HTMLRecoElement objects (in
> addition to the factory methods from DOM Core such as createElement()):
> Reco() and Reco(for). When invoked as constructors, these must return a
> new HTMLRecoElement object (a new reco element). If the for argument is
> present, the object created must have its for content attribute set to
> the provided value. The element's document must be the active document
> of the browsing context of the Window object on which the interface
> object of the invoked constructor is found.
>
> *********
>
> I’m not sure if there’s any need for a TTS element, or if that can stay
> just JS only. If we need a TTS element it might be something like the
> following (again, we might expand the content attributes for the other
> aspects that the group is working on like event handlers, remote
> services, etc.):
>
> *********
>
> The tts element
>
> Categories
>
> Flow content.
>
> Phrasing content.
>
> Embedded content.
>
> If the element has a controls attribute: Interactive content.
>
> Contexts in which this element can be used:
>
> Where embedded content is expected.
>
> Content model:
>
> If the element has a src attribute: zero or more track elements, then
> transparent, but with no media element descendants.
>
> If the element does not have a src attribute: one or more source
> elements, then zero or more track elements, then transparent, but with
> no media element descendants.
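
The ordering part of this content model could be checked along these
lines (TypeScript). This is only a sketch under simplifying assumptions:
it looks at direct children by tag name and does not model the
transparent content model or the "no media element descendants" rule.
The function name is made up for illustration.

```typescript
// Checks only the order of direct children of a tts element.
function childOrderIsValid(hasSrc: boolean, childTags: string[]): boolean {
  let i = 0;
  if (!hasSrc) {
    // Without src: one or more source elements must come first.
    if (childTags[i] !== "source") return false;
    while (childTags[i] === "source") i++;
  }
  // Then zero or more track elements.
  while (childTags[i] === "track") i++;
  // The rest is fallback content; source and track may not reappear.
  for (; i < childTags.length; i++) {
    if (childTags[i] === "source" || childTags[i] === "track") return false;
  }
  return true;
}

// Usage
const okWithSrc = childOrderIsValid(true, ["track", "p"]);
const okWithoutSrc = childOrderIsValid(false, ["source", "source", "track", "p"]);
const missingSource = childOrderIsValid(false, ["track"]);
const lateTrack = childOrderIsValid(false, ["source", "p", "track"]);
```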
>
> Content attributes:
>
> Global attributes
>
> src
>
> crossorigin
>
> preload
>
> autoplay
>
> mediagroup
>
> loop
>
> muted
>
> controls
>
> DOM interface:
>
> [NamedConstructor=TTS(),
>  NamedConstructor=TTS(in DOMString src)]
> interface HTMLTTSElement : HTMLMediaElement {};
>
> A TTS element represents a synthesized audio stream.
>
> Content may be provided inside the TTS element. User agents should not
> show this content to the user; it is intended for older Web browsers
> which do not support TTS.
>
> In particular, this content is not intended to address accessibility
> concerns. To make TTS content accessible to those with physical or
> cognitive disabilities, authors are expected to provide alternative
> media streams and/or to embed accessibility aids (such as
> transcriptions) into their media streams.
>
> The TTS element is a media element whose media data is ostensibly
> synthesized audio data.
>
> The src, preload, autoplay, mediagroup, loop, muted, and controls
> attributes are the attributes common to all media elements.
>
> When a TTS element is potentially playing, it must have its TTS data
> played synchronized with the current playback position, at the element's
> effective media volume.
>
> When a TTS element is not potentially playing, TTS must not play for the
> element.
>
> tts = new TTS( [ url ] )
>
> Returns a new TTS element, with the src attribute set to the value
> passed in the argument, if applicable.
>
> Two constructors are provided for creating HTMLTTSElement objects (in
> addition to the factory methods from DOM Core such as createElement()):
> TTS() and TTS(src). When invoked as constructors, these must return a
> new HTMLTTSElement object (a new tts element). The element must have its
> preload attribute set to the literal value "auto". If the src argument
> is present, the object created must have its src content attribute set
> to the provided value, and the user agent must invoke the object's
> resource selection algorithm before returning. The element's document
> must be the active document of the browsing context of the Window object
> on which the interface object of the invoked constructor is found.
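
A small TypeScript model of the constructor steps described above. The
TtsSketch class and its selectionRuns counter are hypothetical; the
counter only stands in for the resource selection algorithm so that the
ordering is observable, and "prompt.ssml" is a made-up URL.

```typescript
// Hypothetical model of the TTS() / TTS(src) constructor behavior.
class TtsSketch {
  preload: string = "auto"; // always set to the literal value "auto"
  src: string | null = null;
  selectionRuns: number = 0; // how often resource selection was invoked

  constructor(src?: string) {
    if (src !== undefined) {
      this.src = src;
      // Resource selection is invoked before the constructor returns.
      this.selectResource();
    }
  }

  private selectResource(): void {
    this.selectionRuns += 1;
  }
}

// Usage
const plain = new TtsSketch();
const withSrc = new TtsSketch("prompt.ssml");
```

Without an argument, no src is set and resource selection is not
triggered by the constructor.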
>
Received on Thursday, 30 June 2011 10:41:48 UTC