- From: Charles Hemphill <charles@everspeech.com>
- Date: Thu, 21 Jul 2011 10:59:42 -0700
- To: "'HTML Speech XG'" <public-xg-htmlspeech@w3.org>
- Message-ID: <02c301cc47cf$f839ed50$e8adc7f0$@everspeech.com>
Hi Everyone,

Per request, I have included notes related to the TTS Speech API discussion below. I encourage feedback.

It sounded like there was no objection to proceeding with something based on the Microsoft TTS proposal. This seems to have the desired level of focus on the Web API, but also includes a tts tag patterned (where appropriate) after the audio tag.

Moving forward, probably after the current effort, it would be nice to consider how standards can encourage integration of speech technologies into Web applications. A Web API alone may assume too much on the part of the developer.

Best regards,
Charles

Charles: Surprised to read no tts tag at F2F meeting.
Robert: Concluded no need. Everything can be done with an object. No benefit of a tag. Have other Web-API-only APIs: WebSockets (http://www.w3.org/TR/websockets/), Geolocation (http://dev.w3.org/geo/api/spec-source.html).
Charles: Those are infrastructure-level APIs. TTS is at the UI level. Beneficial to integrate with other UI-level elements.
Robert: Value of markup if want to render as part of text flow. Tag for shuttle control seems to be a corner case.
Dan: Had established Web API as primary, add markup if necessary. Why need something beyond existing audio mechanisms? TTS just another audio stream.
Satish: Audio element has things not relevant to TTS and TTS has other events.
Charles: Agree. First point was to establish if we had a tag. Second point was to look at something like Microsoft's TTS proposal that outlined differences between audio and tts tags.
Dan: Not opposed to a tts tag. Have not really discussed TTS enough.
Robert: Object to a tts tag, but not strongly. Microsoft TTS proposal plagiarized Google's proposal, focus on scripting API. Thought about JavaScript API before looking at the tag - get semantics right. Tag is another level of difficulty. Worried about scope creep. Comfortable if focus on API then look at tag as a second step.
Dan: Focus on Web API first. Very strong group decision.
Charles: Wanted to establish that we have a tag and start with Microsoft TTS proposal as a basis. This looks at JavaScript API. Thought that this proposal was primarily borrowed from the audio tag from HTML5. A good starting point.
Robert: Found Google's API, looked at audio tag and made modifications.
Olli: Bjorn's TTS proposal was good. Prefer Microsoft's proposal more.
Charles: Strongly feel we need a tag as a declarative basis to address Web developers. Worried about adoption if we expect Web developers to use an API that is too low level, aimed at speech experts.
Dan: Argued for a Web API on that basis. Create libraries that applications (frameworks) can use and simplify.
Olli: What is the use case for a TTS only element?
Charles: Multimedia instruction manual comes to mind. The instructions for a step can be spoken when the page loads using a simple tag.
Dan: Web API first. Get to decision in 2 weeks. If you feel we really need markup, point this out. Perhaps a direction for next steps.

From: public-xg-htmlspeech-request@w3.org [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Charles Hemphill
Sent: Thursday, July 21, 2011 8:07 AM
To: 'HTML Speech XG'
Subject: TTS Speech API

TTS Speech API

Attached document is a start.
- Uses the main template, maintaining the original format for easy reintegration.
- Includes relevant requirements and design decisions.
- Some new TTS items noted through symmetry (see yellow highlights).

Recommend reformatting requirements and design decisions in the main document.
- Use a table to note applicability of the items. E.g., columns that indicate applicability to reco, tts.
- Would avoid separate copy and update of this information.

Two major decisions:
1) Use a TTS tag
2) Basis of the approach

Use a TTS tag:
- This is recommended for reasons similar to the reco tag.
- For an HTML API, we should be able to do (very) simple things with just markup.
- Helps the API be more declarative.
- Can control features through the standard DOM (similar to the audio element).
- Better for standard HTML developers (the main target?)
- Provides a standard place to add event handlers.
- Allows for visual control, e.g. patterned after the audio element.
- Can support GUI related interaction considerations: focus and visibility.
- Note: somehow recommendation of a TTS tag was dropped at the last minute in the F2F meeting.

Basis of the approach:
- Use a derivative of the audio tag.
- Start with the Microsoft TTS proposal.
- Should fit best with HTML5.

Other considerations:
- Spoke about fallback approaches.
- The <source> tag allows for multiple formats - can consider this approach.
- Also have canPlayType method:

    // The audio element must exist before canPlayType can be queried.
    var audio = document.createElement('audio');
    var source = document.createElement('source');
    // canPlayType returns '' when the type is not supported.
    if (audio.canPlayType('audio/mpeg')) {
      source.type = 'audio/mpeg';
    } else {
      source.type = 'audio/ogg';
    }
    audio.appendChild(source);

Other documents:
- CSS3 Speech Module, W3C Working Draft 19 April 2011: http://www.w3.org/TR/css3-speech/
- Aural Stylesheets: http://www.w3.org/TR/CSS21/aural.html
- http://www.w3.org/TR/css3-speech/#property-index
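
The canPlayType fallback noted under "Other considerations" could be generalized roughly as follows. This is only a sketch, not part of any proposal: `pickPlayableType` and `oggOnly` are hypothetical names introduced for illustration, and `oggOnly` stands in for a real `audio.canPlayType` so the logic can be run outside a browser. It assumes the HTML5 convention that `canPlayType` returns "probably", "maybe", or an empty string.

```javascript
// Hypothetical helper: return the first candidate MIME type that the
// supplied canPlayType function does not reject, or null if none match.
// By HTML5 convention canPlayType returns "probably", "maybe", or ""
// (the empty string means the type is not supported).
function pickPlayableType(canPlayType, candidates) {
  for (const type of candidates) {
    if (canPlayType(type) !== '') {
      return type;
    }
  }
  return null;
}

// Stand-in for audio.canPlayType in a browser that only supports Ogg.
function oggOnly(type) {
  return type.startsWith('audio/ogg') ? 'maybe' : '';
}

// Candidates are listed in order of preference.
const chosen = pickPlayableType(oggOnly, ['audio/mpeg', 'audio/ogg']);
console.log(chosen); // prints "audio/ogg"
```

In a page, the first argument might instead be `function (t) { return audio.canPlayType(t); }` for an existing audio element, with the result assigned to `source.type` as in the snippet above. Whether a tts tag would expose an equivalent method is an open question.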
Received on Thursday, 21 July 2011 18:00:30 UTC