
RE: TTS Speech API

From: Charles Hemphill <charles@everspeech.com>
Date: Thu, 21 Jul 2011 10:59:42 -0700
To: "'HTML Speech XG'" <public-xg-htmlspeech@w3.org>
Message-ID: <02c301cc47cf$f839ed50$e8adc7f0$@everspeech.com>
Hi Everyone,


Per request, I have included notes related to the TTS Speech API discussion
below.  I encourage feedback.


It sounded like there was no objection to proceeding with something based on
the Microsoft TTS proposal.  This seems to have the desired level of focus
on the Web API, but also includes a tts tag patterned (where appropriate)
after the audio tag.


Moving forward, probably after the current effort, it would be nice to
consider how standards can encourage the integration of speech technologies
into Web applications. A Web-API-only approach may assume too much on the part of the


Best regards,



Charles: Surprised to read that the F2F meeting ended with no tts tag.


Robert: Concluded there is no need. Everything can be done with an object; a
tag adds no benefit.

There are other script-only Web APIs: WebSockets, Geolocation.




Charles: Those are infrastructure-level APIs. TTS is at the UI level, so it is
beneficial to integrate it with other UI-level elements.


Robert: Markup has value if you want to render it as part of the text flow.

A tag for a shuttle control seems to be a corner case.


Dan: We had established the Web API as primary, adding markup if necessary.

Why do we need something beyond the existing audio mechanisms?

TTS is just another audio stream.


Satish: The audio element has things not relevant to TTS, and TTS has other


Charles: Agree. The first point was to establish whether we would have a tag.
The second point was to look at something like Microsoft's TTS proposal, which
outlined the differences between audio and tts tags.


Dan: Not opposed to a tts tag, but we have not really discussed TTS enough.


Robert: Object to a tts tag, but not strongly. The Microsoft TTS proposal
plagiarized Google's proposal, focusing on the scripting API. We thought about
the JavaScript API before looking at the tag, to get the semantics right. A tag
is another level of difficulty. Worried about scope creep. Comfortable if we
focus on the API first, then look at a tag as a second step.


Dan: Focus on the Web API first. This was a very strong group decision.


Charles: Wanted to establish that we have a tag, and to start with the
Microsoft TTS proposal as a basis. It covers the JavaScript API. I thought this
proposal was primarily borrowed from the HTML5 audio tag. A good starting
point.


Robert: We found Google's API, looked at the audio tag, and made modifications.


Olli: Bjorn's TTS proposal was good, but I prefer Microsoft's proposal.


Charles: Strongly feel we need a tag as a declarative basis to address Web
developers. Worried about adoption if we expect Web developers to use an API
that is too low-level and aimed at speech experts.


Dan: I argued for a Web API on that basis. Libraries (frameworks) can be built
on it that applications can use to simplify things.


Olli: What is the use case for a TTS only element?


Charles: Multimedia instruction manual comes to mind.  The instructions for
a step can be spoken when the page loads using a simple tag.
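Dan's point that libraries can layer simpler usage over a script API, and Charles's instruction-manual use case, can be illustrated together. This is a minimal sketch that is not part of either proposal: it assumes a hypothetical `data-speak="onload"` marker attribute and a hypothetical `speak(text)` function standing in for whatever script-level TTS call is eventually standardized.

```javascript
// Hypothetical library helper (not from any proposal): speak the text of every
// element marked with data-speak="onload".
// `elements` is any iterable of DOM-like elements (e.g. a NodeList);
// `speak` is a stand-in for the eventual script-level TTS call.
function speakMarkedElements(elements, speak) {
  const spoken = [];
  for (const el of elements) {
    // dataset.speak reflects the data-speak attribute on an HTML element
    if (el.dataset && el.dataset.speak === 'onload') {
      const text = (el.textContent || '').trim();
      if (text) {
        speak(text);       // hand the step's instructions to the TTS layer
        spoken.push(text); // record what was spoken, in document order
      }
    }
  }
  return spoken;
}
```

In a page, a library could call `speakMarkedElements(document.querySelectorAll('[data-speak]'), speak)` from a `DOMContentLoaded` handler. Passing the elements and the speak function as parameters keeps the helper independent of any particular TTS API.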


Dan: Web API first. Get to a decision in two weeks. If you feel we really need
markup, point this out. Perhaps it is a direction for next steps.



From: public-xg-htmlspeech-request@w3.org
[mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Charles Hemphill
Sent: Thursday, July 21, 2011 8:07 AM
To: 'HTML Speech XG'
Subject: TTS Speech API


TTS Speech API


The attached document is a start.

Uses the main template, maintaining the original format for easy


Includes relevant requirements and design decisions.

Some new TTS items noted through symmetry (see yellow highlights).


Recommend reformatting the requirements and design decisions in the main

Use a table to note the applicability of each item,

e.g., columns that indicate applicability to reco and TTS.

This would avoid keeping and updating a separate copy of this information.


Two major decisions:

1) Use a TTS tag

2) Basis of the approach


Use a TTS tag:

This is recommended for reasons similar to those for the reco tag.

For an HTML API, we should be able to do (very) simple things with just

Helps the API be more declarative.

Can control features through the standard DOM (similar to the audio

Better for standard HTML developers (the main target?)

Provides a standard place to add event handlers.

Allows for visual control, e.g. patterned after the audio element.

Can support GUI related interaction considerations: focus and visibility.

Note: somehow the recommendation of a TTS tag was dropped at the last minute
in the F2F meeting.


Basis of the approach:

Use a derivative of the audio tag.

Start with the Microsoft TTS proposal.

Should fit best with HTML5.


Other considerations:


We spoke about fallback approaches.

The <source> tag allows for multiple formats; we can consider this approach.

There is also the canPlayType method:

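var audio = document.createElement('audio');
var source = document.createElement('source');
// canPlayType() returns "probably", "maybe", or "" (empty string = cannot play)
if (audio.canPlayType('audio/mpeg')) {
    source.type = 'audio/mpeg';
} else {
    source.type = 'audio/ogg';
}
audio.appendChild(source);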

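The canPlayType check above generalizes to a list of candidate formats. Since `canPlayType` answers `"probably"`, `"maybe"`, or the empty string, the sketch below (a hypothetical helper named `pickSourceType`, not part of any proposal) returns the first `"probably"` type, otherwise the first `"maybe"` type, otherwise `null`. It takes the check as a function so the selection logic can be exercised without a browser.

```javascript
// Hypothetical helper: choose a media type using HTMLMediaElement.canPlayType
// semantics, where each answer is "probably", "maybe", or "" (cannot play).
function pickSourceType(canPlayType, candidateTypes) {
  let firstMaybe = null;
  for (const type of candidateTypes) {
    const answer = canPlayType(type);
    if (answer === 'probably') {
      return type; // strongest answer: stop searching
    }
    if (answer === 'maybe' && firstMaybe === null) {
      firstMaybe = type; // remember it, but keep looking for "probably"
    }
  }
  return firstMaybe; // null when no candidate is playable
}
```

In a browser this would be called as `pickSourceType(function (t) { return audio.canPlayType(t); }, ['audio/mpeg', 'audio/ogg'])`; the wrapper is needed because passing `audio.canPlayType` unbound would lose its `this`.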

Other documents:


CSS3 Speech Module

W3C Working Draft 19 April 2011



Aural Stylesheets




Received on Thursday, 21 July 2011 18:00:30 UTC
