RE: Split TTS and Speech Recognition? from Young, Milan on 2013-12-11 (public-speech-api@w3.org from December 2013)

From: Young, Milan <Milan.Young@nuance.com>
Date: Wed, 11 Dec 2013 00:31:47 +0000
To: Bjorn Bringert <bringert@google.com>
CC: "public-speech-api@w3.org" <public-speech-api@w3.org>, Glen Shires <gshires@google.com>, Doug Schepers <schepers@w3.org>, "Raj (Openstream)" <raj@openstream.com>
Message-ID: <B236B24082A4094A85003E8FFB8DDC3C20CAEAD2@SOM-EXCH03.nuance.com>
Hello Bjorn,

We’ve been in communication with two of the mainstream vendors.


From: Bjorn Bringert [mailto:bringert@google.com]
Sent: Monday, December 09, 2013 1:11 PM
To: Young, Milan
Cc: public-speech-api@w3.org; Glen Shires; Doug Schepers; Raj (Openstream)
Subject: RE: Split TTS and Speech Recognition?


Hi Milan,

Out of interest, in which browser would you be implementing the API?

/Bjorn
On Dec 9, 2013 8:42 PM, "Young, Milan" <Milan.Young@nuance.com> wrote:
Please excuse the late response.  I have not been actively monitoring this list for some time.

Contrary to Glen's assertion, I believe a unified spec would indeed accelerate implementation.  Speaking for Nuance, a global leader in the field of both recognition and TTS, we would gladly begin implementation if the spec were sanctioned under a WG.  Splitting recognition from TTS on a temporary or even permanent basis seems like a small price to pay for this greater good.

Regards


> -----Original Message-----
> From: Raj (Openstream) [mailto:raj@openstream.com]
> Sent: Wednesday, October 09, 2013 4:49 AM
> To: Doug Schepers; Glen Shires
> Cc: Web Speech
> Subject: Re: Split TTS and Speech Recognition?
>
> Speaking from my vantage position, I find both the arguments plausible,
> recognizing that more work needs to be done before the current artifacts
> become SPECs.
>
> To Glen's point, implementors can still implement part of the SPEC (and it
> could be just TTS),
> and yes, there are plenty of use cases (again, for a web developer) for just
> using TTS in the apps.
>
> It's not clear to me how and why keeping them in "SYNCH" would be a better
> thing to do (aside from the convenience of reading one spec as opposed to
> two), and at the same time, I'm not sure how splitting them into two would make
> it more attractive/likely for any other group to absorb.
>
> IMHO, implementors can take any portion of any spec and conform to the
> extent of their capability and desire,
> and so can WGs.
>
> But, yes, it'll continue to be frustrating that we have so many "SPECs" that are
> not standards from a developers'/implementors'
> point of view.
>
> Raj
>
> On Wed, 09 Oct 2013 05:25:10 +0200
>   Doug Schepers <schepers@w3.org> wrote:
> > Hi, Glen–
> >
> > I'm not trying to be pesky about this, and I'm not going to get pushy.
> > But I'd like you to reconsider this, and I'd like to hear from others
> > what they think (especially implementers).
> >
> >
> > On 10/8/13 8:40 PM, Glen Shires wrote:
> >> A unified spec hasn't slowed implementations, as there are currently
> >> browsers that implement the ASR portion and not the TTS portion, and
> >> browsers that implement the TTS portion and not the ASR portion.
> >
> > This would seem to be an argument for splitting them up, not keeping
> > them together. They are moving at different rates.
> >
> >
> >> (And speech aside, there are many examples where implementors
> >> implement a spec in parts.)
> >
> > Yes, but this is not good for web developers. It's to be avoided, if
> > possible. With my web developer hat on, this is really frustrating.
> > This is why CSS took a more modular approach, which is working pretty
> > well in terms of consistency and interoperability.
> >
> >
> >> Also, keeping TTS and ASR together avoids the problem of having to
> >> sync things up in the future.
> >
> > Speaking from a position of ignorance and curiosity, what things need
> > to be synced up between TTS and ASR? They seem pretty orthogonal from
> > my reading of the spec.
> >
> >
> >> As the unified spec matures, it may have a better chance of finding
> >> a unified home in one of the major W3C groups, such as HTML.
> >
> > I'm not sure I follow your reasoning there. Why would a single spec
> > have a better chance of being adopted by a WG than 2 smaller specs?
> >
> >
> > Is there some concern that one would get implemented, and not the
> > other, so keeping them together might incent implementers to do both?
> >
> >
> > Finally, I just want to be clear that this request is not me speaking
> > with my W3C hat on; I'm speaking solely as an interested web developer
> > who wants his apps to work in as many browsers as possible, and who's
> > mostly using the TTS stuff.
> >
> > Regards-
> > -Doug
> >
> >
> >> Glen
> >>
> >>
> >> On Tue, Oct 8, 2013 at 9:28 AM, Doug Schepers <schepers@w3.org> wrote:
> >>
> >>     Hi, folks–
> >>
> >>     I'd like to propose that the text-to-speech feature be split out
> >>     from the Web Speech API spec; it's more or less orthogonal with the
> >>     speech recognition aspect of the spec, and while there are still
> >>     open issues that are being discussed, I think it's more stable in
> >>     terms of implementations, and could move forward more quickly on its
> >>     own.
> >>
> >>     I have been using both TTS and speech recognition in some of my
> >>     recent apps, and I think both are very cool and useful; I think both
> >>     will be great for accessibility, as well. TTS is much simpler,
> >>     though, and I think we could get more implementations right away if
> >>     we split it out. I really want to see both succeed, at their own pace.
> >>
> >>     (As an aside, I made a "talking calculator" back in 2004 using SVG
> >>     and the Microsoft IE TTS API; it no longer works, but it hints to me
> >>     that it wouldn't be too hard for Microsoft to implement the more
> >>     modern TTS functionality in IE, if the path ahead were clear for them.)
> >>
> >>     In light of the recent news that the W3C Web Speech WG is not going
> >>     to be formed [1], I think the work should still be done in the Web
> >>     Speech Community Group, though maybe when it's mature enough, it
> >>     could move to an existing W3C WG to become a Recommendation.
> >>
> >>     (I don't have a strong feeling about which group this might fit in,
> >>     but a few spring to mind: the WebApps WG, the Audio WG, or the HTML
> >>     WG to take advantage of the new CC-BY licensing being experimented
> >>     on there. It could even be its own WG, though that seems like
> >>     overkill to me.)
> >>
> >>     If any of this resonates with this group, I'm happy to help with it
> >>     unofficially, with my W3C staff experience. (If it were ultimately
> >>     moved into the Audio WG, then I could give my official help, since
> >>     that's one of my working groups. :P)
> >>
> >>     [1]
> >>     http://lists.w3.org/Archives/Public/public-new-work/2013Oct/0004.html
> >>
> >>     Regards-
> >>     -Doug
> >>
> >>
> >
> >
> >
>
Received on Wednesday, 11 December 2013 00:32:18 UTC