RE: Overview paragraph from Young, Milan on 2011-04-20 (public-xg-htmlspeech@w3.org from April 2011)

From: Young, Milan <Milan.Young@nuance.com>
Date: Wed, 20 Apr 2011 13:47:28 -0700
To: "Patrick Ehlen" <pehlen@attinteractive.com>, "Bjorn Bringert" <bringert@google.com>
Cc: "Raj(Openstream)" <raj@openstream.com>, "Satish S" <satish@google.com>, "Deborah Dahl" <dahl@conversational-technologies.com>, "DRUTA, DAN (ATTSI)" <dd5826@att.com>, <public-xg-htmlspeech@w3.org>
Message-ID: <1AA381D92997964F898DF2A3AA4FF9AD0AF2CC88@SUN-EXCH01.nuance.com>
I am in favor of what Patrick is proposing below.  But I'm still uneasy
about the language around the default engines.

The problem is that we have no way of limiting how the app might use the
default recognizer or synthesizer.  It might, for example, make use of
proprietary resources such as grammars, models, or pronunciations.

Requiring that such an application behave even "consistently" across
all engines would require an enumeration of all such resources.  Engines
would be prevented from extending this set unless they used "outside"
channels such as what Patrick outlined below.



-----Original Message-----
From: Patrick Ehlen [mailto:pehlen@attinteractive.com] 
Sent: Wednesday, April 20, 2011 1:44 PM
To: Bjorn Bringert
Cc: Young, Milan; Raj(Openstream); Satish S; Deborah Dahl; DRUTA, DAN
(ATTSI); public-xg-htmlspeech@w3.org
Subject: Re: Overview paragraph

Agreed. In my view, the point here is to provide a consistent set of
methods for content developers to access speech services, whatever their
particular capabilities may be.

For example, a developer may want to use a recognizer with a proprietary
type of model and an instance of that model on a server somewhere. We
should provide a method for someone to specify a URI for the recognizer,
a URI for the model, and a place to pass parameters that may be
particular to that type of model. It would be up to the recognizer to
know how to handle the model and its parameters; that is not part of our
job here.
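
A minimal sketch of that shape, in TypeScript. Every name here
(RecognizerConfig, buildRequest, the "builtin:default" placeholder, the
example.com URIs) is hypothetical and is not taken from any draft of the
API. It only illustrates the three pieces named above: a recognizer URI,
a model URI, and a bag of parameters that the API passes through without
interpreting.

```typescript
// Hypothetical shapes for illustration only; none of these names come
// from the group's draft API.
interface RecognizerConfig {
  recognizerUri?: string;               // omitted => the user agent's default recognizer
  modelUri?: string;                    // e.g. a proprietary model hosted on a server
  modelParams?: Record<string, string>; // opaque to the API; only the engine interprets them
}

interface RecognitionRequest {
  engine: string; // which recognizer will receive the request
  query: string;  // model URI and parameters, URL-encoded
}

// Placeholder identifier for the browser's built-in recognizer (an assumption).
const DEFAULT_ENGINE = "builtin:default";

function buildRequest(config: RecognizerConfig): RecognitionRequest {
  const pairs: string[] = [];
  if (config.modelUri !== undefined) {
    pairs.push("model=" + encodeURIComponent(config.modelUri));
  }
  // Parameters are forwarded as-is, in sorted key order so the output is stable.
  const params: Record<string, string> =
    config.modelParams !== undefined ? config.modelParams : {};
  Object.keys(params).sort().forEach((key) => {
    pairs.push(encodeURIComponent(key) + "=" + encodeURIComponent(String(params[key])));
  });
  const engine =
    config.recognizerUri !== undefined ? config.recognizerUri : DEFAULT_ENGINE;
  return { engine: engine, query: pairs.join("&") };
}
```

A page that passes no recognizerUri falls back to the default engine,
while a page that names an engine can send whatever parameters that
engine understands without the API having to enumerate them.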


On Apr 20, 2011, at 13:22, "Bjorn Bringert" <bringert@google.com> wrote:

> A consistent user experience is not the same as an identical user
> experience. For example, user agents render web pages using varying
> window sizes and pixel densities.
> 
> /Bjorn
> 
> On Wed, Apr 20, 2011 at 9:10 PM, Young, Milan <Milan.Young@nuance.com> wrote:
>> All default recognizers must return the same results/timings with the same
>> input waveform?  All default synthesizers should return the same samples on
>> the same input SSML?
>> 
>> 
>> 
>> 
>> 
>> ________________________________
>> 
>> From: Raj(Openstream) [mailto:raj@openstream.com]
>> Sent: Wednesday, April 20, 2011 12:57 PM
>> To: Satish S; Patrick Ehlen
>> 
>> Cc: Deborah Dahl; Young, Milan; DRUTA, DAN (ATTSI);
>> public-xg-htmlspeech@w3.org
>> Subject: Re: Overview paragraph
>> 
>> 
>> 
>> Yes, I agree with Satish's point: any application that wants to leverage
>> advanced/specific features of an ASR cannot be guaranteed to be portable
>> within the scope of our spec, and applications that use the default
>> (LCD?) recognizer (not sure if this is what Dan D had in mind by saying
>> "simple" applications) should be portable and have a consistent user
>> experience with conforming browsers/clients.
>> 
>> 
>> 
>> --Raj
>> 
>> ----- Original Message -----
>> 
>> From: Satish S
>> 
>> To: Patrick Ehlen
>> 
>> Cc: Deborah Dahl ; Young, Milan ; DRUTA, DAN (ATTSI) ;
>> public-xg-htmlspeech@w3.org
>> 
>> Sent: Wednesday, April 20, 2011 3:38 PM
>> 
>> Subject: Re: Overview paragraph
>> 
>> 
>> 
>> As an express goal, perhaps we should clearly state that applications that
>> use the default/built-in recognizer should be portable across all browsers
>> and speech engines. Beyond that, if the web app chooses to use a particular
>> engine by specifying a URL it seems ok to rely on extended/additional
>> capabilities provided by that engine.
>> 
>> Cheers
>> Satish
>> 
>> On Wed, Apr 20, 2011 at 5:00 PM, Patrick Ehlen <pehlen@attinteractive.com>
>> wrote:
>> 
>> Deborah is right that not all speech engines will have the same
>> capabilities, but we should strive to provide general parameterizations of
>> the potential capabilities wherever possible. Otherwise engine providers
>> will need to add their own extensions to the standard, and development will
>> get fractured across the lines of browser/engine, as we saw happen with
>> earlier Javascript XML handlers, etc.
>> 
>> On Apr 20, 2011, at 8:27, "Deborah Dahl"
>> <dahl@conversational-technologies.com> wrote:
>> 
>>> I don't think we can reach the goal of applications being completely
>>> portable across speech engines because speech engines will always have
>>> different capabilities, and some of these are unlikely to be in the scope
>>> of our API.  For example, engines will handle different languages, some
>>> engines will be able to handle larger grammars, some applications will
>>> make use of proprietary SLM's, and some applications won't be usable
>>> without an engine that has a certain level of accuracy. So I agree with
>>> Milan that the goal is not to standardize functionality across speech
>>> engines. I think we should just say "provide the user with a consistent
>>> experience across different platforms and devices" and leave it at that.
>>> 
>>>> -----Original Message-----
>>>> From: public-xg-htmlspeech-request@w3.org
>>>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Satish S
>>>> Sent: Wednesday, April 20, 2011 5:18 AM
>>>> To: Young, Milan
>>>> Cc: DRUTA, DAN (ATTSI); public-xg-htmlspeech@w3.org
>>>> Subject: Re: Overview paragraph
>>>> 
>>>>    >> provide the user with a consistent experience across different
>>>>    platforms and devices irrespective of the speech engine used.
>>>> 
>>>> 
>>>>    This effort is not about standardizing functionality across speech
>>>>    engines.  The goal is speech application portability across the
>>>>    browsers.  Simple applications MAY be portable across speech engine
>>>>    boundaries, but that's not a requirement.
>>>> 
>>>> 
>>>> 
>>>> I'd say the API proposal should aim for all applications to be portable
>>>> across speech engines. Starting with "may be portable" doesn't seem to fit
>>>> the spirit of the web. Any extensions for speech engine specific parameters
>>>> and results should be optional.
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
> -- 
> Bjorn Bringert
> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> Palace Road, London, SW1W 9TQ
> Registered in England Number: 3977902
>
Received on Wednesday, 20 April 2011 20:49:21 UTC