RE: Overview paragraph from Young, Milan on 2011-04-21 (public-xg-htmlspeech@w3.org from April 2011)

From: Young, Milan <Milan.Young@nuance.com>
Date: Wed, 20 Apr 2011 18:06:29 -0700
To: "Deborah Dahl" <dahl@conversational-technologies.com>, "Bjorn Bringert" <bringert@google.com>
Cc: "Patrick Ehlen" <pehlen@attinteractive.com>, "Raj(Openstream)" <raj@openstream.com>, "Satish S" <satish@google.com>, "DRUTA, DAN (ATTSI)" <dd5826@att.com>, <public-xg-htmlspeech@w3.org>
Message-ID: <1AA381D92997964F898DF2A3AA4FF9AD0AF2CE71@SUN-EXCH01.nuance.com>
To be clear, Bjorn's suggestion is not a means of ensuring app
portability across default engine boundaries.  It's just an error
handling scheme.  It would be misleading to call the resulting
interaction a "consistent user experience".

If portability across default engine boundaries is truly a goal, we must
define the functional subset.  For example, all default engines must
support SSML, SRGS grammars referenced over HTTP, English models, etc.

If an application wants *anything* outside of this set, it must
nominate an engine and use the richer interface.
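
To make the two options concrete, here is a rough sketch (Python, purely
illustrative) of how a user agent might treat a request aimed at the
default engine.  Everything in it is an assumption for discussion: the
names RecognitionRequest, STANDARD_GRAMMAR_SCHEMES and
resolve_for_default_engine are hypothetical, and the "subset" covers only
grammar URI schemes.  The "ignore" branch is Bjorn's fallback (treat the
grammar as if it was not set); the "error" branch is the alternative.

```python
from dataclasses import dataclass, replace
from typing import Optional
from urllib.parse import urlparse

# Hypothetical functional subset: grammar URI schemes that every default
# engine would be required to handle (SRGS grammars referenced over HTTP).
STANDARD_GRAMMAR_SCHEMES = {"http", "https"}


@dataclass(frozen=True)
class RecognitionRequest:
    grammar: Optional[str] = None     # grammar URI, or None if not set
    engine_uri: Optional[str] = None  # None means "use the default engine"


def resolve_for_default_engine(req, on_unknown="ignore"):
    """Return the request the engine would actually see.

    on_unknown="ignore": drop a non-standard grammar (fallback scheme).
    on_unknown="error":  reject the request instead.
    """
    # An app that nominates an engine uses the richer interface as-is.
    if req.engine_uri is not None:
        return req

    # Nothing to check if no grammar was set.
    if req.grammar is None:
        return req

    scheme = urlparse(req.grammar).scheme.lower()
    if scheme in STANDARD_GRAMMAR_SCHEMES:
        return req

    if on_unknown == "error":
        raise ValueError(f"non-standard grammar for default engine: {req.grammar}")

    # Fallback: behave as if the grammar parameter was not set at all.
    return replace(req, grammar=None)
```

With grammar="x-acme:foo" and no nominated engine, the "ignore" mode hands
the engine a request with no grammar, which is why I would call this error
handling rather than portability: the app still runs, but not with the
behavior it asked for.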

Is this worth it?


-----Original Message-----
From: Deborah Dahl [mailto:dahl@conversational-technologies.com] 
Sent: Wednesday, April 20, 2011 3:11 PM
To: 'Bjorn Bringert'; Young, Milan
Cc: 'Patrick Ehlen'; 'Raj(Openstream)'; 'Satish S'; 'DRUTA, DAN
(ATTSI)'; public-xg-htmlspeech@w3.org
Subject: RE: Overview paragraph

I wonder about the case where the default recognizer for one browser
supports a specific language but the default recognizer for another
browser doesn't. I think Bjorn's suggestion would mean that the behavior
in the case of the unsupported language would just be that recognition
isn't available for that application in the browser that doesn't support
that language. I think that would be reasonable and would be better than
an error.
I don't think that different languages would count as "non-standard"
resources (unless we want to list a set of standard languages that have
to be supported, and that doesn't seem like a good idea) but I think
it's in that spirit, since it's something that one recognizer can do but
another one can't.

> -----Original Message-----
> From: public-xg-htmlspeech-request@w3.org
> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Bjorn Bringert
> Sent: Wednesday, April 20, 2011 5:18 PM
> To: Young, Milan
> Cc: Patrick Ehlen; Raj(Openstream); Satish S; Deborah Dahl; DRUTA, DAN
> (ATTSI); public-xg-htmlspeech@w3.org
> Subject: Re: Overview paragraph
> 
> We could either prevent applications from trying to use non-standard
> resources with the default speech services, or specify how the
> fallback will work if those resources are not available.
> 
> To take a fictional example, if the app specifies something like
> grammar="x-acme:foo", we could either specify that this is an error,
> or that the recognizer should treat this as if the grammar parameter
> was not set at all. I'd prefer the latter, since it makes it easier to
> add new standard resources in the future. This is how many other web
> standards work. For example, unknown elements and attributes in HTML
> are silently ignored, unknown properties, fonts etc are silently
> ignored in CSS.
> 
> /Bjorn
> 
> On Wed, Apr 20, 2011 at 9:47 PM, Young, Milan <Milan.Young@nuance.com>
> wrote:
> > I am in favor of what Patrick is proposing below.  But I'm still uneasy
> > about the language around the default engines.
> >
> > The problem is that we have no way of limiting how the app might use the
> > default recognizer or synthesizer.  It might, for example, make use of
> > proprietary resources such as grammars, models, or pronunciations.
> >
> > Requiring that such an application behave even "consistently" across
> > all engines would require an enumeration of all such resources.  Engines
> > would be prevented from extending this set unless they used "outside"
> > channels such as what Patrick outlined below.
> >
> >
> >
> > -----Original Message-----
> > From: Patrick Ehlen [mailto:pehlen@attinteractive.com]
> > Sent: Wednesday, April 20, 2011 1:44 PM
> > To: Bjorn Bringert
> > Cc: Young, Milan; Raj(Openstream); Satish S; Deborah Dahl; DRUTA, DAN
> > (ATTSI); public-xg-htmlspeech@w3.org
> > Subject: Re: Overview paragraph
> >
> > Agreed. In my view, the point here is to provide a consistent set of
> > methods for content developers to access speech services, whatever their
> > particular capabilities may be.
> >
> > For example, a developer may want to use a recognizer with a proprietary
> > type of model and an instance of that model on a server somewhere.  We
> > should provide a method for someone to specify a URI for the recognizer,
> > a URI for the model, and a place to pass parameters that may be
> > particular to that type of model. It would be up to the recognizer to
> > know how to handle the model and its parameters, but not part of our job
> > here.
> >
> >
> > On Apr 20, 2011, at 13:22, "Bjorn Bringert" <bringert@google.com> wrote:
> >
> >> A consistent user experience is not the same as an identical user
> >> experience. For example, user agents render web pages using varying
> >> window sizes and pixel densities.
> >>
> >> /Bjorn
> >>
> >> On Wed, Apr 20, 2011 at 9:10 PM, Young, Milan <Milan.Young@nuance.com>
> >> wrote:
> >>> All default recognizers must return the same results/timings with the
> >>> same input waveform?  All default synthesizers should return the same
> >>> samples on the same input SSML?
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> ________________________________
> >>>
> >>> From: Raj(Openstream) [mailto:raj@openstream.com]
> >>> Sent: Wednesday, April 20, 2011 12:57 PM
> >>> To: Satish S; Patrick Ehlen
> >>>
> >>> Cc: Deborah Dahl; Young, Milan; DRUTA, DAN (ATTSI);
> >>> public-xg-htmlspeech@w3.org
> >>> Subject: Re: Overview paragraph
> >>>
> >>>
> >>>
> >>> Yes, I agree with Satish's point: any application that wants to
> >>> leverage advanced/specific features of an ASR cannot be guaranteed to
> >>> be portable within the scope of our spec, and applications that use
> >>> the default (LCD?) recognizer (not sure if this is what Dan D had in
> >>> mind by saying "simple" applications) should be portable and have a
> >>> consistent user experience with conforming browsers/clients.
> >>>
> >>>
> >>>
> >>> --Raj
> >>>
> >>> ----- Original Message -----
> >>>
> >>> From: Satish S
> >>>
> >>> To: Patrick Ehlen
> >>>
> >>> Cc: Deborah Dahl ; Young, Milan ; DRUTA, DAN (ATTSI) ;
> >>> public-xg-htmlspeech@w3.org
> >>>
> >>> Sent: Wednesday, April 20, 2011 3:38 PM
> >>>
> >>> Subject: Re: Overview paragraph
> >>>
> >>>
> >>>
> >>> As an express goal, perhaps we should clearly state that applications
> >>> that use the default/built-in recognizer should be portable across all
> >>> browsers and speech engines. Beyond that, if the web app chooses to use
> >>> a particular engine by specifying a URL it seems ok to rely on
> >>> extended/additional capabilities provided by that engine.
> >>>
> >>> Cheers
> >>> Satish
> >>>
> >>> On Wed, Apr 20, 2011 at 5:00 PM, Patrick Ehlen
> >>> <pehlen@attinteractive.com> wrote:
> >>>
> >>> Deborah is right that not all speech engines will have the same
> >>> capabilities, but we should strive to provide general parameterizations
> >>> of the potential capabilities wherever possible. Otherwise engine
> >>> providers will need to add their own extensions to the standard, and
> >>> development will get fractured across the lines of browser/engine, as
> >>> we saw happen with earlier Javascript XML handlers, etc.
> >>>
> >>> On Apr 20, 2011, at 8:27, "Deborah Dahl"
> >>> <dahl@conversational-technologies.com> wrote:
> >>>
> >>>> I don't think we can reach the goal of applications being completely
> >>>> portable across speech engines because speech engines will always have
> >>>> different capabilities, and some of these are unlikely to be in the
> >>>> scope of our API.  For example, engines will handle different
> >>>> languages, some engines will be able to handle larger grammars, some
> >>>> applications will make use of proprietary SLMs, and some applications
> >>>> won't be usable without an engine that has a certain level of
> >>>> accuracy. So I agree with Milan that the goal is not to standardize
> >>>> functionality across speech engines. I think we should just say
> >>>> "provide the user with a consistent experience across different
> >>>> platforms and devices" and leave it at that.
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: public-xg-htmlspeech-request@w3.org
> >>>>> [mailto:public-xg-htmlspeech-request@w3.org] On Behalf Of Satish S
> >>>>> Sent: Wednesday, April 20, 2011 5:18 AM
> >>>>> To: Young, Milan
> >>>>> Cc: DRUTA, DAN (ATTSI); public-xg-htmlspeech@w3.org
> >>>>> Subject: Re: Overview paragraph
> >>>>>
> >>>>>    >> provide the user with a consistent experience across different
> >>>>>    platforms and devices irrespective of the speech engine used.
> >>>>>
> >>>>>
> >>>>>    This effort is not about standardizing functionality across speech
> >>>>>    engines.  The goal is speech application portability across the
> >>>>>    browsers.  Simple applications MAY be portable across speech engine
> >>>>>    boundaries, but that's not a requirement.
> >>>>>
> >>>>>
> >>>>>
> >>>>> I'd say the API proposal should aim for all applications to be
> >>>>> portable across speech engines. Starting with "may be portable"
> >>>>> doesn't seem to fit the spirit of the web. Any extensions for speech
> >>>>> engine specific parameters and results should be optional.
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Bjorn Bringert
> >> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> >> Palace Road, London, SW1W 9TQ
> >> Registered in England Number: 3977902
> >>
> >
> 
> 
> 
> --
> Bjorn Bringert
> Google UK Limited, Registered Office: Belgrave House, 76 Buckingham
> Palace Road, London, SW1W 9TQ
> Registered in England Number: 3977902
Received on Thursday, 21 April 2011 01:07:37 UTC