- From: Brian Subirana <subirana@mit.edu>
- Date: Mon, 18 Mar 2019 11:09:15 +0000
- To: "Raj (Openstream)" <raj@openstream.com>, Claudio Luis Vera <claudio@simple-theory.com>, "lw@tetralogical.com" <lw@tetralogical.com>
- CC: Cameron Cundiff <cameron@ckundo.com>, "Joseph K O'Connor" <josephoconnor@mac.com>, "public-voice-assistant@w3.org" <public-voice-assistant@w3.org>
Hi All,

Thank you for your interesting messages. Personally, I feel voice deserves a "user-centric" standardization effort. I really don't see any relevant difference between talking to "Tide" on Alexa, through Siri on an Apple Watch, or on its website. In particular, I'm keen on standardizing wake words. A few of us have demonstrated that you can have a "voice name system", similar to the Internet's DNS, that promises to take us a few steps in that direction. Our approach is not voice-centric only; we have demonstrated how neural activation can also be part of a standard. IMHO, what really matters is the user's will, not where or how it is expressed.

I'm not sure if this is the right venue to continue this conversation. I'd appreciate any advice.

Best,
Brian

On 3/18/19, 5:02 AM, "Raj (Openstream)" <raj@openstream.com> wrote:

At least for sharing "data/input" that is not just speech among systems, one could consider the work done by the W3C Multimodal Interaction Working Group (MMI WG), particularly Extensible MultiModal Annotation (EMMA), described here: https://www.w3.org/TR/emma/

In addition to providing a portable way to share input through speech and other modalities, it also captures the time sequence of these input events, preserving their timing information.

Regards,
Raj

On Sat, 16 Mar 2019 11:58:02 -0700, Claudio Luis Vera <claudio@simple-theory.com> wrote:

> Joe brings up a field that most of us who work in web accessibility are barely aware of. AAC allows people who are non-verbal to communicate through synthesized speech, and typically it's the only means available for verbal communication. AAC developers have historically worked with bespoke solutions, and manufacturers have not been diligent about backward and forward compatibility. This has left communicators like Joe's daughter Siobhan virtually helpless when their AAC system eventually fails.
>
> I would hate to see Joe's concerns being brushed off as off-topic in this forum. Instead, I think we should take a more holistic approach to voice UI. Today's voice assistants are conversational interfaces primarily geared at gathering voice input from a user in order to return content from a remote source through speech.
>
> AAC reverses this challenge: a user like Joe's daughter should have the most frictionless input means available in order to select the words that will be output through synthesized speech. The best solutions would look at reducing friction and speeding up that process through any means possible (eye gaze, switches, autocomplete, AI, machine learning, etc.). Today's AAC systems typically don't take advantage of smart technologies yet.
>
> In addition, Joe brings up a huge interoperability and portability challenge. A standardized approach like package.json for capturing configuration settings and dependencies would take care of many of these issues. I can't fathom that the other data could not be ported through a typical data migration as a volunteer hacking project.
>
> We really should broaden our approach so that portability and forward compatibility are front and center, and so that AAC and voice output are also included.
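[Editorial illustration: as a purely hypothetical sketch of Claudio's package.json analogy above, a portable AAC profile might serialize grids, buttons, and input settings to a vendor-neutral file. No such standard exists today; every field name below is invented.]

# A sketch of a portable, vendor-neutral AAC profile manifest.
# All field names are hypothetical; nothing here comes from an existing spec.
import json

profile = {
    "manifest-version": "0.1",  # hypothetical schema version
    "device": {"vendor": "ExampleAAC", "model": "Grid-12"},
    "vocabulary": [
        {"label": "Hello", "speech": "Hello, how are you?",
         "image": "images/hello.png", "position": [0, 0]},
        {"label": "Mom", "speech": "I want to talk to Mom",
         "image": "images/mom.jpg", "position": [0, 1]},
    ],
    "pages": [{"id": "home", "links-to": ["people", "food"]}],
    "input": {"method": "eye-gaze", "dwell-ms": 800},  # selection settings
}

# Serialize so the same profile could be imported on a replacement device.
with open("aac-profile.json", "w") as f:
    json.dump(profile, f, indent=2)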
>
> On Sat, Mar 16, 2019 at 7:53 AM Léonie Watson <lw@tetralogical.com> wrote:
>
>> On 16/03/2019 12:53, Cameron Cundiff wrote:
>>> From what I can tell, the original intention was to focus on the design of conversational interfaces with voice assistant platforms specifically, as opposed to voice as an input mechanism or core text-to-speech and speech-to-text tech. Does that sound right, Léonie?
>>
>> More or less, yes. The idea was to look at whether we could come up with a way to code once and deploy across multiple platforms.
>>
>> Léonie.
>>
>>> Best,
>>> Cameron
>>>
>>>> On Mar 16, 2019, at 7:30 AM, Joseph K O'Connor <josephoconnor@mac.com> wrote:
>>>>
>>>> Interoperability of databases is my first goal.
>>>>
>>>> Manufacturers of learning management systems (WebCT, Blackboard, and Desire2Learn are examples) have agreed to make courseware interoperable. The standard is SCORM, the Shareable Content Object Reference Model.
>>>>
>>>> At its core, SCORM allows content authors to distribute their content to a variety of Learning Management Systems (LMSs) with the smallest headache possible, and allows an LMS to handle content from a variety of sources.
>>>>
>>>> In the same way, there is a need for users of AAC systems to load the databases they have created on one system onto another system.
>>>>
>>>> Told from the point of view of one communicator, here is some info about AAC systems and possible areas where standards will help:
>>>>
>>>> http://accessiblejoe.com/wizard/
>>>>
>>>> Joseph
>>>>
>>>>> On Mar 16, 2019, at 2:46 AM, Léonie Watson <lw@tetralogical.com> wrote:
>>>>>
>>>>> I don't know much about Augmentative and Alternative Communication (AAC) systems. Can you give us a simple description or point to some good descriptions elsewhere?
>>>>>
>>>>> Also, what would standardisation look like for an AAC system? What are the things that could be standardised?
>>>>>
>>>>> Léonie.
>>>>>
>>>>>> On 16/03/2019 02:42, Joseph K O'Connor wrote:
>>>>>> I'm interested in talking about standards for AAC systems. For instance, databases are not interoperable, even between different devices by the same manufacturer. This has very serious effects. Each time my daughter has to switch devices, we have to remake all the grids, buttons, button behaviors, and links between pages, find and upload pictures of the people she interacts with, and deal with subtle changes introduced by the new software. Who will do this when we're gone? I fear for her future.
>>>>>> Thanks,
>>>>>> Joe
>>>>>>> On Mar 15, 2019, at 8:34 AM, Cameron Cundiff <cameron@ckundo.com> wrote:
>>>>>>>
>>>>>>> Thanks, Léonie. I'll chime in with my interests too.
>>>>>>>
>>>>>>> I'm curious to find emergent practices in voice UI design and to figure out how to document and influence them.
>>>>>>>
>>>>>>> Examples include: how to offer non-verbal alternatives to speech input for non-verbal users; expectations for accent support and internationalization; accommodations for AAC users and delayed speech; volume controls and defaults; and enabling and disabling speech input and playback. To name a few.
>>>>>>>
>>>>>>> Best,
>>>>>>> Cameron
>>>>>>>
>>>>>>>> On Mar 15, 2019, at 11:16 AM, Léonie Watson <lw@tetralogical.com> wrote:
>>>>>>>>
>>>>>>>> I think the original reason for this CG was to explore standardisation across the different voice assistants.
>>>>>>>>
>>>>>>>> This was in part an attempt to avoid the enduring problem already evident with native mobile development: cross-platform production is costly and complicated.
>>>>>>>>
>>>>>>>> There is also a counterpart in the UI that is far more common than it is for mobile: the burden of learning and swapping between assistants is high, but because of the significant differences in their capabilities, it's increasingly common to find households with devices from multiple providers.
>>>>>>>>
>>>>>>>> That doesn't mean the CG needs to continue along this path, though we might need a name change if we alter course!
>>>>>>>>
>>>>>>>> Phil, can you say more about the things you mentioned? I'm not quite sure I understood the sort of thing you'd like the CG to explore.
>>>>>>>>
>>>>>>>> Perhaps, with all the possibilities, it would help to throw out some suggestions as to the deliverables we might produce?
>>>>>>>>
>>>>>>>> Léonie.
>>>>>>>>
>>>>>>>>> On 15/03/2019 14:33, Phil Archer wrote:
>>>>>>>>> I don't speak for others, but from my own POV we're not talking about established voice assistants like the ones you mention, no. My own interest (and I'm being led by Brian Subirana here) is in talking to/about products ('cos GS1 is about commerce): things like wake words that can be referenced. Brian might be able to jump in and say more.
>>>>>>>>> But to come to your point: I'd certainly be interested in voice UI in general, not specifically voice assistants.
>>>>>>>>> Phil
>>>>>>>>>> On 15/03/2019 14:17, Cameron Cundiff wrote:
>>>>>>>>>> Hi folks,
>>>>>>>>>>
>>>>>>>>>> Thinking about our focus on voice assistants and the limits of that.
>>>>>>>>>>
>>>>>>>>>> I think conversational interfaces are a narrow subset of voice UI, are platform-specific in implementation and design, and are limited modalities compared to generalized voice commands.
>>>>>>>>>>
>>>>>>>>>> It'd be easier, in my opinion, to talk about standards for voice UI than specifically for assistants, because these assistants operate with different mental models compared to one another.
>>>>>>>>>>
>>>>>>>>>> Is this CG exclusively focused on Alexa, Google Assistant, Siri, etc., or can it reach into general voice input for AR and VR, web, apps, etc.?
>>>>>>>>>>
>>>>>>>>>> Is it limited to conversational interfaces, or can it include single-turn commands, earcons, and speech playback?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Cameron
>>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Phil Archer
>>>>>>>>> Director, Web Solutions, GS1
>>>>>>>>> https://www.gs1.org
>>>>>>>>> http://philarcher.org
>>>>>>>>> +44 (0)7887 767755
>>>>>>>>> @philarcher1
>>>>>>>>> Skype: philarcher
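[Editorial illustration: the thread does not describe how Brian's "voice name system" or Phil's referenceable wake words would actually work. The DNS analogy suggests something like the toy lookup below; the registry, record format, and resolve() function are all invented.]

# A toy illustration of the DNS analogy for wake words: a registry record maps
# a spoken wake word to the service endpoint that should handle the dialogue,
# loosely the way a DNS record maps a name to an address. Entirely hypothetical.
WAKE_WORD_REGISTRY = {
    "tide": {"endpoint": "https://voice.example.com/tide", "owner": "example-brand"},
    "alexa": {"endpoint": "https://alexa.example.com/", "owner": "amazon"},
}

def resolve(wake_word: str) -> str:
    """Resolve a spoken wake word to its handler endpoint (cf. a DNS lookup)."""
    record = WAKE_WORD_REGISTRY.get(wake_word.lower())
    if record is None:
        raise LookupError(f"no registry entry for wake word {wake_word!r}")
    return record["endpoint"]

print(resolve("Tide"))  # -> https://voice.example.com/tide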
>>>>>>>>
>>>>>>>> --
>>>>>>>> @TetraLogical TetraLogical.com
>>>>>
>>>>> --
>>>>> @TetraLogical TetraLogical.com
>>
>> --
>> @TetraLogical TetraLogical.com
>
> --
> User Experience | Information Architecture | Accessibility
> simple-theory.com
> +1 954-417-4188
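[Editorial illustration: the EMMA 1.0 Recommendation that Raj points to earlier in the thread (https://www.w3.org/TR/emma/) annotates each user input with its medium, mode, confidence, and absolute timestamps. The sketch below builds a minimal EMMA document; the element and attribute names follow the spec, while the flight-query payload and all values are invented.]

# Build a minimal EMMA 1.0 document for one spoken input.
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
ET.register_namespace("emma", EMMA_NS)

def emma(name: str) -> str:
    """Qualify a name with the EMMA namespace."""
    return f"{{{EMMA_NS}}}{name}"

root = ET.Element(emma("emma"), {"version": "1.0"})
interp = ET.SubElement(root, emma("interpretation"), {
    "id": "int1",
    emma("medium"): "acoustic",
    emma("mode"): "voice",
    emma("confidence"): "0.82",
    emma("start"): "1552732682000",  # milliseconds since the epoch (invented)
    emma("end"): "1552732684250",
    emma("tokens"): "flights from boston to denver",
})
# Application-specific payload; <origin>/<destination> are invented examples.
ET.SubElement(interp, "origin").text = "Boston"
ET.SubElement(interp, "destination").text = "Denver"

print(ET.tostring(root, encoding="unicode"))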
Received on Monday, 18 March 2019 11:10:05 UTC