
Re: Clarifying speech recognition vs. voice recognition in Style Guide

From: Kim Patch <kim@redstartsystems.com>
Date: Wed, 3 Feb 2021 11:23:14 -0500
To: Estel·la Oncins Noguer <Estella.Oncins@uab.cat>, Shawn Henry <shawn@w3.org>, Katie Haritos-Shea <ryladog@gmail.com>
Cc: "EOWG (E-mail)" <w3c-wai-eo@w3.org>, WAI Coordination Call <public-wai-cc@w3.org>
Message-ID: <39f29726-3479-b9c3-c2d4-51237a5b08db@redstartsystems.com>

Hi everyone,

I think the basic distinction is right, but here are a few
clarifications based on the discussion:

- Speech input software like Dragon is used for re-speaking in several
ways, both accessibility-related and not, including subtitling,
transcription, and court recording, both live and not live. (Other
journalists I know and I use it for transcription, for example.)

- Although speech input software like Dragon is speaker-dependent, it
doesn't use voice recognition technology: the software doesn't
distinguish between people's voices. Software like Dragon is
speaker-dependent because you train a profile, and then, if multiple
people use the same computer, you make sure to choose your own profile
when you are using it.

- I think the clearest way to distinguish between technology that
translates speech to text and technology like screen readers that
translates text to speech is to use the terms "speech input" and
"speech output".

- I don't think there is any reason to call speech input used for a11y
anything different from speech input used as a preference.

So to sum it all up:
- To talk to a computer you use speech input. Speech recognition
  software enables speech input.
- A screen reader uses speech output to show you what's on the screen.
- Voice recognition software identifies speakers using the sounds of
  their voices.
- Intelligent agents can use speech or text input or a mix, speech or
  text output or a mix, and they may or may not be able to distinguish
  who is speaking to them.

Cheers,
Kim


On 2/3/2021 1:23 AM, Estel·la Oncins Noguer wrote:
> Hello,
>
> One of the issues is that "speech recognition" can be
> speaker-dependent or speaker-independent, and this is where
> "voice recognition" comes into play.
>
> An example in accessibility could be the "respeaking technique" which 
> is used to deliver live subtitles and includes both "voice and speech 
> recognition", as professionals train the software with their own voice 
> to improve accuracy.
>
> Best,
>
> Estel·la Oncins Noguer
> Post Doctoral Research Fellow TransMedia Catalonia
> Edifici MRA 126  - Campus UAB
> 08193 Bellaterra
> Barcelona
> T. +34 610 655 149
>
> orcid.org/0000-0002-0291-3036 <https://orcid.org/0000-0002-0291-3036>
> http://grupsderecerca.uab.cat/transmedia
>
> ------------------------------------------------------------------------
> *From:* Shawn Henry <shawn@w3.org>
> *Sent:* Tuesday, 2 February 2021 23:58
> *To:* Katie Haritos-Shea <ryladog@gmail.com>
> *Cc:* kim@redstartsystems.com <kim@redstartsystems.com>; EOWG
> (E-mail) <w3c-wai-eo@w3.org>; WAI Coordination Call <public-wai-cc@w3.org>
> *Subject:* Re: Clarifying speech recognition vs. voice recognition in
> Style Guide
>
> Hi Katie,
>
> Thanks for the input.
>
> I don't see a need to differentiate non-accessibility tech in this 
> context. Maybe I'm missing something?
>
> AFAIK, most of those systems use "speech recognition" to recognize
> words. Only a few might use "voice recognition" to identify the
> speaker. And speech output is different.
>
> I've updated the info to make the scope clearer:
> https://www.w3.org/WAI/EO/wiki/Style#speech-recognition
>
> Does that help clarify? Or, am I still not getting the point?
>
> Best,
> ~Shawn
>
>
> On 02-Feb-21 3:50 PM, Katie Haritos-Shea wrote:
> > Yeah, but what about non-accessibility related speech recognition
> > software, such as what is part of Siri, Cortana, etc. and any voice
> > enabled UI - which use both of the systems that were originally AT -
> > voice recognition and screen reading speech software? How do we
> > differentiate there?
> >
> > katie
> >
> > Katie Haritos-Shea
> > Principal ICT Accessibility Architect
> >
> > Senior Product Manager/Compliance/Accessibility SME,
> > Core Merchant Framework UX, Clover
> >
> > W3C Advisory Committee Member and Representative for Knowbility
> >
> > WCAG/Section 508/ADA/AODA/QA/FinServ/FinTech/Privacy, IAAP CPACC+WAS = CPWA
> > <http://www.accessibilityassociation.org/cpwacertificants>
> >
> > Cell: 703-371-5545 | ryladog@gmail.com | Seneca, SC |
> > LinkedIn Profile <http://www.linkedin.com/in/katieharitosshea/>
> >
> > People may forget exactly what it was that you said or did, but they
> > will never forget how you made them feel...
> >
> > Our scars remind us of where we have been... they do not have to
> > dictate where we are going.
> >
> > On Tue, Feb 2, 2021 at 4:37 PM Andrew Arch <andrew@intopia.digital> wrote:
> >
> >     Agree completely Shawn, and good for WAI to make the clear
> >     distinction.
> >
> >     ‘voice recognition’ is being used increasingly for security to
> >     identify the individual based on their voice ‘print’.
> >
> >     Andrew
> >
> >     Andrew Arch
> >     Intopia
> >
> >     *From:* Kim Patch <kim@redstartsystems.com>
> >     *Sent:* Wednesday, 3 February 2021 5:00 AM
> >     *To:* Joshue O Connor <joconnor@w3.org>; Bakken, Brent <Brent.Bakken@Pearson.com>
> >     *Cc:* Shawn Henry <shawn@w3.org>; EOWG (E-mail) <w3c-wai-eo@w3.org>;
> >     WAI Coordination Call <public-wai-cc@w3.org>
> >     *Subject:* Re: Clarifying speech recognition vs. voice recognition
> >     in Style Guide
> >
> >     Yes, that's the right distinction.
> >
> >     I also agree that it's important that we get on the same page
> >     about this. I've seen them conflated many times – I think we can
> >     help by making sure to be clear about the distinction.
> >
> >     Cheers,
> >     Kim
> >
> >     On 2/2/2021 12:29 PM, Joshue O Connor wrote:
> >
> >         Hi all,
> >
> >         Interesting point and an important distinction.
> >
> >         In the wild the two will be conflated IMO, for better or
> >         worse. It could be one of those things that the sooner the
> >         'experts' get on the same page about, the better.
> >
> >         I'm curious what Kim thinks?
> >
> >         Thanks
> >
> >         Josh
> >
> >
> >             Bakken, Brent <mailto:Brent.Bakken@Pearson.com>
> >             Tuesday 2 February 2021 17:07
> >
> >             I would agree with this distinction.
> >
> >             -----Original Message-----
> >             From: Shawn Henry <shawn@w3.org>
> >             Sent: Tuesday, February 2, 2021 8:44 AM
> >             To: EOWG (E-mail) <w3c-wai-eo@w3.org>; WAI Coordination Call
> >             <public-wai-cc@w3.org>
> >             Cc: Kim Patch <kim@redstartsystems.com>
> >             Subject: Clarifying speech recognition vs. voice recognition
> >             in Style Guide
> >
> >             Hi folks,
> >
> >             Here is a draft update to the WAI Style Guide[1]:
> >
> >             "speech recognition" is for speech-to-text (STT), and
> >             usually what we're talking about for accessibility.
> >             "voice recognition" is different; it's about identifying
> >             the speaker, not what they're saying.
> >
> >             (For background, search the Web for "speech recognition
> >             voice recognition difference")
> >
> >             Please let me know if you disagree or have edit suggestions.
> >
> >             Thanks!
> >             ~Shawn
> >
> >             [1] currently at the end of
> >             https://www.w3.org/WAI/EO/wiki/Style#other_words_and_phrases
> >
> >         --
> >         Emerging Web Technology Specialist/Accessibility (WAI/W3C)
> >
> >     --
> >     ___________________________________________________
> >
> >     Kimberly Patch
> >     (617) 325-3966
> >     kim@scriven.com
> >
> >     www.redstartsystems.com
> >     - making speech fly
> >
> >     PatchonTech.com
> >     @PatchonTech
> >     www.linkedin.com/in/kimpatch
> >     ___________________________________________________
> >
>

-- 
___________________________________________________

Kimberly Patch
(617) 325-3966
kim@scriven.com <mailto:kim@scriven.com>

www.redstartsystems.com <http://www.redstartsystems.com>
- making speech fly

PatchonTech.com <http://www.linkedin.com/in/kimpatch>
@PatchonTech
www.linkedin.com/in/kimpatch <http://www.linkedin.com/in/kimpatch>
___________________________________________________
Received on Wednesday, 3 February 2021 16:23:33 UTC
