Re: Synchronization Accessibility User Requirements: Speech recognition accuracy from Noble, Stephen on 2021-08-31 (public-rqtf@w3.org from August 2021)

From: Noble, Stephen <steve.noble@pearson.com>
Date: Tue, 31 Aug 2021 22:09:16 +0000
To: "White, Jason J" <jjwhite@ets.org>, "public-rqtf@w3.org" <public-rqtf@w3.org>
Message-ID: <BN7PR07MB4868406AA7CCDC0B3E42363FF4CC9@BN7PR07MB4868.namprd07.prod.outlook.com>
I just saw this interesting and perhaps pertinent communication from the US Federal Communications Commission:
IP CTS ASR-Only Authorization Granted to CaptionCall<https://www.fcc.gov/document/ip-cts-asr-only-authorization-granted-captioncall>

In particular, the files linked at the bottom of that page include information that may be relevant to our discussion, and we should probably try to digest some of it and see whether any of it applies to the SAUR. Here is an interesting snippet from the background file. Unfortunately, some of it is jumbled because of all the footnotes, so you may prefer to read the Word document they include.
<start clip>
9. Captioning Speed / Delay.
CaptionCall sufficiently supported its claim that its ASR technology will transcribe captions in real time and in compliance with the minimum TRS standards relating to captioning speed and delay. Currently, there is no quantitative standard for IP CTS caption delay per se.  However, captions must be delivered “fast enough so that they keep up with the speed of the other party’s speech,” and “if captions are not keeping up with the speech (although a short delay is inevitable), at some point the provider is no longer offering relay service and the call is not compensable.”  Telecommunications Relay Services and Speech-to-Speech Services for Individuals with Hearing and Speech Disabilities; Internet-based Captioned Telephone Service, CG Docket No. 03-123, Declaratory Ruling, 22 FCC Rcd 379, 388-89, para. 22 & n.69 (2007) (2007 IP CTS Declaratory Ruling).  In addition, the typing speed standard for text-based TRS is applicable.  See id. at 388, para. 22 n.69; 47 CFR § 64.604(a)(1)(iii) (requiring TRS CAs to have a minimum typing speed of 60 words per minute).  Based on the test results and other evidence discussed above, CaptionCall has shown that its fully automatic IP CTS not only will meet this standard but also will “keep up with the speed of the other party’s speech.”  2007 IP CTS Declaratory Ruling, 22 FCC Rcd at 388, para. 22.  On October 2, 2020, the Commission released a Further Notice of Proposed Rulemaking that proposed quantitative standards for IP CTS caption delay.  Misuse of Internet Protocol (IP) Captioned Telephone Service; Telecommunications Relay Services for Individuals with Hearing and Speech Disabilities; Structure and Practices of the Video Relay Service Program, CG Docket Nos. 13-24, 03-123, and 10-51, Report and Order, Order on Reconsideration, and Further Notice of Proposed Rulemaking, 35 FCC Rcd 10866, 10896-903, paras. 62-81 (2020) (IP CTS Metrics Further Notice).
  CaptionCall reports that testing with simulations of audio files used for stenographer tests showed that its ASR-only captioning meets the typing speed standard of 60 words per minute. CaptionCall Application at 12; see also 47 CFR § 64.604(a)(1)(iii).
  CaptionCall further states that internal testing found the ASR-only system averages a captioning delay “of less than 2 seconds from the time the phrase ends.” CaptionCall Application at 12 (emphasis added).
  Additionally, in performance testing of CA-assisted and ASR-only IP CTS technologies by the Commission’s National Test Lab, CaptionCall’s median per-word caption delays for various call scenarios ranged from 3.3 to 3.8 seconds, while CA-assisted providers’ median caption delays were significantly longer, averaging from 5.2 to 17.8 seconds.
  See FCC Telecommunications Relay Services Project, Captioning Device Performance Testing: [Caption Call] Automated Speech Recognition (ASR) Assessment, CG Docket No. 03-123, at 2 (posted by CGB on April 19, 2021) (NTL Test Report).  The National Test Lab is operated by MITRE Corporation (MITRE) as part of the CMS Alliance to Modernize Healthcare Services, a Federally Funded Research and Development Center sponsored by the Centers for Medicare & Medicaid Services (CMS).  See CMS Alliance to Modernize Healthcare, Internet Protocol Caption Telephone Service (IP CTS) Devices: Summary of Phase I Activities, at 1 (July 24, 2017), CG Docket Nos. 13-24 and 03-123 (filed by CGB Apr. 11, 2018).  NTL Test Report at 4.
  These test results show that CaptionCall’s ASR-only captioning will satisfy the Commission’s minimum standards for captioning speed and delay.

10. Accuracy and Readability.
Although the TRS rules do not currently provide metrics for accuracy and readability, the typing, grammar, and spelling of captions must be “competent,” and conversations must be transcribed “verbatim,” with no intentional alteration of content unless the user specifically requests summarization. 47 CFR § 64.604(a)(1)(ii), (2)(ii).  These standards apply to captions developed with ASR.  See Telecommunications Relay Services and Speech-to-Speech Services for Individuals with Hearing and Speech Disabilities, CC Docket No. 98-67, Declaratory Ruling, 18 FCC Rcd 16121, 16134-35, paras. 37-39 (2003); 2018 ASR Declaratory Ruling, 33 FCC Rcd at 5832, para. 60.  The Commission recently proposed a quantitative standard for accuracy based on measuring word error rate.  IP CTS Metrics Further Notice, 35 FCC Rcd at 10900-02, paras. 71-78.
  We find sufficient record evidence that CaptionCall’s fully automatic IP CTS will meet or exceed the Commission’s competence and “verbatim” requirements.  CaptionCall states that it evaluated the performance of leading ASR vendors and selected an ASR platform based on accuracy, transcription formatting, including punctuation and capitalization, and user-friendly handling of acronyms, prices, dates, and numerics. CaptionCall Application at 9, 13.
  CaptionCall reports that it conducted internal testing of its engine’s captioning accuracy using the audio files developed by the National Test Lab.  The Word Error Rates reported from CaptionCall’s internal testing are comparable to results of National Test Lab testing of other conditionally certified ASR-only providers and compare favorably with the average results for CA-assisted providers, described below. Id. at 10; CaptionCall Reply at 2.  The numerical results were provided in the confidential version of the CaptionCall Application.

11. The National Test Lab’s own testing provides further evidence that, in terms of accuracy, CaptionCall’s ASR platform can outperform CA-assisted IP CTS.  In repeated tests of five call scenarios, CaptionCall’s ASR platform achieved median Word Error Rates ranging from 2.9 to 14.5, while CA-assisted providers’ median Word Error Rates ranged from 8.9 to 19.5 on average.
  These test results sufficiently support our determination, for the purpose of conditional certification, that CaptionCall’s ASR-only captioning will meet or exceed the minimum TRS standards for competence and verbatim transcription, as well as CaptionCall’s claim that “[t]here is no reason to believe that CaptionCall’s implementation of ASR-only as described [in its application] will lead to service degradation.” We believe the MITRE test results, published after the close of the comment period on this application, supply sufficient public information on the latency and accuracy results to address the Consumer Groups’ concern about the transparency of testing for both caption delay and accuracy.
<end clip>

We could include this as a reference if we think it is useful.
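
Since the clip leans on Word Error Rate as its accuracy measure without spelling out the calculation, here is a minimal sketch of the usual textbook definition: substitutions, deletions, and insertions against a reference transcript, divided by the number of reference words. The function name, the lowercase-and-split normalization, and reporting the result as a percentage are my assumptions; the clip does not state the unit of its figures, and the National Test Lab's actual scoring rules may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: (S + D + I) / N * 100.

    Hypothetical helper for illustration only; normalization here is
    just lowercasing and whitespace splitting (an assumption).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from captions
            insertion = curr[j - 1] + 1  # extra word in captions
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# One dropped word out of four reference words gives 25.0.
print(word_error_rate("the quick brown fox", "the quick fox"))
```

Note that a measure like this treats every word error equally, so it says nothing about whether a given error changes the meaning of what was said.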

--Steve



Steve Noble
Instructional Designer, Accessibility
Psychometrics & Testing Services

Pearson

502 969 3088
steve.noble@pearson.com


________________________________
From: White, Jason J <jjwhite@ets.org>
Sent: Monday, August 30, 2021 12:38 PM
To: public-rqtf@w3.org <public-rqtf@w3.org>
Subject: Synchronization Accessibility User Requirements: Speech recognition accuracy


Dear colleagues,

Please note in advance of this week’s meeting the recent update to Issue #229:
https://github.com/w3c/apa/issues/229
and the corresponding changes that I made to the draft, which Josh has merged at
https://raw.githack.com/w3c/apa/main/saur/




________________________________
Received on Tuesday, 31 August 2021 22:09:47 UTC