- From: Evan Liu <notifications@github.com>
- Date: Thu, 09 Jan 2025 14:49:47 -0800
- To: w3ctag/design-reviews <design-reviews@noreply.github.com>
- Cc: Subscribed <subscribed@noreply.github.com>
- Message-ID: <w3ctag/design-reviews/issues/1038@github.com>
Hello, TAG!

I'm requesting a TAG review of on-device support for the Web Speech API.

This feature adds on-device speech recognition support to the Web Speech API, allowing websites to ensure that neither audio nor transcribed speech is sent to a third-party service for processing. Websites can query the availability of on-device speech recognition for specific languages, prompt users to install the necessary resources for on-device speech recognition, and choose between on-device or cloud-based speech recognition as needed.

- Explainer: https://github.com/WebAudio/web-speech-api/pull/122
- Specification: https://webaudio.github.io/web-speech-api/
- WPT Tests: https://github.com/web-platform-tests/wpt/tree/master/speech-api
- User research: N/A
- Security and Privacy self-review: Relevant survey questions:

  2.1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?
  This feature would expose whether on-device speech recognition is available in a specific language. This is required so that websites know whether on-device speech recognition is available.

  2.2. Do features in your specification expose the minimum amount of information necessary to enable their intended uses?
  Yes. Some websites have strict privacy requirements that call for on-device speech recognition, so they must know whether it is available in order to ensure that neither audio nor captions are sent to a third-party service for processing.

  2.6. Do the features in your specification expose information about the underlying platform to origins?
  While this feature does not directly expose information about the underlying platform, websites may potentially use performance metrics for on-device speech recognition to gauge general hardware capability.

  2.15. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?
  Yes, the spec contains a section on how to reduce risk of fingerprinting.
  Websites need explicit user permission to install on-device speech recognition language packs that do not match the user's preferred language, or when the user is not on Ethernet or Wi-Fi.
- GitHub repo: https://github.com/WebAudio/web-speech-api
- Primary contacts: evliu@google.com
- Organization/project driving the specification: Google
- Multi-stakeholder support:
  - Chromium comments: https://chromestatus.com/feature/6090916291674112
  - Mozilla comments: https://github.com/mozilla/standards-positions/issues/1157
  - WebKit comments: https://github.com/WebKit/standards-positions/issues/443

Commonly requested feature. Examples:
https://webwewant.fyi/wants/55/
https://github.com/WebAudio/web-speech-api/issues/108
https://stackoverflow.com/questions/49473369/offline-speech-recognition-in-browser
https://www.reddit.com/r/html5/comments/8jtv3u/offline_voice_recognition_without_the_webspeech/

Further details:

- [X] I have reviewed the TAG's [Web Platform Design Principles](https://www.w3.org/TR/design-principles/)
- The group where the work on this specification is currently being done: Audio Community Group
- The group where standardization of this work is intended to be done (if different from the current group): Audio Working Group
- This work is being funded by: Google

You should also know that...

The primary risk of this new functionality is the potential for fingerprinting. To mitigate this risk, the Chrome Trust & Safety team proposes requiring explicit user consent to install language packs that do not match one of the user's preferred languages, or when the user is not on an Ethernet/Wi-Fi network.

The existing Web Speech API has an outdated callback design which must be maintained due to backwards compatibility/interoperability issues.
While Firefox doesn't officially support the speech recognition section of the Web Speech API, it has an unprefixed implementation behind a flag, and most guides on how to use the Web Speech API do something like `window.SpeechRecognition || window.webkitSpeechRecognition;` (examples from [developer.mozilla.org](https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API/Using_the_Web_Speech_API), [codeburst.io](https://codeburst.io/html5-speech-recognition-api-670846a50e92), [dev.to](https://dev.to/nixx/building-a-real-time-speech-to-text-web-app-with-web-speech-api-4mc6)). There are 17.8K instances of this kind of usage on [GitHub](https://github.com/search?q=%22window.webkitSpeechRecognition+%7C%7C+window.SpeechRecognition%22+OR+%22window.SpeechRecognition+%7C%7C+window.webkitSpeechRecognition%22&type=code) alone.

The Audio Working Group is looking into potentially replacing this API with a new, modernized version under a different name. A separate TAG design review will be sent for that if the group decides to proceed with the new API.

--
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/1038
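For readers unfamiliar with the feature-detection idiom quoted in this message (`window.SpeechRecognition || window.webkitSpeechRecognition`), here is a minimal sketch of what such guides typically do. The names `SpeechGlobals`, `RecognitionCtor`, and `resolveSpeechRecognition` are hypothetical and introduced only for illustration; they are not part of the Web Speech API, and the constructor type is deliberately reduced to the bare minimum.

```typescript
// Minimal constructor type (hypothetical); the real SpeechRecognition
// interface has many more members than this sketch models.
type RecognitionCtor = new () => unknown;

// The part of the global object this idiom inspects (hypothetical type).
interface SpeechGlobals {
  SpeechRecognition?: RecognitionCtor;
  webkitSpeechRecognition?: RecognitionCtor;
}

// Prefer the unprefixed constructor and fall back to the webkit-prefixed one.
// Returns undefined when the scope exposes neither.
function resolveSpeechRecognition(scope: SpeechGlobals): RecognitionCtor | undefined {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition;
}

// In a browser this might be called as:
//   const Ctor = resolveSpeechRecognition(window as unknown as SpeechGlobals);
//   if (Ctor) { const recognition = new Ctor(); }
```

Taking the scope as a parameter instead of reading `window` directly keeps the sketch runnable outside a browser. It also shows why the prefixed name is hard to retire: any page using this pattern keeps working only as long as at least one of the two names resolves.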
Received on Thursday, 9 January 2025 22:49:51 UTC