
Re: [mediacapture-main] Support capturing audio output from sound card (#629)

From: guest271314 via GitHub <sysbot+gh@w3.org>
Date: Thu, 19 Dec 2019 19:10:05 +0000
To: public-webrtc-logs@w3.org
Message-ID: <issue_comment.created-567622453-1576782603-sysbot+gh@w3.org>
@jan-ivar There is no bright-line rule that the input to `SpeechRecognition` must be from a live human voice. https://stackoverflow.com/a/47113924.

Consider an individual with vision impairment. They have a book they want to read or write. They can feed the text of the book to `SpeechRecognition` via `speechSynthesis`. Before feeding the text to `speechSynthesis` they can modify the plain text or Braille input, then capture the audio output and test the result of `SpeechRecognition` prior to publishing their work.

In reverse, audio output can be converted to plain text (in one or more languages) or Braille, etc.

Without the ability to capture audio output it becomes difficult to test input and output. 
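
To make "test input and output" concrete, here is a minimal sketch of the comparison step only, assuming the text fed to `speechSynthesis` and the transcript returned by `SpeechRecognition` are both already available as strings. The names `toWords`, `wordEditDistance` and `wordErrorRate` are hypothetical and not part of any specification, and the punctuation handling is deliberately simplistic. It does not capture any audio; that is the part that is missing.

```typescript
// Hypothetical helper: lowercase the text, drop common punctuation, split into words.
// Assumption: this simple normalization is good enough for a rough comparison.
function toWords(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[.,;:!?"()]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// Word-level Levenshtein distance: the minimum number of word insertions,
// deletions and substitutions needed to turn `a` into `b`.
function wordEditDistance(a: string[], b: string[]): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Word error rate: edit distance divided by the number of words in the source text.
// 0 means the transcript matches the source word for word.
function wordErrorRate(source: string, transcript: string): number {
  const ref = toWords(source);
  const hyp = toWords(transcript);
  if (ref.length === 0) return hyp.length === 0 ? 0 : 1;
  return wordEditDistance(ref, hyp) / ref.length;
}
```

An author could call `wordErrorRate(bookText, recognizedText)` after each change to the input and publish once the rate is acceptably low.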

>  I think those are compelling use cases for web speech to solve cleanly.

Well, you can refer to the document that you cited, https://github.com/mozilla/standards-positions/issues/170#issuecomment-520157837. In this case the author of the post is correct in their analysis:

> I’m not sure how versed you are at reading specs, but if you take a look at the actual spec you will see that there are parts of the API that are either impossible to implement in an interoperable manner or the spec doesn’t say what to do: to be blunt, the spec hardly qualify as a spec at all... its more of a wish list thinly disguised as technical speciation only because it uses a W3C stylesheet: There are no algorithms. There is basically zero specified error handling. The eventing model is a total mystery. And much of it is just hand waving that magical things will happen and speech will somehow be generated/recognized (see the grammars section of the spec for a good hardy chuckle).

Essentially, the Web Speech API is dead. While it was a novel and worthy start, the current specification has several issues.

Media Capture and Streams is well-suited to take on the task of filling in those holes, and doing so is in accord with what is already possible.

I still do not understand the reluctance to acknowledge that the procedure is already technically possible.

GitHub Notification of comment by guest271314
Please view or discuss this issue at https://github.com/w3c/mediacapture-main/issues/629#issuecomment-567622453 using your GitHub account
Received on Thursday, 19 December 2019 19:10:07 UTC
