Call transcription issue

On Wed, Jan 10, 2024 at 3:05 PM CCG Minutes Bot <minutes@w3c-ccg.org> wrote:

> Thanks to Our Robot Overlords for scribing this week!
>
> The transcript for the call is now available here:
>
> https://w3c-ccg.github.io/meetings/2024-01-09/
>
>
Summary: A technical/API issue dating back to early 2022 has likely
dropped auto-transcription for any speaker whose browser/client was not
identified as "en-US".

Hi, I don't usually read the transcriptions, but I noticed this week it was
rather short.  The main part of this call was not transcribed by our robot
overlords.  The video/audio files are fine.  I poked at the logs and
it appears that since Jan 2022(!), for all the various CCG calls, the
transcription API throws errors for every API call with data that is not
tagged as "en-US".  In the latest case, it was mostly "en-GB" throwing
errors.  The (very large) log file has lots of error data, so for the
curious, here is the minimum number of API calls that have failed since
2022, per language tag, for data ranging from 500 ms to over 30000 ms:

 288643 en-GB
  24120 de-DE
   6740 it-IT
   5732 ja-JP
   5592 nl-NL
   2839 fi-FI
    686 fr-CA
    520 fr-FR
    505 es-ES
     52 ko-KR

Does this match up with any transcription text people noticed missing?

I tracked down the error message, "The video model is currently not
supported for language : xx-YY", and it appears that only en-US is
supported for the video model:
https://cloud.google.com/speech-to-text/docs/speech-to-text-supported-languages
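
Because the error message names the rejected language tag, per-language
counts like the ones above can be tallied directly from the log.  Here is a
minimal sketch in Python.  It assumes only that each failure appears on one
log line containing the error text quoted above; the function names and the
rest of the log line format are hypothetical, not taken from the actual CCG
tooling.

```python
import re
from collections import Counter

# Error text as quoted in this message. Everything else on the log line
# (timestamps, log levels, etc.) is unknown, so it is simply ignored.
ERROR_RE = re.compile(
    r"The video model is currently not supported for language\s*:\s*"
    r"([A-Za-z]{2,3}-[A-Za-z0-9]{2,4})"
)

def count_unsupported_languages(lines):
    """Count failed API calls per language tag (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def format_counts(counts):
    """Render the counts highest first, one right-aligned count per line."""
    return "\n".join(f"{n:7d} {tag}" for tag, n in counts.most_common())
```

Passing an open file object to count_unsupported_languages() and printing
format_counts() on the result should give a listing in the same shape as
the one above.  Reading line by line keeps memory use flat even for a very
large log.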

I'm not sure what the solution here is.  Perhaps, since calls are primarily
in English, the language could be forced to "en-US" regardless of what the
browsers report?  I'm not familiar enough with the setup, configs, and
APIs to know how to do that yet.  I figured it would be good to at least
let everyone know about this issue so that missing transcribed text is less
of a mystery.
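
To make the "force en-US" idea concrete, here is a rough sketch in Python.
It is not how the CCG tooling works (I don't know that yet); the function
names are hypothetical, and the set of languages the video model accepts is
an assumption based only on the error message and the page linked above.
The dictionary keys mirror the language_code and model fields of the
Speech-to-Text recognition config.

```python
# Assumption from this message: the "video" model accepts only en-US.
VIDEO_MODEL_LANGUAGES = {"en-US"}
FALLBACK_LANGUAGE = "en-US"

def effective_language(client_tag, model="video"):
    """Return the tag to send, overriding tags the video model rejects."""
    if model == "video" and client_tag not in VIDEO_MODEL_LANGUAGES:
        return FALLBACK_LANGUAGE
    return client_tag

def build_recognition_config(client_tag, model="video"):
    """Build a config dict for one request (hypothetical helper)."""
    # How the real tooling assembles its request is unknown; this only
    # shows where the override would apply.
    return {
        "language_code": effective_language(client_tag, model),
        "model": model,
    }
```

With this, a client reporting "en-GB" would be sent as "en-US" when the
video model is in use, while other models keep the tag the client
reported.  The trade-off is that speech that really is in another language
would be transcribed as if it were English.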

-dave

Received on Thursday, 11 January 2024 04:06:44 UTC