Re: [MAINTENANCE] Updates to minutes/archival system from Melvin Carvalho on 2026-01-05 (public-credentials@w3.org from January 2026)

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Mon, 5 Jan 2026 17:15:17 +0100
To: Manu Sporny <msporny@digitalbazaar.com>
Cc: W3C Credentials CG <public-credentials@w3.org>
Message-ID: <CAKaEYh+EA7kq+Twz=F_vh-T=ty0=8=b_9BsihAcA7uC+7eFytQ@mail.gmail.com>

On Mon, 5 Jan 2026 at 16:41, Manu Sporny <msporny@digitalbazaar.com>
wrote:

> Hey folks,
>
> The purpose of this email is to document how our latest
> transcription/archival system works. This is an attempt to reduce the
> amount of tribal knowledge needed to operate the infrastructure for
> our community.
>
> Some of you might have noticed that our meeting transcription summary
> emails got stuck during the second half of December 2025. This was due
> to Google deprecating the gemini-2.0-flash-lite model, which caused the
> summary emails to fail. The summarizer has been updated to
> gemini-2.5-flash, so everything should be working again. The archival
> process didn't fail; meetings continued to be archived. The step that
> generates the transcript summary and sends it to the mailing list is
> the thing that failed. I expect the AI parts of the infrastructure to
> keep breaking due to the "move fast and break things" nature of the
> companies deploying LLMs. That was the fourth such breaking change to
> these APIs last year alone. We'll continue to fix things as they
> break; putting up with the instability is worth it to avoid needing
> human scribes.
>
> While fixing that, I also took some time to reduce some "bus factor"
> in the infrastructure. The archival process has been running on my
> personal machine for the last year (so I could debug issues as they
> arose, and because these Google/LLM APIs were a total pain to put
> into GitHub Actions). That said, we now have a GitHub Action that
> runs every weekday at 6:30pm ET to perform any meeting
> archival for the day. The process, with some successful runs, can be
> found here:
>
> https://github.com/w3c-ccg/w3c-ccg-archiver/actions/workflows/archive.yaml
>
> That uses the general CG Archival tool that can be found here:
>
> https://github.com/w3c-ccg/cg-archiver/
>
> We need better documentation on the whole setup, but it's all
> automated and running in GitHub Actions now. In theory, someone else
> could pick it up and improve it from here.
>
> No action is required by anyone at this point. Just providing an
> update in case others want to improve the current setup.
>
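
A note on the "every weekday at 6:30pm ET" schedule: GitHub Actions `schedule` cron expressions have traditionally been evaluated in UTC, so a single fixed cron line only matches 6:30pm ET for part of the year (EST is UTC-5, EDT is UTC-4). The following is a minimal sketch, not taken from the archiver repository (whose archive.yaml may handle this differently). It assumes "ET" means the America/New_York time zone, and `utc_cron_for_et` is a hypothetical helper name. Given a date, it derives the UTC cron line that fires at the intended local time, shifting the weekday range if the UTC time falls on a different calendar day.

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

# Assumption: "ET" refers to US Eastern Time (observes DST).
ET = ZoneInfo("America/New_York")


def utc_cron_for_et(local_time: time, on_day: date) -> str:
    """Hypothetical helper: UTC cron line for a Mon-Fri job at local_time ET on on_day."""
    local = datetime.combine(on_day, local_time, tzinfo=ET)
    utc = local.astimezone(timezone.utc)
    # If the UTC time lands on the next (or previous) day, shift Mon-Fri (1-5) accordingly.
    offset = (utc.date() - local.date()).days  # -1, 0 or +1
    return f"{utc.minute} {utc.hour} * * {1 + offset}-{5 + offset}"


print(utc_cron_for_et(time(18, 30), date(2026, 1, 5)))  # 30 23 * * 1-5  (EST)
print(utc_cron_for_et(time(18, 30), date(2026, 7, 6)))  # 30 22 * * 1-5  (EDT)
```

Because the two results differ, a workflow pinned to one of them runs an hour early or late for roughly half the year; listing both cron lines and having the job check the local time is one common way around this, at the cost of a redundant trigger.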

Thanks a lot for sharing, Manu.

I'd like to start using things like this in other groups. Do you have
thoughts on what is currently the best transcription service?

I've had quite a good experience with https://elevenlabs.io/audio-to-text

However, it wasn't very good at identifying who was speaking. I'd love to
hear others' experiences on this topic, as almost every group at the W3C
needs it.


>
> -- manu
>
> --
> Manu Sporny - https://www.linkedin.com/in/manusporny/
> Founder/CEO - Digital Bazaar, Inc.
> https://www.digitalbazaar.com/
>
>

Received on Monday, 5 January 2026 16:15:34 UTC