Re: Teleconference System Failure and Mitigation (post mortem)

On Sat, Sep 19, 2020, at 10:29 AM, John, Anil wrote:
> >When the presenter started broadcasting video, the default is to broadcast in the highest definition possible
> >which means the server started doing around 300Mbps in video streams to everyone and only had enough 
> >memory to cache around 5 seconds of video before running out of memory
> 
> As said presenter, I was happy to informally red team your defaults, 
> assumptions and expectations :-)
> 
> Best Regards,
> 
> Anil
> 
> Anil John
> Technical Director, Silicon Valley Innovation Program 
> Science and Technology Directorate 
> US Department of Homeland Security 
> Washington, DC, USA 
> 
> Email Response Time – 24 Hours
> 
> -----Original Message-----
> From: Manu Sporny <msporny@digitalbazaar.com> 
> Sent: Saturday, September 19, 2020 9:59 AM
> To: public-credentials@w3.org
> Subject: Teleconference System Failure and Mitigation (post mortem)
> 
> 
> On 9/18/20 7:56 PM, kimdhamilton@gmail.com wrote:
> > TL;DR: NEW AND IMPROVED JITSI, FEATURING MORE RAM.
> 
> For those of you who were on last week's call, we experienced a hard
> lock-up on the W3C CCG teleconferencing system. Here's what we believe
> happened:
> 
> 1. We had sized the server to optimize for cost, which meant
>    we only allocated 4GB of RAM.
> 2. We tested this configuration with around 25 people connected
>    simultaneously, all broadcasting video. The server was
>    stable at ~60% memory usage. We thought we were good.
> 3. The call last week had 37 people at its peak. We were good
>    for the first 30 minutes or so.
> 4. When the presenter started broadcasting video, the default
>    is to broadcast in the highest definition possible, which
>    means the server started doing around 300Mbps in video
>    streams to everyone and only had enough memory to cache
>    around 5 seconds of video before running out of memory. We
>    went outside of that envelope, the server locked up hard,
>    and that was that. We switched over to Zoom and continued
>    the meeting with about 5 minutes of downtime.
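
A rough back-of-the-envelope check of the figures in step 4, as a minimal sketch and not a description of how Jitsi actually buffers video. It takes the quoted 300Mbps and 5 seconds at face value, uses decimal megabytes, and assumes (my assumption, not stated above) that the stream fans out evenly to the other 36 of the 37 participants. The function names are hypothetical.

```python
def buffered_megabytes(rate_mbps: float, seconds: float) -> float:
    """Megabytes needed to hold `seconds` of traffic at `rate_mbps` megabits/s."""
    return rate_mbps * seconds / 8.0  # 8 bits per byte


def seconds_of_headroom(free_megabytes: float, rate_mbps: float) -> float:
    """How long `free_megabytes` lasts if filled at `rate_mbps` megabits/s."""
    return free_megabytes * 8.0 / rate_mbps


def per_receiver_mbps(total_mbps: float, participants: int) -> float:
    """Average outbound rate per receiver, assuming an even fan-out
    to everyone except the presenter (an assumption, see above)."""
    if participants < 2:
        raise ValueError("need at least one receiver besides the presenter")
    return total_mbps / (participants - 1)


RATE_MBPS = 300.0      # figure quoted in step 4
CACHE_SECONDS = 5.0    # figure quoted in step 4
PEAK_PARTICIPANTS = 37 # figure quoted in step 3

print(buffered_megabytes(RATE_MBPS, CACHE_SECONDS))               # 187.5
print(round(per_receiver_mbps(RATE_MBPS, PEAK_PARTICIPANTS), 2))  # 8.33
```

By this arithmetic, 5 seconds at 300Mbps is roughly 190MB, which suggests the server had little free memory left by the time the presenter's stream started. That is my reading of the numbers, not something stated above.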
> 
> We have done the following in an attempt to prevent this from happening
> again:
> 
> 1. Doubled the amount of RAM on the server to 8GB.
> 2. Added a 16GB swap volume in case we exceed the RAM
>    allocated to the machine.
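
One way to keep an eye on whether the extra RAM and swap are holding up is to read the kernel's memory counters during a call. The sketch below assumes a Linux host and parses text in the `/proc/meminfo` format; the sample values are made up for illustration (they are not measurements from the CCG server), and the helper names are hypothetical.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def ram_used_fraction(info: dict) -> float:
    """Fraction of RAM the kernel considers unavailable for new work."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def swap_in_use_kb(info: dict) -> int:
    """Swap currently in use, in kB. Non-zero means RAM has been exceeded."""
    return info["SwapTotal"] - info["SwapFree"]


# Illustrative sample only; on a real host use open("/proc/meminfo").read().
SAMPLE = """\
MemTotal:        8000000 kB
MemAvailable:    3200000 kB
SwapTotal:      16000000 kB
SwapFree:       16000000 kB
"""

info = parse_meminfo(SAMPLE)
print(f"RAM used: {ram_used_fraction(info):.0%}, swap used: {swap_in_use_kb(info)} kB")
```

If swap usage starts climbing during a call, the server is past its RAM allocation and is likely to slow down well before it locks up, which at least gives some warning.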
> 
> We believe this should address the issue experienced during the last call.
> 
> Running in production is the ultimate test of your system design assumptions. :)
> 
> -- manu
> 
> --
> Manu Sporny - 
> https://www.linkedin.com/in/manusporny/
> Founder/CEO - Digital Bazaar, Inc.
> blog: Veres One Decentralized Identifier Blockchain Launches 
> https://tinyurl.com/veres-one-launches
> 
>

Received on Sunday, 20 September 2020 02:32:36 UTC