
[minutes] November 24 meeting

From: Dominique Hazael-Massieux <dom@w3.org>
Date: Fri, 26 Nov 2021 15:37:50 +0100
To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <7deeab72-e641-f735-cf02-f227accba91b@w3.org>

The minutes of our meeting on Nov 24 are available at:
and copied as text below.

Thank you Florent for scribing!


                  WebRTC November 2021 Virtual interim

24 November 2021

   [2]Agenda. [3]IRC log.

      [3] https://www.w3.org/2021/11/24-webrtc-irc


Attendees

   Present
          Anssi, BenW, Bernard, Carine, Dom, Eero, Elad, Florent,
          Guido, Harald, Jan-Ivar, PatrickRockhill, Riju,
          TimPanton, TonyHerre, Tuukka, Youenn

   Chair
          Bernard, Harald, Jan-Ivar

   Scribe
          Dom, Florent


Contents

    1. [4]Media Capture Transform
    2. [5]Region Capture
    3. [6]WebRTC NV Use Cases
    4. [7]Face Detection API

Meeting minutes

   Slideset: [8]https://lists.w3.org/Archives/Public/www-archive/


  Media Capture Transform

   [Jan-Ivar gives updates on discussion with WHATWG re Streams]

   Harald: we have issues that need solving, and we're fairly
   confident we'll be able to solve them.

   [Harald presents how to evolve Jib's and his proposal]

   Youenn: Is audio part of the presentation?

   Harald: Audio is out, we have no consensus

   Youenn: want stronger guarantees that we'll be able to solve
   the issues

   Dom: We need to be convinced we have a solution for those

   TimPanton: What happens if you don't close it (a stream)?

   Harald: They will hang around until garbage collection.

   TimPanton: We need to be clear about the expectations to avoid
   surprise browser slowdowns.

   Jib: We could add notes in the document about life cycle

   Youenn: As long as we do not have a good API, we cannot make
   progress. I feel more positive now.

   Jib: I feel confident, we have 2 solutions for this problem.

   Bernard: Looking at samples, we have not done all the required
   cleanup, especially related to error conditions.
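
   [Illustrative sketch, not from the meeting: the lifecycle
   concern above (unclosed frames or streams linger until garbage
   collection, and error paths are easy to miss) can be shown with
   a try/finally shape. FakeFrame and processFrames are
   hypothetical stand-ins invented here; they are not the
   mediacapture-transform or WebCodecs API.]

```typescript
// Hypothetical stand-in for a frame that holds a scarce resource.
class FakeFrame {
  static open = 0; // number of frames created but not yet closed
  private closed = false;

  constructor(readonly timestamp: number) {
    FakeFrame.open += 1;
  }

  close(): void {
    if (this.closed) return; // closing twice is harmless
    this.closed = true;
    FakeFrame.open -= 1;
  }
}

// Runs `handle` on one frame per timestamp and closes every frame,
// including the one being handled when an error is thrown.
function processFrames(
  timestamps: number[],
  handle: (frame: FakeFrame) => void,
): number {
  let processed = 0;
  for (const ts of timestamps) {
    const frame = new FakeFrame(ts);
    try {
      handle(frame);
      processed += 1;
    } finally {
      frame.close(); // release now instead of waiting for GC
    }
  }
  return processed;
}
```

   [In this sketch an exception from the handler still propagates
   to the caller, but the open-frame count returns to zero first.]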

   Bernard: AI: First step to make the changes for the draft,
   announce it on the list and make a call for adoption when it’s

  Region Capture

   Elad presents the rationale for the Region Capture API and
   shows code examples.

   Youenn: You can use message channels to transfer, it shouldn't
   be a problem. I think you can use decorations instead of crop

   Elad: we want it to work with GDM, GVM and extensions. It
   should work with multiple tabs from different domains. It seems
   this is a faster way to ship.

   Dom: What happens if the targeted element is no longer visible?

   Elad: There's a spec draft with more details on my Github. It
   suggests that the track is muted when the element is no longer
   visible and unmuted when it’s again visible.
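
   [Illustrative sketch, not from the meeting: the behaviour Elad
   describes (muted while the target element is not visible,
   unmuted once it is visible again) amounts to a two-state
   machine. CroppedTrackModel and setTargetVisible are hypothetical
   names, not part of the Region Capture draft; the event names
   only mirror the existing mute/unmute events on
   MediaStreamTrack.]

```typescript
type TrackEvent = "mute" | "unmute";

// Hypothetical model of a cropped track's muted state.
class CroppedTrackModel {
  muted = false;
  readonly events: TrackEvent[] = []; // events fired so far, in order

  setTargetVisible(visible: boolean): void {
    const shouldMute = !visible;
    if (shouldMute === this.muted) return; // no state change, no event
    this.muted = shouldMute;
    this.events.push(shouldMute ? "mute" : "unmute");
  }
}
```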

   Youenn: Adding features like cropping to GDM tab capture can be
   risky, we want to move away from it. If we add to GDM people
   might stick to it while GVM is safer. The more we provide
   benefits in GVM and the less there is in GDM, the safer the web

   Jib: We should avoid the word "id" or exposing an ID in an API
   as it might invite a lot of scrutiny. Suggests adding an
   interface instead. It helps with garbage collection.

   Jib: Using applyConstraints() instead, which could help with
   the timing and presenting an uncropped frame, might be better.

   Elad: Constraints might be more difficult implementation wise
   and have weaker guarantees on when they apply.

   Jib: Constraints may be underspecified and we could improve
   applyConstraints() as well.

   TimPanton: I think we should do this quickly. I like the idea
   of an opaque token better than transferring a stream. Not fond
   of constraints and in favor of the API.

   Elad: What is a problem with string IDs?

   TimPanton: It's about avoiding a conversation regarding
   potential risks. Advises making an interface and adding strings

   Elad: Open to that idea.

   Dom: UUIDs can indicate returning users. It's better to avoid
   the discussion with the Privacy Interest Group.

   Youenn: Supports token but not constraints.

   Harald: Interfaces can be serialisable, and if so, they can be
   strings in disguise. UUIDs are fine if they are short lived,
   but there is a privacy risk.
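
   [Illustrative sketch, not from the meeting: one reading of the
   "opaque token" suggestion, with Harald's caveat in mind. All
   names here (CropToken, issueToken, resolveToken) are
   hypothetical and are not the proposed API. The token object
   carries no readable state; the mapping to its target is kept in
   a WeakMap, so the entry can be collected together with the
   token.]

```typescript
// Hypothetical opaque handle: no fields, so nothing to read or serialise.
class CropToken {}

// Side table from token to target. WeakMap keys do not keep the
// token alive, so dropping the token lets the entry be collected.
const targets = new WeakMap<CropToken, string>();

function issueToken(targetDescription: string): CropToken {
  const token = new CropToken();
  targets.set(token, targetDescription);
  return token;
}

function resolveToken(token: CropToken): string | undefined {
  return targets.get(token); // undefined for tokens we never issued
}
```

   [The sketch stays within a single script context. How such a
   token would cross document boundaries without becoming a string
   in disguise is the open question raised in the discussion.]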

   [Discussion about if we want a call for adoption for the
   document or if a call for review is possible.]

   Jib: Suggests we can have the CFA after the document is updated
   to use an opaque token.

   Dom: Will Elad offer to be an editor?

   Elad: Yes

   Dom: AI: Need to confirm this with the chairs. Updating the
   document with the token and then CFA.

  WebRTC NV Use Cases

   [ [9]Slide 30 ]


   Bernard: lots of systemic changes since our FPWD of NV use
   cases, in particular due to the pandemic and technological

   [ [10]Slide 31 ]


   Bernard: NV use cases include 3 use cases from the original use
   cases that could be improved
   … not much API support for these - essentially webrtc-ice and
   … lots of references to ICE
   … the first 2 related use cases aren't necessarily the most
   … not clear that ICE is such a key enabler either
   … the video conferencing use case might be approached
   differently, with webcodecs rather than by extending WebRTC

   [ [11]Slide 32 ]


   Bernard: in terms of new use cases
   … only API that applies is rtcdatachannel in workers
   … no progress on other, despite the use cases themselves having
   found broad adoption

   [ [12]Slide 33 ]


   Bernard: for use cases 3.6-8, quite a bit of activity in the WG
   - mediacapture transform, machine learning
   … not so much for 3.9
   … still discussions around intersection with ML
   … no specifics on face/body tracking API
   … nothing on data exchange in service workers

   [ [13]Slide 34 ]


   Bernard: does the doc reflect current industry priorities?
   current state of tech (incl webcodecs)?
   … none of the use cases have all their requirements met by API
   … only 4 have at least one proposal
   … substantial gaps in requirements around data transport
   … The document doesn't talk about the long term architecture
   … current doc seems to build on the view of "extending" webrtc,
   but this may need to evolve based e.g. on the webcodecs view of
   the world

   TimP: in terms of environment change, a lot more is happening
   on mobile than used to and than we would have expected

   Bernard: true; incl for game streaming

   TimP: also for e.g. small social / family gathering
   … also a question worth addressing is the P2P architecture
   … in reality, most of the WebRTC usage isn't P2P
   … WebTransport is not designed for P2P
   … maybe WebRTC is P2P and WT is the centralized architecture?

   Youenn: some of the new technologies like WebTransport are
   providing more flexibility
   … where WebRTC is more of an integrated system
   … the need for metadata synchronization (e.g. in metaverse)
   seems very relevant, needs more detailed anchoring in our APIs
   … RTP headers extension might be exposed for non-browser
   … also +1 to TimP's point about mobile browsers - getting more
   interop across iOS / android would be good
   … e.g. handling of muting in case of priority audio capture in
   mobile (e.g. in mobile phone)
   … in general, providing more consistency across os/browsers
   would be good

   harald: there is a lot of stuff that uses webrtc outside the
   browser - e.g. recently in a ring doorbell
   … being able to contact these (pseudo-)webrtc endpoints from
   browsers is important
   … our use case driven approach hasn't worked too well to track
   what's going on
   … in terms of long term architecture, it's hard to manage -
   trade-off between consistency and fitness

   anssi: ML WG chair here - we're very interested in making sure
   our WebNN API helps with the webrtc use cases

   <anssik> [14]https://github.com/webmachinelearning/webnn/

     [14] https://github.com/webmachinelearning/webnn/issues/226

   [15]Integration with real-time video processing #226

     [15] https://github.com/webmachinelearning/webnn/issues/226

   anssi: we've started developing a prototype based on background

   Jan-Ivar: apart from funny hats, most of the use cases focus on
   … we've seen lots of use cases around media capture (e.g.
   screen sharing)
   … Mozilla takes use cases pretty seriously - some use cases are
   marked as not having consensus, would prefer we call for
   consensus or remove them

  Face Detection API

   [ [16]Slide 36 ]


   Riju: I hope to present proposals that help address 4 of the
   use cases that were presented
   … today focusing on face detection, following the related
   breakout at TPAC last month

   [17]WebRTC Intelligent Collaboration TPAC 2021 breakout

     [17] https://www.w3.org/2021/10/20-webrtc-ic-minutes.html

   Riju: we have an updated proposal for what an API might look

   [18]Face detection proposal


   [ [19]Slide 37 ]


   Riju: developers could request a specific number of points for
   the contour
   … a face mesh is unlikely to be available in the short term,
   but documented it in the API for sake of completeness
   … the proposal includes a set of expressions that can be
   obtained from drivers without DNN
   … again, we can decide to remove items

   Harald: contours are an improvement over squares or rectangles -
   particularly needed for e.g. background blur
   … I worry about relying on what's available from drivers
   … instead, I would like us to approach this based on what's
   available in the frames of an MST, no matter how it was added
   … likewise, I would like to be able to add that data to frames
   when I'm a producer
   … the API should allow consumption, production and even
   refinement of annotations attached to a track
   … e.g. a transform could improve the rough contour identified
   by the driver to bring an improved annotation downstream
   … the shape of the API has the right amount of metadata, but it
   shouldn't be described as a one-way consumption API
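
   [Illustrative sketch, not from the meeting: Harald's model of
   consuming, producing and refining annotations can be pictured
   as a chain of functions over annotated frames. The types and
   names below (AnnotatedFrame, Contour, roughDetector,
   refineContour, pipeline) are hypothetical and are not taken
   from the face detection proposal.]

```typescript
interface Point { x: number; y: number; }
type Contour = Point[];

interface AnnotatedFrame {
  timestamp: number;
  faces: Contour[]; // annotations travel with the frame
}

type FrameTransform = (frame: AnnotatedFrame) => AnnotatedFrame;

// Producer: attaches a rough rectangular contour, as a driver might.
const roughDetector: FrameTransform = (frame) => ({
  timestamp: frame.timestamp,
  faces: [[
    { x: 0, y: 0 }, { x: 4, y: 0 }, { x: 4, y: 4 }, { x: 0, y: 4 },
  ]],
});

// Inserts the midpoint of every edge, doubling the point count.
// A real refiner would look at pixels; this one is purely geometric.
function addMidpoints(contour: Contour): Contour {
  const out: Contour = [];
  contour.forEach((p, i) => {
    const q = contour[(i + 1) % contour.length];
    out.push(p, { x: (p.x + q.x) / 2, y: (p.y + q.y) / 2 });
  });
  return out;
}

// Refiner: consumes upstream annotations and emits improved ones.
const refineContour: FrameTransform = (frame) => ({
  timestamp: frame.timestamp,
  faces: frame.faces.map(addMidpoints),
});

// Composes transforms left to right.
function pipeline(...steps: FrameTransform[]): FrameTransform {
  return (frame) => steps.reduce((f, step) => step(f), frame);
}
```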

   Riju: I'll try to show a proposal for background blur early
   … what we're trying to provide here is processing
   free-of-computation, because it already happens in the driver

   TimP: +1 to Harald in terms of enabling successive refinements,
   on top of the compute-free results
   … I'm also nervous about the expression enums
   … others are reasonably factual, while expressions is more

   riju: is the concern about restricting the list? blink and
   smile are available across platforms

   TimP: my concern is that the detection could be wrong, in
   particular for some subgroups
   … given the level of subjectiveness

   Riju: note taken

   Jan-Ivar: what was the previous agreement on this topic? what
   is the question you're bringing to the WG?

   Riju: last time I presented a proposal on top of image capture,
   it was suggested to bring to mediacapture-extensions
   … also there was a request of making it more generic
   … the goal would be to bring it to the mediacapture extension

   jan-ivar: there would need to be a process on whether or not
   to adopt this API
   … mediacapture-extensions sounds like a good place for a future
   … I'm not entirely sure how to deal with this for now

   Bernard: this is interesting; I have concerns with emotion
   analysis in terms of accuracy
   … in terms of how this would be used - it's a method on
   MediaStreamTrack, but it would have to be designed to work with
   Media Capture Transform
   … I would see it as a TransformStream to be used e.g. for
   background blur
   … the information would be used to execute the blur faster
   … the provided information (e.g. the contour) is meant to help
   processing something in the GPU buffer
   … when the information itself might be CPU-side

   riju: the performance on ChromeOS/Windows based on CPU doesn't
   depend on GPU memory
   … I think it's giving good results

   bernard: in terms of API shape, having this on mediastreamtrack
   feels wrong - I want it to work on a videoframe

   Youenn: similar feedback to Bernard - exposing driver data is a
   decision at the mediastreamtrack level, but the data should not
   be on MST - it should be synchronized with videoframes
   … either by getting it from the frame, or getting it at the
   same time as a video frame
   … that's the kind of model that would make sense to me
   … Given that the idea is to expose driver info, it's good to be
   as specific as possible
   … we should separate driver-specific metadata from more
   general approaches
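
   [Illustrative sketch, not from the meeting: the model Youenn
   describes, where detection data is obtained from the frame or
   at the same time as the frame, can be approximated by pairing
   each frame with the metadata that shares its timestamp. The
   names below (FrameLike, FaceMetadata, pairByTimestamp) are
   hypothetical and not part of any specification.]

```typescript
interface FrameLike { timestamp: number; }
interface FaceMetadata { timestamp: number; faceCount: number; }

interface FrameWithMetadata<F extends FrameLike> {
  frame: F;
  metadata: FaceMetadata | null; // null when nothing matches this frame
}

// Pairs every frame with the metadata entry that has the same
// timestamp, so consumers never read detection data that belongs
// to a different frame.
function pairByTimestamp<F extends FrameLike>(
  frames: F[],
  metadata: FaceMetadata[],
): FrameWithMetadata<F>[] {
  const byTimestamp = new Map<number, FaceMetadata>();
  for (const m of metadata) byTimestamp.set(m.timestamp, m);
  return frames.map((frame) => ({
    frame,
    metadata: byTimestamp.get(frame.timestamp) ?? null,
  }));
}
```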

   Bernard: next steps?

   Riju: could show some demos with performance numbers

   dom: heard consistent feedback to anchor this in VideoFrame
   … defined in WebCodecs

   Youenn: let's have the discussion in mediacapture-extensions
   and identify an architecture there

   [20]Face Detection. #289

     [20] https://github.com/w3c/mediacapture-image/issues/289

   [now at [21]https://github.com/w3c/mediacapture-extensions/issues/44 ]

     [21] https://github.com/w3c/mediacapture-extensions/issues/44
Received on Friday, 26 November 2021 14:37:54 UTC
