[minutes] April 2024 meeting

Hi,

The minutes of our April 2024 meeting held yesterday are available at:
   https://www.w3.org/2024/04/23-webrtc-minutes.html

and copied as text below.

Dom

                       WebRTC April 23 2024 meeting

23 April 2024

    [2]Agenda. [3]IRC log.

       [2] https://www.w3.org/2011/04/webrtc/wiki/April_23_2024
       [3] https://www.w3.org/2024/04/23-webrtc-irc

Attendees

    Present
           Bernard, Carine, Dom, Eero, Elad, Florent,
           FrederikSolenberg, Guido, Harald, Jan-Ivar, Riju,
           Sameer, SunShin, TimP, TonyHerre, Tove

    Regrets
           -

    Chair
           Bernard, HTA, Jan-Ivar

    Scribe
           dom

Contents

     1. [4]Custom Codecs
     2. [5]Captured Surface Switching
     3. [6]Racy devicechange event design has poor interoperability
        in Media Capture and Streams
     4. [7]WebRTC API
          1. [8]Convert RTCIceCandidatePair dictionary to an
             interface
          2. [9]setCodecPreferences should trigger
             negotiationneeded
          3. [10]receiver.getParameters().codecs seems
             under-specified
     5. [11]Background segmentation mask
     6. [12]Summary of resolutions

Meeting minutes

    Slideset: [13]https://lists.w3.org/Archives/Public/www-archive/
    2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf

      [13] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf

   [14]Custom Codecs

      [14] https://github.com/w3c/webrtc-encoded-transform/pull/186

    [15][Slide 10]

      [15] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=10

    [16][Slide 11]

      [16] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=11

    [17][Slide 12]

      [17] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=12

    [18][Slide 13]

      [18] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=13

    Harald: this requires the ability to set the mime type of
    a frame, which can be done two ways: with a frame constructor
    (merged in [19]#233), or via setMetadata ([20]#202), which has
    stalled
    … setMetadata feels like a better fit from my perspective
    … but at least the constructor allows for this, and so we may
    not need two different ways

      [19] https://github.com/w3c/webrtc-encoded-transform/issues/233
      [20] https://github.com/w3c/webrtc-encoded-transform/issues/202
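
    [Editor's sketch: roughly what the two approaches could look like
    inside an encoded transform. setMetadata ([20]#202) and the
    metadata-carrying constructor ([19]#233) are proposals under
    discussion, so the field names and shapes below are assumptions,
    not shipped API.]

        // In a worker with an RTCRtpScriptTransform attached:
        onrtctransform = (event) => {
          const { readable, writable } = event.transformer;
          readable
            .pipeThrough(new TransformStream({
              transform(frame, controller) {
                const metadata = frame.getMetadata();
                metadata.mimeType = "video/x-custom"; // hypothetical field

                // Option A (#202): mutate the frame in place.
                frame.setMetadata(metadata); // proposed, not shipped
                controller.enqueue(frame);

                // Option B (#233): construct a new frame carrying the
                // new metadata, at the cost of copying the data.
                // controller.enqueue(
                //   new RTCEncodedVideoFrame(frame, { metadata }));
              },
            }))
            .pipeTo(writable);
        };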

    jan-ivar: I'm supportive of the API shape; on the question of
    constructor vs setMetadata - it's a bit complicated
    … because these encoded frames are mutable, unlike webcodecs
    … that's a bit unfortunate but it makes sense in the context of
    encryption
    … in webcodecs, frames are immutable, which would require a
    copy-constructor step

    Harald: with immutable data, we would have to have a copy
    constructor with a separate argument for the data itself

    Jan-Ivar: in other words, I don't have a clear answer to your
    question

    bernard: also supportive of this; setMetadata should be fine
    here, we don't have the same constraints we had in WebCodecs
    … for WebCodecs, we didn't want data to change while an
    operation is in progress
    … here setMetadata should be safe
    … it would be nice to allow for this without making a copy
    … For some codecs like H264, it's not just the mime type, it's
    also a profile, packetization mode, etc
    … can you set this here as well?

    harald: yes, it includes all the parameters

    [TimP: supportive of this]

    Harald: based on the feedback, it sounds like moving forward
    with [21]#202 would be worth looking into again

      [21] https://github.com/w3c/webrtc-encoded-transform/issues/202

    Guido: setMetadata feels like a better fit for this use case
    (although I was supportive of the copy constructor for a
    separate one)

    Jan-Ivar: let's follow up on github

    [TimP: any issue with having several transforms in sequence?]

    Harald: if they're connected by pipelines, this creates good
    hand-off points from one to the next

    Jan-Ivar: given this, I think the copy constructor would be a
    better fit
    … setMetadata can end up with @@@ issues
    … not clear that we should extend the problem we have with data
    to metadata

    Bernard: in WebCodecs, immutable data was a way to avoid race
    conditions with the work being done in a separate thread

    Jan-Ivar: this is handled via the transfer step here

    Bernard: setMetadata could only be called from the transform
    right? not after it has been enqueued?

    Jan-Ivar: setMetadata can only be called if the object is still
    there…
    … It feels to me like having setMetadata is redundant with the
    copy constructor

    Harald: right now, the copy constructor is expensive

    Jan-Ivar: let's continue the discussion on [22]#202

      [22] https://github.com/w3c/webrtc-encoded-transform/issues/202

    RESOLUTION: Consensus on [23]#186, discussion to continue on
    [24]#202

      [23] https://github.com/w3c/webrtc-encoded-transform/issues/186
      [24] https://github.com/w3c/webrtc-encoded-transform/issues/202

   [25]Captured Surface Switching

      [25] 
https://github.com/w3c/mediacapture-screen-share-extensions/issues/4

    [26][Slide 17]

      [26] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=17

    [27][Slide 18]

      [27] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=18

    [28][Slide 19]

      [28] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=19

    [29][Slide 20]

      [29] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=20

    [30][Slide 21]

      [30] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=21

    [31][Slide 22]

      [31] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=22

    [32][Slide 23]

      [32] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=23

    [33][Slide 24]

      [33] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=24

    [34][Slide 25]

      [34] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=25

    Tove: is this a promising way forward?

    [TimP: Is simply supplying an event handler enough to
    discriminate? Do we actually need the surface/session
    property?]

    Tove: we discussed in the December meeting whether an event
    handler (back then, a callback) would be enough to
    discriminate
    … and there is a design principle against changing behavior
    based on whether an event handler is registered

    Jan-Ivar: indeed; there are cases where that would be OK
    … we haven't talked about stopping tracks here
    … it might be OK for the user agent to optimize away
    user-visible behavior when it comes to how quickly the
    indicator state/permission UX changes

    Jan-Ivar: for backwards compatibility, I think we're in
    agreement the UA could optimize the case when no event handler
    has been added

    Tove: the original proposal was that you would always get the
    two kinds of tracks, which would still need to be managed even
    if you don't need them
    … hence this new proposal that lets apps pick which tracks they
    want

    Jan-Ivar: If I opt-in to the surface track, what would
    getDisplayMedia return?

    Tove: I'm proposing getDisplayMedia returns the session track,
    and the event exposes the surface track
    … but I'm open to other approaches

    Elad: what if we had a getter for the session track, but only
    returned the surface track from getDisplayMedia
    … that way you don't have to wait for an event; you could
    access either at any point
    … stopping for unused surface tracks could be handled by the
    capturecontroller

    Jan-Ivar: I like the behavior and concepts of surface/session
    tracks
    … but asking developers to pick one upfront feels artificial
    … I could move from one tab to another tab with audio, but then
    stay in tab+audio mode moving forward
    … hence why I was proposing to expose both and let the app
    close the ones they don't want
    … I was initially worried this would lead to confusing
    indicators
    … but Youenn convinced me this could be optimized away

    Harald: if I want to write an app that handles switching of
    surfaces and have code that covers both cases, I would struggle
    to maintain two code paths to manage what gets presented to the
    end user

    Tove: the problem I see with Jan-Ivar's proposal is that we
    lose the guarantee that one track represents one surface which
    I think is an attractive invariant

    Jan-Ivar: I don't think Web developers need to care about that;
    there is an isolation principle that when switching from one
    surface to another, you're also switching sources
    … I like slide 19 - the only thing missing is stopping tracks
    … if a developer doesn't care about the surface track at all,
    they don't need to register an event handler
    … you would want to stop old tracks in the event handler
    … this would also let the developer choose live which tracks
    they can support

    Elad: what happens if the app doesn't stop either track?

    Jan-Ivar: the backwards compatible design is injection; would
    we be talking about ending that model?

    RESOLUTION: more discussion is needed on the lifecycle of
    surface tracks
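
    [Editor's sketch: the event-handler shape discussed above. The
    surfaceswitch event name, its track attribute, and the
    session/surface track split are still under discussion, so
    everything below is an assumption about the eventual API, not
    shipped behavior.]

        const controller = new CaptureController();
        const stream = await navigator.mediaDevices.getDisplayMedia({
          controller,
        });
        let currentTrack = stream.getVideoTracks()[0];
        const video = document.querySelector("video");
        video.srcObject = stream;

        // Hypothetical event, fired when the user switches the captured
        // surface. Apps that never register it would keep the
        // backwards-compatible injection behavior.
        controller.addEventListener("surfaceswitch", (event) => {
          currentTrack.stop();        // drop the old surface's track
          currentTrack = event.track; // hypothetical attribute
          video.srcObject = new MediaStream([currentTrack]);
        });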

   [35]Racy devicechange event design has poor interoperability in Media
   Capture and Streams

      [35] https://github.com/w3c/mediacapture-main/issues/972

    [36][Slide 28]

      [36] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=28

    [37][Slide 29]

      [37] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=29

    Jan-Ivar: this is modeled on the RTC track event

    Jan-Ivar: any objection to merging this PR?

    Guido: what does "current result from enumerateDevices" mean?

    Jan-Ivar: good point, I should rephrase that - it's the devices
    at the time the event is fired
    … this would be a synchronous equivalent to what
    enumerateDevices would produce

    Guido: I agree with the change, but the language should be
    clarified

    Dom: is there an existing internal slot we could refer to?

    Jan-Ivar: there is one, but with too much info in it, although
    we have an algorithm to filter it

    RESOLUTION: merge 972 with the language on the current device
    list clarified
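
    [Editor's sketch: the difference the PR makes. updateDeviceMenu
    stands in for app code, and the devices attribute is taken from
    the PR under discussion, not shipped API.]

        // Racy today: the device list may have changed again by the
        // time the enumerateDevices() promise resolves.
        navigator.mediaDevices.addEventListener("devicechange", async () => {
          updateDeviceMenu(await navigator.mediaDevices.enumerateDevices());
        });

        // With the PR: a synchronous snapshot of the devices at the
        // time the event fired.
        navigator.mediaDevices.addEventListener("devicechange", (event) => {
          updateDeviceMenu(event.devices);
        });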

   [38]WebRTC API

      [38] https://github.com/w3c/webrtc-pc/

     [39]Convert RTCIceCandidatePair dictionary to an interface

      [39] https://github.com/w3c/webrtc-pc/pull/2961

    Jan-Ivar: FYI - please take a look and chime in if you have an
    opinion

     [40]setCodecPreferences should trigger negotiationneeded

      [40] https://github.com/w3c/webrtc-pc/issues/2964

    [41][Slide 30]

      [41] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=30

    Jan-Ivar: prompted by ongoing implementation of
    setCodecPreferences in Firefox
    … is it a good idea to trigger negotiationneeded as needed? if
    so, what would "as needed" actually encompass?

    Harald: when does setCodecPreferences make a difference? when
    you're in the middle of a negotiation, it will make a
    difference in the answer; it doesn't affect the local state, it
    can only change the remote state, which can only happen after
    negotiation
    … wouldn't it be simpler to just fire negotiationneeded?

    Jan-Ivar: there are edge cases when you're not in a stable
    state and negotiationneeded is fired
    … it sounds like you're agreeing that firing negotiationneeded
    would be good

    harald: I'm trying to figure out when to fire and not to fire
    … it could be we fire it when the list of codecs is different
    from what is in remote description
    … wouldn't fire when setCodecPreferences doesn't change the
    list (including because the negotiation trims down the list of
    codec preferences)
    … that would mean we need to have an internal slot to keep
    track of the last setCodecPreferences call

    jan-ivar: probably indeed, if we want to optimize the cases
    where setCodecPreferences looks like it would make a difference
    but doesn't

    Florent: It's a nice idea to have sCP trigger
    negotiationneeded, but I'm worried about backwards
    compatibility issues
    … it could cause issues if apps get negotiationneeded at
    unexpected times
    … given the complexities of identifying cases where it's needed
    and backwards compatibility issues, I'm not sure we can move
    forward

    Jan-Ivar: negotiationneeded is a queued task that can't happen
    during a negotiation
    … in other words, you would face the same issues if that was
    handled manually by the app developer
    … although I recognize there may be concerns in the transition

    Florent: sCP is already used by a lot of widely deployed
    applications - I agree this might have been a better design,
    but it's not clear changing it now is the right trade-off at
    this point
    … at the moment, negotiationneeded is triggered by a very
    limited number of API calls; adding it to another API call may
    break expectations

    Jan-Ivar: if you're not using the negotiationneeded event, you
    wouldn't be affected by this
    … if you're using sCP in remote-answer, neither

    Florent: this may be problematic if that were to happen later,
    in the middle of a transaction, since apps wouldn't have been
    built to handle this
    … I'm also worried about the complexity of specifying "as
    needed"
    … maybe this could be obtained via a different mechanism, e.g.
    an additional parameter in addTransceiver

    Jan-Ivar: thanks - worth documenting these concerns in the
    github issue
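
    [Editor's sketch: what the proposal would automate, using shipped
    API. Today the app must kick off renegotiation by hand after
    setCodecPreferences; the proposal would instead queue a
    negotiationneeded task "as needed".]

        const pc = new RTCPeerConnection();
        const transceiver = pc.addTransceiver("video");

        // Move VP9 to the front of the receive-codec preferences.
        const { codecs } = RTCRtpReceiver.getCapabilities("video");
        codecs.sort((a, b) =>
          Number(b.mimeType === "video/VP9") -
          Number(a.mimeType === "video/VP9"));
        transceiver.setCodecPreferences(codecs);

        // Under the proposal this handler would now run; today the
        // app has to call setLocalDescription() itself after sCP.
        pc.onnegotiationneeded = async () => {
          await pc.setLocalDescription();
          // ... send pc.localDescription over signaling ...
        };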

     [42]receiver.getParameters().codecs seems under-specified

      [42] https://github.com/w3c/webrtc-pc/issues/2956

    [43][Slide 31]

      [43] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=31

    [44][Slide 32]

      [44] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=32

    [45][Slide 33]

      [45] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=33

    [46][Slide 34]

      [46] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=34

    Harald: the attempt was to make sure that we have a conceptual
    list containing the codecs we can possibly negotiate, and that
    we could add to this list over time
    … and this had to be per transceiver
    … I missed this particular usage of the list
    … we have to decide what we want to represent
    … if we want to make sure we represent only codecs that we are
    able to receive at the moment, unimplemented codecs can't be
    received of course
    … we could do this by making the enabled flag mean "currently
    willing to receive"
    … i.e. it would have to match the most recently accepted local
    description

    Jan-Ivar: ok, so this sounds like there is something worth
    re-instantiating from the previous algorithm

    Jan-Ivar: these slides would likely apply to sendCodecs as
    well, but I haven't had the chance to check in detail
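
    [Editor's sketch: the under-specified call in question, using
    shipped API; the open question is what the returned list should
    contain at each point in the negotiation.]

        const pc = new RTCPeerConnection();
        const { receiver } = pc.addTransceiver("video");

        // Before negotiation: all codecs the browser could receive, or
        // nothing? After negotiation: should entries track the most
        // recently accepted local description, as suggested above?
        const { codecs } = receiver.getParameters();
        console.log(codecs.map((c) => c.mimeType));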

   Background segmentation mask

    [47][Slide 37]

      [47] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=37

    [48][Slide 38]

      [48] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=38

    [49]Video of the background mask demo

      [49] 
https://drive.google.com/file/d/1vw8gLSGzdeqM7w1N7B4uolrxqE-8mU5f/view?resourcekey

    [50][Slide 39]

      [50] 
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=39

    Riju: in background mask, the original frame remains intact and
    the mask gets provided in addition to the original frame
    … both frames are provided in the same stream
    … we expect to put up a PR sometime this week based on this

    Elad: this looks very interesting
    … do I understand correctly that the masks get interleaved in
    the stream?

    Riju: the driver provides the mask data; the code on slide 39
    shows how to operate on it

    Eero: the order is first masked frame, then original frame
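
    [Editor's sketch: consuming the interleaved stream described above
    with MediaStreamTrackProcessor. The backgroundSegmentationMask
    constraint name is an assumption about the upcoming PR; the
    mask-then-original ordering follows Eero's comment.]

        const stream = await navigator.mediaDevices.getUserMedia({
          video: { backgroundSegmentationMask: true }, // hypothetical
        });
        const [track] = stream.getVideoTracks();
        const reader =
          new MediaStreamTrackProcessor({ track }).readable.getReader();

        while (true) {
          const { value: mask, done } = await reader.read();
          if (done) break;
          const { value: original } = await reader.read();
          if (!original) break;
          // ... composite the original frame against the mask, e.g. on
          // a canvas, for green screen / blur / replacement ...
          mask.close();
          original.close();
        }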

    Elad: this could be confusing; could the mask be provided as
    metadata on the actual frame instead of as a different frame?
    … getting all the data at the same time would seem easier

    Riju: the synthetic frame was easier for demo purposes, but we
    could add something like you suggested
    … IIRC, we got comments on the blur flag that having both the
    original and the processed frame was useful

    Harald: this reminds me of discussion of alpha channels and
    masks which were very much about how to express the metadata
    … this particular approach has the question of how you transmit
    it
    … if this is encoded as metadata, the question is how it gets
    encoded
    … have you looked into encoding the mask in the alpha channel?

    eero: in Chrome, the GPU doesn't have access to an alpha
    channel

    Jan-Ivar: +1 that the alpha channel intuitively feels like a
    better place for this
    … to clarify, this isn't a background replacement constraint

    Riju: right, the app can do whatever they want with the mask

    Bernard: currently we're not doing a great job of supporting
    the alpha channel - e.g. webcodecs doesn't support it
    … it's just being added to AV1
    … lots of holes currently
    … I would encourage you to file bugs and spec issues

    Riju: as Elad mentioned, this would be mostly for local
    consumption

    Frederik: is the API shape sustainable, e.g. when adding
    gesture detection or face detection?
    … can we add them all to metadata?

    Riju: we've been looking at these other features

    Bernard: there were discussions in the Media WG to add more
    metadata to VideoFrames and how encoders should react to it
    … they're not preserved in the encoded chunks, they get dropped

    Jan-Ivar: part of my comments on face detection was about to
    what extent this needed to be tied to the camera driver, and
    whether this should instead be exposed as part of generic media
    processing work

    Riju: background segmentation is a priority because you get 2x
    or 3x performance improvements

    Jan-Ivar: but is there something about masking that makes it
    worth dealing with it as a camera feature?

    Riju: this is supported on any camera on Windows or Mac
    … it takes advantage of the optimized local models available to
    native apps

    Harald: what controls what gets masked?

    Riju: only background/foreground

    Riju: if there is rough support, we can start with a PR and
    iterate on it

    Jan-Ivar: my concern is how it relates to generic media
    processing pipelines
    … background blur was a way to mitigate what was being provided
    by the platform and needed to allow for opt-in/opt-out from
    apps
    … opening up an open-ended area of features would be a concern
    for us
    … this sounds like something that ought to be part of a generic
    media processing library

    Riju: this provides a primitive that is generally useful across
    videoconferencing apps - green screen, blur, replacement

    Bernard: there was another discussion in the Media WG about
    media processing

    dom: the tension is between doing a hardware-acceleration
    specific approach vs generic media processing

    Riju: the motivation here is the performance boost

    Jan-Ivar: no clear interest from us at this point, but this may
    change based on market interest

Summary of resolutions

     1. [51]Consensus on #186, discussion to continue on #202
     2. [52]more discussion is needed on the lifecycle of surface
        tracks
     3. [53]merge 972 with the language on the current device list
        clarified

Received on Wednesday, 24 April 2024 07:09:58 UTC