- From: Dominique Hazael-Massieux <dom@w3.org>
- Date: Wed, 24 Apr 2024 09:09:56 +0200
- To: public-webrtc@w3.org
Hi,

The minutes of our April 2024 meeting held yesterday are available at:
https://www.w3.org/2024/04/23-webrtc-minutes.html
and copied as text below.

Dom

WebRTC April 23 2024 meeting

23 April 2024

[2]Agenda. [3]IRC log.

[2] https://www.w3.org/2011/04/webrtc/wiki/April_23_2024
[3] https://www.w3.org/2024/04/23-webrtc-irc

Attendees

Present
Bernard, Carine, Dom, Eero, Elad, Florent, FrederikSolenberg, Guido, Harald, Jan-Ivar, Riju, Sameer, SunShin, TimP, TonyHerre, Tove

Regrets
-

Chair
Bernard, HTA, Jan-Ivar

Scribe
dom

Contents

1. [4]Custom Codecs
2. [5]Captured Surface Switching
3. [6]Racy devicechange event design has poor interoperability in Media Capture and Streams
4. [7]WebRTC API
   1. [8]Convert RTCIceCandidatePair dictionary to an interface
   2. [9]setCodecPreferences should trigger negotiationneeded
   3. [10]receiver.getParameters().codecs seems under-specified
5. [11]Background segmentation mask
6. [12]Summary of resolutions

Meeting minutes

Slideset: [13]https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf

[13] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf

[14]Custom Codecs

[14] https://github.com/w3c/webrtc-encoded-transform/pull/186

[15][Slide 10]
[15] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=10

[16][Slide 11]
[16] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=11

[17][Slide 12]
[17] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=12

[18][Slide 13]
[18] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=13

Harald: this requires the ability to set the mime type of a frame, which can be done in two ways: with a frame constructor (merged in [19]#233) or via setMetadata ([20]#202), which has stalled
… setMetadata feels like a better fit from my perspective
… but at least the constructor allows for this, so we may not need two different ways

[19] https://github.com/w3c/webrtc-encoded-transform/issues/233
[20] https://github.com/w3c/webrtc-encoded-transform/issues/202

Jan-Ivar: I'm supportive of the API shape; on the question of constructor vs setMetadata, it's a bit complicated
… because these encoded frames are mutable, unlike WebCodecs frames
… that's a bit unfortunate, but it makes sense in the context of encryption
… in WebCodecs, frames are immutable, which would require a copy-constructor step

Harald: with immutable data, we would have to have a copy constructor with a separate argument for the data itself

Jan-Ivar: in other words, I don't have a clear answer to your question

Bernard: also supportive of this; setMetadata should be fine here, we don't have the same constraints we had in WebCodecs
… for WebCodecs, we didn't want data to change while an operation is in progress
… here setMetadata should be safe
… it would be nice to allow for this without making a copy
… for some codecs like H264, it's not just the mime type, it's also a profile, packetization mode, etc.
… can you set this here as well?

Harald: yes, it includes all the parameters

[TimP: supportive of this]

Harald: based on the feedback, it sounds like moving forward with [21]#202 would be worth looking into again

[21] https://github.com/w3c/webrtc-encoded-transform/issues/202

Guido: setMetadata feels like a better fit for this use case (although I was supportive of the copy constructor for a separate one)

Jan-Ivar: let's follow up on github

[TimP: any issue with having several transforms in sequence?]

Harald: if they're connected by pipelines, this creates good hand-off points from one to the next

Jan-Ivar: given this, I think the copy constructor would be a better fit
… setMetadata can end up with @@@ issues
… it's not clear that we should extend the problem we have with data to metadata

Bernard: in WebCodecs, immutable data was a way to avoid race conditions with the work being done in a separate thread

Jan-Ivar: this is handled via the transfer step here

Bernard: setMetadata could only be called from the transform, right? not after the frame has been enqueued?

Jan-Ivar: setMetadata can only be called if the object is still there…
… it feels to me like having setMetadata is redundant with the copy constructor

Harald: right now, the copy constructor is expensive

Jan-Ivar: let's continue the discussion on [22]#202

[22] https://github.com/w3c/webrtc-encoded-transform/issues/202

RESOLUTION: Consensus on [23]#186, discussion to continue on [24]#202

[23] https://github.com/w3c/webrtc-encoded-transform/issues/186
[24] https://github.com/w3c/webrtc-encoded-transform/issues/202
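For context, a minimal sketch of the direction discussed above. The onrtctransform/RTCRtpScriptTransform plumbing and frame.getMetadata() are the existing encoded-transform API; the copy constructor taking an existing frame plus replacement metadata, and the mimeType field in that metadata, follow the direction of #186/#233 and should be read as assumptions rather than final spec text.

```js
// worker.js — sketch only, not final API.
onrtctransform = (event) => {
  const { readable, writable } = event.transformer;
  readable
    .pipeThrough(new TransformStream({
      transform(frame, controller) {
        // Relabel the frame as a custom codec before packetization.
        const metadata = frame.getMetadata();
        metadata.mimeType = "video/x-custom"; // assumed field name (#186)
        // Copy-constructor shape assumed from #233: original frame + new metadata.
        controller.enqueue(new RTCEncodedVideoFrame(frame, { metadata }));
      },
    }))
    .pipeTo(writable);
};
```

On the main thread, the worker would be attached with sender.transform = new RTCRtpScriptTransform(worker), as in the existing API.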
[25]Captured Surface Switching

[25] https://github.com/w3c/mediacapture-screen-share-extensions/issues/4

[26][Slide 17]
[26] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=17

[27][Slide 18]
[27] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=18

[28][Slide 19]
[28] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=19

[29][Slide 20]
[29] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=20

[30][Slide 21]
[30] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=21

[31][Slide 22]
[31] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=22

[32][Slide 23]
[32] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=23

[33][Slide 24]
[33] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=24

[34][Slide 25]
[34] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=25

Tove: is this a promising way forward?

[TimP: Is simply supplying an event handler enough to discriminate? Do we actually need the surface/session property?]

Tove: we discussed in the December meeting whether an event handler (back then, a callback) would be enough to discriminate
… and there is a design principle against changing behavior based on whether an event handler is registered

Jan-Ivar: indeed; there are cases where that would be OK
… we haven't talked about stopping tracks here
… it might be OK for the user agent to optimize away user-visible behavior when it comes to how quickly the indicator state/permission UX changes

Jan-Ivar: for backwards compatibility, I think we're in agreement the UA could optimize the case where no event handler has been added

Tove: the original proposal was that you would always get the two kinds of tracks, which you would still need to manage even if you don't need them
… hence this new proposal that lets apps pick which tracks they want

Jan-Ivar: if I opt in to the surface track, what would getDisplayMedia return?

Tove: I'm proposing getDisplayMedia returns the session track, and the event exposes the surface track
… but I'm open to other approaches

Elad: what if we had a getter for the session track, but only returned the surface track from getDisplayMedia?
… that way you don't have to wait for an event, you could access either at any point
… stopping unused surface tracks could be handled by the CaptureController

Jan-Ivar: I like the behavior and concepts of surface/session tracks
… but asking developers to pick one upfront feels artificial
… I could move from one tab to another tab with audio, but then stay in tab+audio mode moving forward
… hence my proposal to expose both and let the app close the ones they don't want
… I was initially worried this would lead to confusing indicators
… but Youenn convinced me this could be optimized away

Harald: if I want to write an app that handles switching of surfaces and have code that covers both cases, I would struggle to maintain two code paths to manage what gets presented to the end user

Tove: the problem I see with Jan-Ivar's proposal is that we lose the guarantee that one track represents one surface, which I think is an attractive invariant

Jan-Ivar: I don't think Web developers need to care about that; there is an isolation principle that when switching from one surface to another, you're also switching sources
… I like slide 19 - the only thing missing is stopping tracks
… if a developer doesn't care about surface tracks at all, they don't register an event handler
… you would want to stop old tracks in the event handler
… this would also let the developer choose, live, which tracks they can support

Elad: what happens if the app doesn't stop either track?

Jan-Ivar: the backwards-compatible design is injection; would we be talking about ending that model?

RESOLUTION: more discussion is needed on the lifecycle of surface tracks
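To illustrate the opt-in direction discussed above (register a handler, stop old tracks in the handler): a sketch only. CaptureController and getDisplayMedia({ controller }) are shipped API; the event name "surfaceswitch" and its .track attribute are hypothetical placeholders for the surface/session-track proposal, not agreed API surface.

```js
// Sketch only: "surfaceswitch" and event.track are hypothetical placeholders.
async function startShare(videoEl) {
  const controller = new CaptureController(); // shipped API
  const stream = await navigator.mediaDevices.getDisplayMedia({
    video: true,
    controller,
  });
  let surfaceTrack = stream.getVideoTracks()[0];
  videoEl.srcObject = stream;

  // Registering a handler is the opt-in: when the user switches the captured
  // surface, stop the old per-surface track and adopt the new one.
  controller.addEventListener("surfaceswitch" /* hypothetical */, (event) => {
    surfaceTrack.stop();               // stop the old surface track
    surfaceTrack = event.track;        // hypothetical attribute
    videoEl.srcObject = new MediaStream([surfaceTrack]);
  });
  return controller;
}
```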
[35]Racy devicechange event design has poor interoperability in Media Capture and Streams

[35] https://github.com/w3c/mediacapture-main/issues/972

[36][Slide 28]
[36] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=28

[37][Slide 29]
[37] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=29

Jan-Ivar: this is modeled on the RTC track event

Jan-Ivar: any objection to merging this PR?

Guido: what does "current result from enumerateDevices" mean?

Jan-Ivar: good point, I should rephrase that - it's the devices at the time the event is fired
… this would be a synchronous equivalent to what enumerateDevices would produce

Guido: I agree with the change, but the language should be clarified

Dom: is there an existing internal slot we could refer to?

Jan-Ivar: there is one, but with too much info in it, although we have an algorithm to filter it

RESOLUTION: merged 972 with language clarified on current device list
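For context, a sketch contrasting today's racy pattern with the direction discussed above. The first handler is the existing API; in the second, exposing the current device list on the event itself is the proposal, and "event.devices" is an assumed attribute name, not necessarily what the merged change specifies.

```js
// Hypothetical app helper: refresh a device picker (details elided).
function updateDevicePicker(devices) {
  console.log(devices.map((d) => `${d.kind}: ${d.label}`));
}

// Today: the devicechange event carries no payload, so apps re-enumerate,
// which is the racy pattern the issue describes.
navigator.mediaDevices.addEventListener("devicechange", async () => {
  const devices = await navigator.mediaDevices.enumerateDevices();
  updateDevicePicker(devices);
});

// Direction discussed above (sketch): the event exposes the device list
// current at the time it fired.
navigator.mediaDevices.addEventListener("devicechange", (event) => {
  updateDevicePicker(event.devices ?? []); // assumed property name
});
```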
[38]WebRTC API

[38] https://github.com/w3c/webrtc-pc/

[39]Convert RTCIceCandidatePair dictionary to an interface

[39] https://github.com/w3c/webrtc-pc/pull/2961

Jan-Ivar: FYI - please take a look and chime in if you have an opinion

[40]setCodecPreferences should trigger negotiationneeded

[40] https://github.com/w3c/webrtc-pc/issues/2964

[41][Slide 30]
[41] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=30

Jan-Ivar: prompted by the ongoing implementation of setCodecPreferences in Firefox
… is it a good idea to trigger negotiationneeded as needed? if so, what would "as needed" actually encompass?

Harald: when does setCodecPreferences make a difference? when you're in the middle of a negotiation, it will make a difference in the answer; it doesn't affect the local state, it can only change the remote state, which can only happen after negotiation
… wouldn't it be simpler to just fire negotiationneeded?

Jan-Ivar: there are edge cases when you're not in a stable state and negotiationneeded is fired
… it sounds like you're agreeing that firing negotiationneeded would be good

Harald: I'm trying to figure out when to fire and when not to fire
… it could be that we fire it when the list of codecs is different from what is in the remote description
… and we wouldn't fire when setCodecPreferences doesn't change the list (including because the negotiation trims down the list of codec preferences)
… that would mean we need an internal slot to keep track of the last codec preferences call

Jan-Ivar: probably indeed, if we want to optimize the cases where setCodecPreferences looks like it would make a difference but doesn't

Florent: it's a nice idea to have sCP trigger negotiationneeded, but I'm worried about backwards compatibility issues
… it could cause issues if apps get negotiationneeded at unexpected times
… given the complexity of identifying the cases where it's needed and the backwards compatibility issues, I'm not sure we can move forward

Jan-Ivar: negotiationneeded is a queued task that can't happen during a negotiation
… in other words, you would face the same issues if that was handled manually by the app developer
… although I recognize there may be concerns in the transition

Florent: sCP is already used by a lot of widely deployed applications - I agree this might have been a better design, but it's not clear changing it now is the right trade-off at this point
… at the moment, negotiationneeded is triggered by a very limited number of API calls; adding it to another API call may break expectations

Jan-Ivar: if you're not using the negotiationneeded event, you wouldn't be affected by this
… if you're using sCP in remote-answer, you wouldn't be either

Florent: this may be problematic if that were to happen later, in the middle of a transaction, since apps wouldn't have been built to handle this
… I'm also worried about the complexity of specifying "as needed"
… maybe this could be obtained via a different mechanism, e.g. an additional parameter to addTransceiver

Jan-Ivar: thanks - worth documenting these concerns in the github issue
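For readers less familiar with the APIs being discussed, a minimal sketch of the status quo: these are existing, shipped APIs, and under the current spec the app has to renegotiate itself after setCodecPreferences, since that call does not queue negotiationneeded today (the issue above proposes changing that).

```js
const pc = new RTCPeerConnection();
const transceiver = pc.addTransceiver("video");

// Restrict the codecs we are willing to receive to H264, using the
// existing setCodecPreferences API.
const h264Only = (RTCRtpReceiver.getCapabilities("video")?.codecs ?? [])
  .filter((c) => c.mimeType === "video/H264");
transceiver.setCodecPreferences(h264Only);

// Today, negotiationneeded fires for calls like addTransceiver/addTrack,
// but not for setCodecPreferences; the proposal would fire it "as needed".
pc.onnegotiationneeded = async () => {
  await pc.setLocalDescription();
  // send pc.localDescription over the app's signaling channel (not shown)
};
```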
[42]receiver.getParameters().codecs seems under-specified

[42] https://github.com/w3c/webrtc-pc/issues/2956

[43][Slide 31]
[43] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=31

[44][Slide 32]
[44] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=32

[45][Slide 33]
[45] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=33

[46][Slide 34]
[46] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=34

Harald: the attempt was to make sure we have a conceptual list of the codecs we could possibly negotiate, and that we could add to that list over time
… and this had to be per transceiver
… I missed this particular usage of the list
… we have to decide what we want to represent
… if we want to make sure we represent only codecs that we are able to receive at the moment, unimplemented codecs can't be received of course
… we could do this by making the enabled flag mean "currently willing to receive"
… i.e. it would have to match the most recently accepted local description

Jan-Ivar: ok, so this sounds like there is something worth re-instantiating from the previous algorithm

Jan-Ivar: these slides would likely apply to sendCodecs as well, but I haven't had the chance to check in detail

Background segmentation mask

[47][Slide 37]
[47] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=37

[48][Slide 38]
[48] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=38

[49]Video of the background mask demo

[49] https://drive.google.com/file/d/1vw8gLSGzdeqM7w1N7B4uolrxqE-8mU5f/view?resourcekey

[50][Slide 39]
[50] https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=39

Riju: with background masking, the original frame remains intact and the mask is provided in addition to the original frame
… both frames are provided in the same stream
… we expect to put up a PR sometime this week based on this

Elad: this looks very interesting
… do I understand correctly that the masks get interleaved in the stream?

Riju: the driver provides the mask data; the code on slide 39 shows how to operate on it

Eero: the order is first the mask frame, then the original frame

Elad: this could be confusing; could the actual frame be provided with metadata instead of providing the mask as a different frame?
… getting all the data at the same time would seem easier

Riju: the synthetic frame was easier for demo purposes, but we could add something like you suggest
… IIRC we got comments on the blur flag that having both the original and the processed frame was useful

Harald: this reminds me of the discussion of alpha channels and masks, which was very much about how to express the metadata
… this particular approach raises the question of how you transmit it
… if this is encoded as metadata, the question is how it gets encoded
… have you looked into encoding the mask in the alpha channel?

Eero: in Chrome, the GPU doesn't have access to an alpha channel

Jan-Ivar: +1 that the alpha channel intuitively feels like a better place for this
… to clarify, this isn't a background replacement constraint

Riju: right, the app can do whatever it wants with the mask

Bernard: currently we're not doing a great job of supporting the alpha channel - e.g. WebCodecs doesn't support it
… it's just being added to AV1
… lots of holes currently
… I would encourage you to file bugs and spec issues

Riju: as Elad mentioned, this would be mostly for local consumption

Frederik: is the API shape sustainable, e.g. when adding gesture detection or face detection?
… can we add them all to metadata?

Riju: we've been looking at these other features

Bernard: there were discussions in the Media WG about adding more metadata to VideoFrames and how encoders should react to it
… that metadata isn't preserved in the encoded chunks, it gets dropped

Jan-Ivar: part of my comments on face detection was about to what extent this needed to be tied to the camera driver, and whether it should instead be exposed as part of generic media processing work

Riju: background segmentation is a priority because you get 2x or 3x performance improvements

Jan-Ivar: but is there something about masking that makes it worth dealing with as a camera feature?

Riju: this is supported on any camera on Windows or Mac
… it takes advantage of the locally optimized models available natively

Harald: what controls what gets masked?

Riju: only background/foreground

Riju: if there is rough support, we can start with a PR and iterate on it

Jan-Ivar: my concern is how it relates to generic media processing pipelines
… background blur was a way to mitigate what was being provided by the platform, and it needed to allow for opt-in/opt-out from apps
… opening up an open-ended area of features would be a concern for us
… this sounds like something that ought to be part of a generic media processing library

Riju: this provides a primitive that is generally useful across videoconferencing apps - green screen, blur, replacement

Bernard: there was another discussion in the Media WG about media processing

Dom: the tension is between a hardware-acceleration-specific approach vs generic media processing

Riju: the motivation here is the performance boost

Jan-Ivar: no clear interest from us at this point, but this may change based on market interest
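For context, a sketch of how an app might consume the interleaved mask/original frames described above. MediaStreamTrackProcessor is the mediacapture-transform API (available in Chromium); the "backgroundSegmentationMask" boolean constraint and the mask-frame-then-original-frame ordering are taken from the demo described above and should be treated as assumptions until the PR lands. composite() is an app-defined placeholder.

```js
// Sketch only: constraint name and frame ordering are assumptions.
async function consumeMaskedCapture(composite /* app-defined (frame, mask) */) {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: { backgroundSegmentationMask: true }, // assumed constraint name
  });
  const [track] = stream.getVideoTracks();

  const reader = new MediaStreamTrackProcessor({ track }).readable.getReader();

  let maskFrame = null;
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done) break;
    if (maskFrame === null) {
      maskFrame = frame;             // per the demo, the mask frame comes first
    } else {
      composite(frame, maskFrame);   // original frame + its mask: blur,
      frame.close();                 // green screen, replacement, etc.
      maskFrame.close();
      maskFrame = null;
    }
  }
}
```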
Summary of resolutions

1. [51]Consensus on #186, discussion to continue on #202
2. [52]more discussion is needed on the lifecycle of surface tracks
3. [53]merged 972 with language clarified on current device list
Received on Wednesday, 24 April 2024 07:09:58 UTC