- From: Dominique Hazael-Massieux <dom@w3.org>
- Date: Wed, 24 Apr 2024 09:09:56 +0200
- To: public-webrtc@w3.org
Hi,
The minutes of our April 2024 meeting held yesterday are available at:
https://www.w3.org/2024/04/23-webrtc-minutes.html
and copied as text below.
Dom
WebRTC April 23 2024 meeting
23 April 2024
[2]Agenda. [3]IRC log.
[2] https://www.w3.org/2011/04/webrtc/wiki/April_23_2024
[3] https://www.w3.org/2024/04/23-webrtc-irc
Attendees
Present
Bernard, Carine, Dom, Eero, Elad, Florent,
FrederikSolenberg, Guido, Harald, Jan-Ivar, Riju,
Sameer, SunShin, TimP, TonyHerre, Tove
Regrets
-
Chair
Bernard, HTA, Jan-Ivar
Scribe
dom
Contents
1. [4]Custom Codecs
2. [5]Captured Surface Switching
3. [6]Racy devicechange event design has poor interoperability
in Media Capture and Streams
4. [7]WebRTC API
1. [8]Convert RTCIceCandidatePair dictionary to an
interface
2. [9]setCodecPreferences should trigger
negotiationneeded
3. [10]receiver.getParameters().codecs seems
under-specified
5. [11]Background segmentation mask
6. [12]Summary of resolutions
Meeting minutes
Slideset: [13]https://lists.w3.org/Archives/Public/www-archive/
2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf
[13]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf
[14]Custom Codecs
[14] https://github.com/w3c/webrtc-encoded-transform/pull/186
[15][Slide 10]
[15]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=10
[16][Slide 11]
[16]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=11
[17][Slide 12]
[17]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=12
[18][Slide 13]
[18]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=13
Harald: this requires the ability to set the mime type of a
frame, which can be done in two ways: with a frame constructor
(merged in [19]#233), or via setMetadata ([20]#202), which has
stalled
… setMetadata feels like a better fit from my perspective
… but at least the constructor allows for this, and so we may
not need two different ways
[19] https://github.com/w3c/webrtc-encoded-transform/issues/233
[20] https://github.com/w3c/webrtc-encoded-transform/issues/202
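For illustration, a minimal sketch of the two shapes under
discussion; the copy constructor comes from #233, while
setMetadata() (#202) and a mimeType metadata field (#186) are
still proposals, so treat names and fields as assumptions:
    // Worker-side encoded transform (TransformStream style); a sketch only.
    const relabeler = new TransformStream({
      transform(frame, controller) {
        const metadata = frame.getMetadata();
        // Option A (constructor, #233): copy-construct with new metadata.
        const copy = new RTCEncodedVideoFrame(frame, {
          metadata: { ...metadata, mimeType: "video/x-custom" },
        });
        controller.enqueue(copy);
        // Option B (proposed setMetadata, #202): mutate in place, no copy.
        // frame.setMetadata({ ...metadata, mimeType: "video/x-custom" });
        // controller.enqueue(frame);
      },
    });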
jan-ivar: I'm supportive of the API shape; on the question of
constructor vs setMetadata - it's a bit complicated
… because these encoded frames are mutable, unlike webcodecs
… that's a bit unfortunate but it makes sense in the context of
encryption
… in webcodecs, frames are immutable, which would require a
copy-constructor step
Harald: with immutable data, we would have to have a copy
constructor with a separate argument for the data itself
Jan-Ivar: in other words, I don't have a clear answer to your question
bernard: also supportive of this; setMetadata should be fine
here, we don't have the same constraints we had in WebCodecs
… for WebCodecs, we didn't want data to change while an
operation is in progress
… here setMetadata should be safe
… it would be nice to allow for this without making a copy
… For some codecs like H264, it's not just the mime type, it's
also a profile, packetization mode, etc
… can you set this here as well?
harald: yes, it includes all the parameters
[TimP: supportive of this]
Harald: based on the feedback, it sounds like moving forward
with [21]#202 would be worth looking into again
[21] https://github.com/w3c/webrtc-encoded-transform/issues/202
Guido: setMetadata feels like a better fit for this use case
(although I was supportive of the copy constructor for a
separate one)
Jan-Ivar: let's follow up on github
[TimP: any issue with having several transforms in sequence?]
Harald: if they're connected by pipelines, this creates good
hand-off points from one to the next
Jan-Ivar: given this, I think the copy constructor would be
better fit
… setMetadata can end up with @@@ issues
… not clear that we should extend the problem we have with data
to metadata
Bernard: in WebCodecs, immutable data was a way to avoid race
conditions with the work being done in a separate thread
Jan-Ivar: this is handled via the transfer step here
Bernard: setMetadata could only be called from the transform
right? not after it has been enqueued?
Jan-Ivar: setMetadata can only be called if the object is still
there…
… It feels to me like having setMetadata is redundant with the
copy constructor
Harald: right now, the copy constructor is expensive
Jan-Ivar: let's continue the discussion on [22]#202
[22] https://github.com/w3c/webrtc-encoded-transform/issues/202
RESOLUTION: Consensus on [23]#186, discussion to continue on
[24]#202
[23] https://github.com/w3c/webrtc-encoded-transform/issues/186
[24] https://github.com/w3c/webrtc-encoded-transform/issues/202
[25]Captured Surface Switching
[25]
https://github.com/w3c/mediacapture-screen-share-extensions/issues/4
[26][Slide 17]
[26]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=17
[27][Slide 18]
[27]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=18
[28][Slide 19]
[28]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=19
[29][Slide 20]
[29]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=20
[30][Slide 21]
[30]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=21
[31][Slide 22]
[31]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=22
[32][Slide 23]
[32]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=23
[33][Slide 24]
[33]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=24
[34][Slide 25]
[34]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=25
Tove: is this a promising way forward?
[TimP: Is simply supplying an event handler enough to
discriminate? Do we actually need the surface/session
property?]
Tove: we discussed in the December meeting whether an event
handler (back then, a callback) would be enough to discriminate
… and there is a design principle against changing behavior
based on whether an event handler is registered
Jan-Ivar: indeed; there are cases where that would be OK
… we haven't talked about stopping tracks here
… it might be OK for the user agent to optimize away
user-visible behavior when it comes to how quickly the
indicator state/permission UX changes
Jan-Ivar: for backwards compatibility, I think we're in
agreement the UA could optimize the case when no event handler
has been added
Tove: the original proposal was that you would always get the
two kinds of tracks, which would still need to be managed even
if you don't need them
… hence this new proposal that lets apps pick which tracks they
want
Jan-Ivar: If I opt-in to the surface track, what would
getDisplayMedia return?
Tove: I'm proposing getDisplayMedia returns the session track,
and the event exposes the surface track
… but I'm open to other approaches
Elad: what if we had a getter for the session track, but only
returned the surface track from getDisplayMedia
… that way you don't have to wait for an event; you could
access either at any point
… stopping for unused surface tracks could be handled by the
capturecontroller
Jan-Ivar: I like the behavior and concepts of surface/session
tracks
… but asking developers to pick one upfront feels artificial
… I could move from one tab to another tab with audio, but then
stay in tab+audio mode moving forward
… hence why I was proposing to expose both and let the app
close the ones they don't want
… I was initially worried this would lead to confusing
indicators
… but Youenn convinced me this could be optimized away
Harald: if I want to write an app that handles switching of
surfaces and have code that covers both cases, I would struggle
to maintain two code paths to manage what gets presented to the
end user
Tove: the problem I see with Jan-Ivar's proposal is that we
lose the guarantee that one track represents one surface which
I think is an attractive invariant
Jan-Ivar: I don't think Web developers need to care about that;
there is an isolation principle that when switching from one
surface to another, you're also switching sources
… I like slide 19 - the only thing missing is stopping tracks
… if a developer doesn't care about the surface track at all,
they don't register an event handler
… you would want to stop old tracks in the event handler
… this would also let the developer choose live which tracks
they can support
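A rough sketch of the model being described, inside an async
function; the event name, its payload, and the surface/session
split are all still under discussion, so every name below is a
placeholder, not an agreed API:
    // Placeholder names throughout; CaptureController and the
    // controller option to getDisplayMedia do exist today.
    const controller = new CaptureController();
    const stream = await navigator.mediaDevices.getDisplayMedia({
      video: true,
      audio: true,
      controller,
    });
    let currentTracks = stream.getTracks();
    controller.addEventListener("surfaceswitch", (event) => {
      // Stop the tracks for the previous surface if they are no
      // longer wanted...
      currentTracks.forEach((t) => t.stop());
      // ...and adopt the tracks for the newly captured surface.
      currentTracks = event.tracks; // placeholder property
    });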
Elad: what happens if the app doesn't stop either track?
Jan-Ivar: the backwards compatible design is injection; would
we be talking about ending that model?
RESOLUTION: more discussion is needed on the lifecycle of
surface tracks
[35]Racy devicechange event design has poor interoperability in Media
Capture and Streams
[35] https://github.com/w3c/mediacapture-main/issues/972
[36][Slide 28]
[36]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=28
[37][Slide 29]
[37]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=29
Jan-Ivar: this is modeled on the RTC track event
Jan-Ivar: any objection to merging this PR?
Guido: what does "current result from enumerateDevices" mean?
Jan-Ivar: good point, I should rephrase that - it's the devices
at the time the event is fired
… this would be a synchronous equivalent to what
enumerateDevices would produce
Guido: I agree with the change, but the language should be
clarified
Dom: is there an existing internal slot we could refer to?
Jan-Ivar: there is one, but with too much info in it, although
we have an algorithm to filter it
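A sketch of what the change would enable, assuming the
device-list attribute proposed in the PR (today the handler has
to call enumerateDevices(), which can race with further device
changes):
    navigator.mediaDevices.addEventListener("devicechange", (event) => {
      // Proposed: the devices as they were when the event fired,
      // rather than whatever enumerateDevices() returns later.
      const devices = event.devices;
      renderDevicePicker(devices); // hypothetical app function
    });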
RESOLUTION: merge 972 with the language clarified on the
current device list
[38]WebRTC API
[38] https://github.com/w3c/webrtc-pc/
[39]Convert RTCIceCandidatePair dictionary to an interface
[39] https://github.com/w3c/webrtc-pc/pull/2961
Jan-Ivar: FYI - please take a look and chime in if you have an
opinion
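For context, a sketch of where RTCIceCandidatePair surfaces
today; #2961 proposes converting the returned value from a
dictionary to an interface, which does not change this call
pattern:
    const dtls = pc.getSenders()[0]?.transport;
    const pair = dtls?.iceTransport.getSelectedCandidatePair();
    if (pair) {
      // local and remote are RTCIceCandidate objects.
      console.log(pair.local.candidate, pair.remote.candidate);
    }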
[40]setCodecPreferences should trigger negotiationneeded
[40] https://github.com/w3c/webrtc-pc/issues/2964
[41][Slide 30]
[41]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=30
Jan-Ivar: prompted by ongoing implementation of
setCodecPreferences in Firefox
… is it a good idea to trigger negotiationneeded as needed? if
so, what would "as needed" actually encompass?
Harald: when does setCodecPreferences make a difference? when
you're in the middle of a negotiation, it will make a
difference in the answer; it doesn't affect the local state, it
can only change the remote state, which can only happen after
negotiation
… wouldn't it be simpler to just fire negotiationneeded?
Jan-Ivar: there are edge cases when you're not in a stable
state and negotiationneeded is fired
… it sounds like you're agreeing that firing negotiationneeded
would be good
harald: I'm trying to figure out when to fire and not to fire
… it could be we fire it when the list of codecs is different
from what is in the remote description
… wouldn't fire when setCodecPreferences doesn't change the
list (including because the negotiation trims down the list of
codec preferences)
… that would mean we need to have an internal slot to keep
track of the last codec preferences call
jan-ivar: probably indeed, if we want to optimize the cases
where setCodecPreferences looks like it would make a difference
but doesn't
Florent: It's a nice idea to trigger negotiationneeded by sCP,
but I'm worried about backwards compatibility issues
… it could cause issues if apps get negotiationneeded at
unexpected times
… given the complexities of identifying cases where it's needed
and backwards compatibility issues, I'm not sure we can move
forward
Jan-Ivar: negotiationneeded is a queued task that can't happen
during a negotiation
… in other words, you would face the same issues if that was
handled manually by the app developer
… although I recognize there may be concerns in the transition
Florent: sCP is already used by a lot of widely deployed
applications - I agree this might have been a better design,
but it's not clear changing it now is the right trade-off at
this point
… at the moment, negotiationneeded is triggered by a very
limited number of API calls; adding it to another API call may
break expectations
Jan-Ivar: if you're not using the negotiationneeded event, you
wouldn't be affected by this
… if you're using sCP in remote-answer, neither
Florent: this may be problematic if that were to happen later
in the middle of a transaction, since apps wouldn't have been
built to handle it
… I'm also worried about the complexity of specifying "as
needed"
… maybe this could be obtained via a different mechanism, e.g.
an additional parameter in addTransceiver
Jan-Ivar: thanks - worth documenting these concerns in the
github issue
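For context, a sketch of the pattern being discussed (inside an
async function): today setCodecPreferences() does not fire
negotiationneeded, so an app that changes preferences after the
initial negotiation has to renegotiate explicitly; the proposal
would let the existing negotiationneeded handler cover this.
    const transceiver = pc.addTransceiver("video");
    const { codecs } = RTCRtpReceiver.getCapabilities("video");
    transceiver.setCodecPreferences(
      codecs.filter((c) => c.mimeType === "video/VP9")
    );
    // Today: renegotiate manually, since no negotiationneeded
    // event is queued by the call above.
    await pc.setLocalDescription(await pc.createOffer());
    // ...send pc.localDescription to the remote peer over signaling.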
[42]receiver.getParameters().codecs seems under-specified
[42] https://github.com/w3c/webrtc-pc/issues/2956
[43][Slide 31]
[43]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=31
[44][Slide 32]
[44]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=32
[45][Slide 33]
[45]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=33
[46][Slide 34]
[46]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=34
Harald: the attempt was to make sure that we have the
conceptual list that contains what we can possibly negotiate,
and that we could add to this list over time
… and this had to be per transceiver
… I missed this particular usage of the list
… we have to decide what we want to represent
… if we want to make sure we represent only codecs that we are
able to receive at the moment, unimplemented codecs can't be
received of course
… we could do this by making the enabled flag mean "currently
willing to receive"
… ie it would have to match the most recently accepted local
description
Jan-Ivar: ok, so this sounds like there is something worth
re-instantiating from the previous algorithm
Jan-Ivar: these slides would likely apply to sendCodecs as
well, but I haven't had the chance to check in detail
Background segmentation mask
[47][Slide 37]
[47]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=37
[48][Slide 38]
[48]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=38
[49]Video of the background mask demo
[49]
https://drive.google.com/file/d/1vw8gLSGzdeqM7w1N7B4uolrxqE-8mU5f/view?resourcekey
[50][Slide 39]
[50]
https://lists.w3.org/Archives/Public/www-archive/2024Apr/att-0000/WEBRTCWG-2024-04-23.pdf#page=39
Riju: in background mask, the original frame remains intact and
the mask gets provided in addition to the original frame
… both frames are provided in the same stream
… we expect to put up a PR sometime this week based on this
Elad: this looks very interesting
… do I understand correctly that the masks get interleaved in
the stream?
Riju: the driver provides the mask data; the code on slide 39
shows how to operate on it
Eero: the order is first masked frame, then original frame
Elad: this could be confusing; could the mask be provided as
metadata on the actual frame instead of as a separate frame?
… getting all the data at the same time would seem easier
Riju: the synthetic frame was easier for demo purposes, but we
could add something like what you suggested
… IIRC, we got comments on the blur flag that having both the
original and the processed frame was useful
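A hypothetical sketch of consuming the interleaved stream as
described above (mask frame first, then the original frame),
inside an async function; the constraint name and the
interleaving follow the proposal and demo, not a shipped API:
    const stream = await navigator.mediaDevices.getUserMedia({
      video: { backgroundSegmentationMask: true }, // placeholder constraint
    });
    const processor = new MediaStreamTrackProcessor({
      track: stream.getVideoTracks()[0],
    });
    const reader = processor.readable.getReader();
    for (;;) {
      const { value: maskFrame, done } = await reader.read(); // mask first
      if (done) break;
      const { value: originalFrame } = await reader.read();   // then the original
      // ...composite: blur, replace, or green-screen using the mask...
      maskFrame.close();
      originalFrame.close();
    }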
Harald: this reminds me of discussion of alpha channels and
masks which were very much about how to express the metadata
… this particular approach has the question of how you transmit
it
… if this is encoded as metadata, the question is how it gets
encoded
… have you looked into encoding the mask in the alpha channel?
eero: in Chrome, the GPU doesn't have access to an alpha
channel
Jan-Ivar: +1 that the alpha channel feels intuitively a better
place for this
… to clarify, this isn't a background replacement constraint
Riju: right, the app can do whatever they want with the mask
Bernard: currently we're not doing a great job of supporting
the alpha channel - e.g. webcodecs doesn't support it
… it's just being added to AV1
… lots of holes currently
… I would encourage you to file bugs and spec issues
Riju: as Elad mentioned, this would be mostly for local
consumption
Frederik: is the API shape sustainable, e.g. when adding
gesture detection or face detection?
… can we add them all to metadata?
Riju: we've been looking at these other features
Bernard: there were discussions in the Media WG about adding
more metadata to VideoFrames and how encoders should react to it
… it's not preserved in the encoded chunks; it gets dropped
Jan-Ivar: part of my comments on Face detection was about to
what extent this needed to be tied to the camera driver, and
whether it should instead be exposed as part of generic media
processing work
Riju: background segmentation is a priority because you get 2x
or 3x performance improvements
Jan-Ivar: but is there something about masking that makes it
worth dealing with it as a camera feature?
Riju: this is supported on any camera on Windows or Mac
… it takes advantage of the optimized local models available to
native apps
Harald: what controls what gets masked?
Riju: only background/foreground
Riju: if there is rough support, we can start with a PR and
iterate on it
Jan-Ivar: my concern is how it relates to generic media
processing pipelines
… background blur was a way to mitigate what was being provided
by the platform and needed to allow apps to opt in or out
… opening up an open-ended area of features would be a concern
for us
… this sounds like something that ought to be part of a generic
media processing library
Riju: this provides a primitive that is generally useful across
videoconferencing apps - green screen, blur, replacement
Bernard: there was another discussion in the Media WG about
media processing
dom: the tension is between a hardware-acceleration-specific
approach and generic media processing
Riju: the motivation here is the performance boost
Jan-Ivar: no clear interest from us at this point, but this may
change based on market interest
Summary of resolutions
1. [51]Consensus on #186, discussion to continue on #202
2. [52]more discussion is needed on the lifecycle of surface
tracks
3. [53]merge 972 with the language clarified on the current
device list
Received on Wednesday, 24 April 2024 07:09:58 UTC