- From: Dominique Hazael-Massieux <dom@w3.org>
- Date: Wed, 18 Oct 2023 11:03:21 +0100
- To: public-webrtc@w3.org
Hi,
The minutes of the WebRTC WG meeting held on October 17 are available at:
https://www.w3.org/2023/10/17-webrtc-minutes.html
Thanks Henrik for scribing!
Dom
WebRTC WG Teleconference minutes 2023-10-17
Slideset:
[1] https://lists.w3.org/Archives/Public/www-archive/2023Oct/att-0002/WEBRTCWG-2023-10-17.pdf
Scribe: Henrik
Congestion control (Harald)
Network may be limited. Sending too much causes discards and is
bad manners. The browser will police the sender.
What about EncodedTransform? Without it, the transport
estimates for you, telling the encoders what the target is. But
the transform changes things: size changes. Therefore, the
transform needs to know the target.
Proposal: add cancellable events for significant changes in
available BW.
See slides for WebIDL and examples.
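For illustration, a minimal sketch of how such an event might be consumed from a worker-side transform. The names here (BandwidthInfo, a "bandwidthestimate" event, the bitrate fields) are assumptions approximating the WebIDL on the slides, not spec text:

    // Sketch only: these names approximate the slide proposal and are
    // not in any published spec.
    interface BandwidthInfo {
      availableBitrate: number; // transport estimate, bits per second
      allocatedBitrate: number; // what the encoder currently targets
    }

    interface BandwidthEstimateEvent extends Event {
      readonly bandwidthInfo: BandwidthInfo;
    }

    declare const transformer: EventTarget; // the script transformer in the worker

    const MIN_BITRATE_FOR_METADATA = 300_000; // assumed app threshold

    transformer.addEventListener("bandwidthestimate", (e) => {
      const { bandwidthInfo } = e as BandwidthEstimateEvent;
      // The proposal makes the event cancellable, so the transform can
      // veto the default handling and apply its own policy instead.
      if (bandwidthInfo.availableBitrate < MIN_BITRATE_FOR_METADATA) {
        e.preventDefault();
        // e.g. stop attaching per-frame metadata until bandwidth recovers
      }
    });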
Jan-Ivar: I don’t understand the use case. It seems that
we’re expanding the role of the application from simply
transforming to something more. The user agent should already
see how much data was added by the transform, so why would we
need to add BW information and allow modifying it? Is it
justified?
Harald: Sending more data than there is room for is a bad
thing, see previous slides. Letting the downstream decide what
was added requires that the downstream can see both the
incoming and outgoing sizes of the transform, and that the
added outgoing information is consistent over time.
Youenn: I can see that you would want to override the user
agent. But I think the user agent already knows the overhead
that the transform is adding, so it can do something. What we
need to understand is in which situations the user agent
behavior is not good. A transform that is doing a lot of things
can drop frames. I’m fine with letting the web page influence
this, but I am not sure how, or whether this API will be easy
for web developers to use. It’s not clear how practical
BandwidthInfo is. But I think it is worth continuing the
investigation.
Harald: So you think there is a case for onkeyframerequest,
bringing that forward as a separate PR, correct?
Youenn: I think so, we should discuss it, but I see the use
case and it is a straightforward boolean. That’s much easier
than the BW info, since there are multiple parameters.
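As a sketch, that separate key frame piece might look like the following; the event name ("keyframerequest") and the app-side function are assumptions based on the discussion:

    // Sketch only: event name and shape are assumptions.
    declare const transformer: EventTarget;

    transformer.addEventListener("keyframerequest", () => {
      // A transform that re-encodes, or forwards frames from another
      // source, can relay the request to its own (app-level) encoder.
      requestKeyFrameFromAppEncoder();
    });

    // Hypothetical app function standing in for whatever encoder the
    // app controls.
    function requestKeyFrameFromAppEncoder(): void {
      // e.g. mark the next produced frame as a key frame
    }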
Harald: So we have a controversial and a non-controversial
part; let’s separate them. But the case where the transform is
not capable of doing the right thing is when the frame comes
from another source. Because then the source might be under app
control, but not under the control of the sender’s encoder. So
the sender might do everything it can with the encoder, but if
the encoder is not the source of the frame, we’re in trouble.
For that use case, we need something like this.
Bernard: Question about the flow of the BW info. You get it via
the onbandwidthestimate?
Harald: We should fire events and let the user read the state
of the event. You have to read BW info when you get the
onbandwidthestimate.
Bernard: So you make a copy and then call
sendBandwidthEstimate, correct? So it’s not actually set on
the wire?
Jan-Ivar: I’m trying to follow up on the sources. Part of
my concern is that we don’t have consensus yet on whether this
is the right API shape. But a question about the events: the
network can change, but one of the use cases is that metadata
can change. Is this meant to be a signal that the user agent
can use punitively against JS that is producing too much data?
Or is this plain data?
Harald: The sender is allowed to drop frames, that’s something
we already agreed on. But this is giving the information up
front so that the app can adjust, with a high probability that
downstream does not later drop the frame. The BW can change
fast of course, so there is never a guarantee. But say the
transform adds metadata, for example about the silhouette, or
standstill, that you use for background replacement, and it
knows that this information will add 1 KB to the frame. Then,
when the transform changes from not sending to sending this
data, it can proactively tell the encoder, because it will now
add more stuff to the frames. This is why you might need
to set this even when there is no signal from the transform.
Jan-Ivar: It might be appropriate/useful to signal
something about the caps to JS, but firing events at 60 frames
per second seems unnecessary.
Harald: A lot of the time there will not be any change, so I
think firing events is appropriate since it will only fire some
of the time. You could read it just before deciding what to do,
and that is perfectly reasonable.
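A sketch of that pull model, reading current state just before each decision instead of handling an event per frame; a readable bandwidthInfo attribute is purely an assumption here:

    // Sketch only: read current state right before a decision,
    // instead of handling an event at 60 fps.
    interface BandwidthInfo { availableBitrate: number }
    declare const transformer: { bandwidthInfo?: BandwidthInfo };

    const MIN_BITRATE = 300_000; // assumed threshold, bits per second

    function shouldAttachMetadata(): boolean {
      const info = transformer.bandwidthInfo;
      return info !== undefined && info.availableBitrate >= MIN_BITRATE;
    }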
Conclusion: Separate PR for key frame event. But maybe we don’t
need an event for BW info, you can just read it. I can make
those changes and come back in November.
Mediacapture-screenshare (Elad)
Elad is not here. Skipping for now.
Mediacapture Extensions
Henrik…
We have video frame counters; we should similarly add audio
frame counters for the same reasons, like calculating the
percentage of frames lost (e.g. to detect audio glitches). But
in the audio case it’s also very interesting for audio quality
to know about capture delay, so we should also measure the
delay between capture and the audio frames being delivered.
(Arguably you might want this for video too, but so far nobody
has asked for it.) So here is the PR. We might want to modify
it to say totalFrames instead of droppedFrames (you can
calculate drop from total by subtracting delivered), as this
would be more consistent with the audio stats. But in general,
can we move on and merge this PR, following up on this in the
editors’ meeting?
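For illustration, a sketch of the arithmetic under the totalFrames naming being floated; the field names are assumptions pending the editors’ follow-up:

    // Sketch only: assumes the totalFrames/deliveredFrames shape
    // discussed, where dropped frames are derived, not reported.
    interface AudioFrameStats {
      totalFrames: number;     // frames captured
      deliveredFrames: number; // frames handed off to the sinks
    }

    function droppedFramePercentage(s: AudioFrameStats): number {
      const dropped = s.totalFrames - s.deliveredFrames;
      return s.totalFrames === 0 ? 0 : (100 * dropped) / s.totalFrames;
    }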
Jan-Ivar: Paul’s not here but he put some comments on the issue
that it would be great if you could look at. But overall I
applaud the move and think this is good.
Discussion around naming, clarifications around what delivered
means. But overall approach is not controversial.
Henrik: Delivered is when the frames are being handed off to
the sinks. These are the exact same definitions as for the
video frames.
Jan-Ivar: But this isn’t observable.
Henrik: No, but it covers the part of the pipeline up to the
sink. For example, if there is a delay before that, and then
you use a WebRTC peer connection, and the peer connection adds
additional delay in encoding and sending, then WebRTC getStats
would have to tell you about any additional delays there. So
even though the exact delivery time is not observable, capture
delay is hopefully a quite well understood concept if we
clarify it, especially since this is only an estimate anyway. I
mean, if the user is experiencing 300 ms delay but the API says
50 ms, then that’s clearly a bad implementation and we should
file a bug to make the capture delay more accurate.
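A sketch of how a cumulative capture-delay counter could be consumed; totalCaptureDelay is an assumed name, mirroring the frame-counter sketch above:

    // Sketch only: totalCaptureDelay is an assumed cumulative
    // counter in seconds, summed over delivered frames.
    interface AudioCaptureStats {
      deliveredFrames: number;
      totalCaptureDelay: number;
    }

    function averageCaptureDelayMs(s: AudioCaptureStats): number {
      return s.deliveredFrames === 0
        ? 0
        : (1000 * s.totalCaptureDelay) / s.deliveredFrames;
    }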
Youenn: In the webrtc stats we talk about samples; maybe it
would make more sense to talk about samples here too, since
audio and video are different. That may be what audio folks
prefer.
Henrik: Actually, audio frames is what I was asked to use based
on Paul’s input, and it is consistent with other audio APIs.
Also, on a historical note, the webrtc stats using samples was
a mistake, and there’s actually an existing note about this
explaining how the samples are normalized on the number of
audio channels. So the webrtc stats using samples is actually
misleading and not what is actually measured there, so we
should use frames.
Conclusion: Overall approach makes sense; flesh out the details
in the editors’ meeting.
Grab bag: Racy devicechange event design has poor interoperability
(Jan-Ivar)
Problem: enumerating devices can take 100+ ms, so the
devicechange event and enumeration get out of sync. It becomes
hard to reason about, leading to trial-and-error coding that
eventually passes QA, but it could pass due to unintended side
effects and fail in other browsers.
Proposal: Include the devices as a parameter to the
devicechange event.
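A sketch of the proposed shape; the devices attribute on the event is the proposal under discussion, not shipped behavior:

    // Sketch only: event.devices is the proposal, not shipped.
    interface DeviceChangeEventWithDevices extends Event {
      readonly devices: MediaDeviceInfo[];
    }

    navigator.mediaDevices.addEventListener("devicechange", (e) => {
      // Today the handler must call enumerateDevices() itself (async,
      // 100+ ms, possibly already stale); with the proposal, the
      // snapshot that triggered the event arrives with it.
      const devices = (e as DeviceChangeEventWithDevices).devices;
      updateDeviceMenu(devices);
    });

    // Hypothetical app hook rebuilding pickers from the snapshot.
    function updateDeviceMenu(devices: MediaDeviceInfo[]): void {
      console.log(`devices changed: ${devices.length} available`);
    }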
Youenn: I’m wondering if we could try to deprecate
enumerateDevices, but anyway, we have talked about trying to be
more explicit about why the event fires (Jan-Ivar: that’s the
next slide).
Harald: So this means that when you fire the event, you have
already done the work to enumerate all devices, so it would
probably fire later than today?
Jan-Ivar: I think the way we have written the algorithm, that
information should already have been acquired, but yeah,
otherwise there would be a delay.
Harald: I think that’s ok, it might lead to fewer events firing.
Guido: I’m ok with the change too.
Conclusion: No objection.
Grab bag: Should devicechange fire when the device info changes?
(Jan-Ivar)
The spec says to fire when the set of devices available to the
user agent has changed. But user agents already lie about this
and fire when mono changes, or based on getUserMedia. So the
question is, should we change the spec here or should we change
Safari?
Proposal A, B, C (see slides). I think we should go with
proposal A which is no change.
Youenn: You say it’s not web compatible, but somehow we shipped
it, so it’s not clear it’s not web compatible. The spec is
talking about the devices the user agent has access to, and you
could see a world where the user agent does not have access to
any devices until the OS has been prompted about wanting to use
the devices, so in that sense I think Safari is following the
spec.
Jan-Ivar: Should we not have an event that makes auto switching
easy?
Youenn: Yes, that is interesting, and we could have an event
that said why it fired; that might be much easier for web
developers to understand. But if it can be solved with
enumerateDevices, then that is fine as well. But I think Safari
is following the spec.
Jan-Ivar: Do you have a preference?
Youenn: Hmm. (Needs to think)
Guido: I’m more inclined to make the change more generic than
devices available to the web page. But my main concern is the
current wording about the set of devices available to the user
agent changing. What if a label changes? Is that a change to
the list of devices or not? I would like the event to fire if
there is any change (such as a label, for the sake of argument)
that changes the result. Anything that changes the result
should fire an event. What do you think?
Jan-Ivar: Is that proposal C?
Guido: Well, not necessarily. Because you focus on the case
where the set of devices changes when you call getUserMedia.
I’m not against firing it in that case; I’m inclined to fire on
any change available to the web page. But what needs to be
clarified is what a change to the set of devices means.
Jan-Ivar: OS level changes for example? In Safari’s case it is
the user agent changing the labels.
Guido: So what does “set of media devices has changed” mean?
One interpretation is that anything in the devices changed. Is
it the number of elements, or is it any change in the elements
that are already there? My main concern is that I want the
event to fire if anything changes, not just the number of
devices. Can we update the wording?
Jan-Ivar: That might be OK, it’s probably quite rare, I’m not
that concerned, but that would be OK. My main concern here is
if Safari is right in firing the event.
Henrik: Arguably the set of devices is not very relevant to the
JS app; from the JS app’s perspective the only thing that
matters is what the result is and whether that result changes.
So if one browser changes the set of devices but another
browser doesn’t, even though that is different behavior, it
isn’t necessarily a web compat issue, since only in the browser
where something changed does the app need to respond. So as
long as the event firing is consistent within a browser, it
should hopefully make sense.
Jan-Ivar: But the app may want to know if the user plugged in a
device, and now it could fire in some browsers without that
changing. Maybe prompt or not.
Youenn: The user agent is in a good spot to think the user
might want to switch. There are cases where you may or may not
want to auto switch; for example, the AirPods might get
automatically connected by being placed close to the MacBook.
So maybe we could expose more information to the app instead.
Jan-Ivar: Perhaps we could have a device plugged event?
Conclusion: More discussion needed, but we seem to agree on the
problems we need to solve.
Exposing decode errors (Philipp Hancke)
This was discussed at TPAC, generally in favor of exposing on
the RTCRtpSender/RTCRtpReceiver rather than on the peer
connection. Makes sense? -Yes.
A PR is presented adding RTCRtpSenderErrorEvent extending
Error.
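As a sketch, the shape being reviewed might be consumed like this; the interface and field names here are assumptions based on the PR discussion, not merged spec text:

    // Sketch only: names approximate the PR under review.
    interface RTCRtpSenderErrorEventLike extends Event {
      readonly error: DOMException;    // e.g. EncodingError vs OperationError
      readonly encodingIndex?: number; // which simulcast encoding failed
    }

    declare const sender: RTCRtpSender & EventTarget;

    sender.addEventListener("error", (e) => {
      const { error, encodingIndex } = e as RTCRtpSenderErrorEventLike;
      // An app might disable an encoding, lower resolution, or switch
      // codec when, say, hardware encoding fails or falls back.
      console.warn(`sender error on encoding ${encodingIndex}: ${error.name}`);
    });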
Henrik: Spatial index and encoding index are not the same
thing. You could have one encoding with multiple spatial
indices, e.g. L3T1, or you could have three encodings each with
L1T1; these are different things, since we have either one
encoder or multiple encoders.
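To make the distinction concrete, a sketch using the existing sendEncodings options (scalabilityMode is from webrtc-svc):

    declare const pc: RTCPeerConnection;
    declare const track: MediaStreamTrack;

    // One encoding, one encoder, three spatial layers (spatial
    // indices 0-2 inside a single encoding):
    pc.addTransceiver(track, {
      sendEncodings: [{ scalabilityMode: "L3T1" }],
    });

    // Three encodings, three encoders (encoding indices 0-2), each
    // with a single spatial layer:
    pc.addTransceiver(track, {
      sendEncodings: [
        { rid: "q", scaleResolutionDownBy: 4, scalabilityMode: "L1T1" },
        { rid: "h", scaleResolutionDownBy: 2, scalabilityMode: "L1T1" },
        { rid: "f", scalabilityMode: "L1T1" },
      ],
    });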
Philipp: But spatial index is already used in other places.
Henrik: If so, that is a mistake. I have been cleaning up a lot
of places in the code base where these two things get mixed up,
so we definitely should not duplicate this confusion into new
APIs.
Bernard: [2]WebCodecs issue 669 describes the approach in
WebCodecs, which is to use EncodingError for an issue with the
data (e.g. the decoder can’t parse it) and OperationError for
resource issues.
[2] https://github.com/w3c/webcodecs/issues/669
Philipp: We specifically want to know about SW fallback.
Youenn: Possible privacy issue. I think we should have a
fingerprinting annotation on the PR, and ask the PING people.
Florent: You can imagine an application abusing the different
settings already if it wanted to do fingerprinting, so this is
not necessarily unique.
Youenn: If so we should add a note if we missed something
earlier.
Jan-Ivar: I’m confused, when should this event fire? Only on
SW fallback, or for other reasons?
Philipp: There are multiple reasons for falling back to SW, for
example the HW queue getting overloaded. That’s one of the main
reasons for wanting to know about SW fallback.
Jan-Ivar: It would be good to express what the app is expected
to do in response to the event.
Henrik: Is it always SW fallback?
Philipp: There can be cases where SW fallback occurs.
Conclusion: More clarification and discussion needed.
setCodecPreferences vs unidirectional codecs (Philipp Hancke)
Some codecs, H264 profiles in particular, are send-only or
receive-only. The algorithm says to look at send and receive
directions, but it does not take the transceiver direction into
account.
We need to take directionality into account. But if we do, we
need to throw an exception in setCodecPreferences, or when
setting the direction, if we get to an incompatible state.
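A sketch of the gap using today’s real API:

    // setCodecPreferences currently matches codecs against send and
    // receive capabilities, ignoring the transceiver direction.
    declare const pc: RTCPeerConnection;

    const transceiver = pc.addTransceiver("video", { direction: "sendonly" });

    // Receive capabilities may include decode-only H.264 profiles that
    // a sendonly transceiver can never use; today they are accepted
    // here anyway, which is the gap under discussion.
    const recvCodecs = RTCRtpReceiver.getCapabilities("video")?.codecs ?? [];
    transceiver.setCodecPreferences(recvCodecs);

    // With directionality enforced, either this call or later setting
    // transceiver.direction to an incompatible value would need to throw.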
Youenn: Can it say “I want this codec but only for send”? Hmm,
no.
Florent: Having the direction would make sense; I think a lot
of issues come from codecs that cannot be decoded. I wonder if
we should have more restrictions on how we handle codecs. Maybe
have send codecs be a subset of receive codecs. It would fix
some issues.
Harald: I think we should take directionality into account. I
encountered some of the same problems when specifying the SDP
negotiation PR we’ll get to later. We’ll need to look more into
how JS has access to send and receive codecs separately.
Henrik: We already have that, with sender and receiver
getCapabilities.
Bernard: I think in general it is ready for PR, then we can
look at the PR and evaluate it.
Florent: You can still run into issues later on. (?)
Conclusion: Philipp will provide a PR
SDP codec negotiation (Harald)
The codec information needs to be presented before the
negotiation. What I proposed was to make it possible to inject
names that the platform does not understand. There is no need
to change the SDP rules, that’s important.
Before sending data we have to choose which encoder to use. And
if we transform the data we need to tell the transform what
payload type to use, because it is different from the incoming
PT. The new PT does not have to be understood by the platform
(an app-level PT). Similarly, on the decoder side you need to
tell the depacketizer and decoder what PT to use. This PT the
platform does have to understand.
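As a heavily hedged sketch, the per-frame side of this might look like the following; every name here is an assumption standing in for the PR, and setMetadata in particular is proposed, not shipped:

    // Sketch only: all names are assumptions approximating the PR.
    interface FrameMetadata { payloadType: number }
    interface EncodedFrameLike {
      getMetadata(): FrameMetadata;
      setMetadata(m: FrameMetadata): void; // proposed, not shipped
    }

    declare const frame: EncodedFrameLike;
    declare const appCodecPayloadType: number; // PT negotiated for the app-level codec

    const metadata = frame.getMetadata();
    // After the transform the payload is no longer the original codec,
    // so the outgoing frame must carry the app codec's payload type.
    metadata.payloadType = appCodecPayloadType;
    frame.setMetadata(metadata);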
Adding an API for this plugs a glaring hole. The issue we need
to fix was presented in March; the first version of the PR was
presented in June, with a conclusion to adopt the PR with
details to be discussed. It was presented again at TPAC, where
the summary said that there were arguments on both sides.
Suddenly packetizers are up for discussion. So I have revised
the PR based on my understanding of the TPAC discussion. Given
the problems that need to be fixed, this approach seems to make
sense.
But then the editors’ team started discussing abandoning this
approach altogether. I said no, that is not reasonable. We have
discussed this for 6 months. There are a number of properties
of this solution that are the way they are because of specific
requirements and needs, which would not have been addressed by
the alternatives discussed during the meeting.
7 months and 3 presentations should be enough to get consensus
to at least try it. This hampers progress. Can we move forward
or do we need to abandon it? If we can’t agree on anything in
any amount of time, then this working group is a failure. So
can we move forward?
Proposal: Instruct the editors team to merge the PR.
Bernard: From a process point of view, the WG already said to
move forward. It is the job of the editors team to implement
what the WG decided.
Jan-Ivar: But this hasn’t had a call for consensus. Some of
this I support solving. But there appear to be 3 use cases: e2e
encryption, metadata on existing frames, and third, codecs in
JS. I think good arguments have been made that this is not the
right API for number three. And other proposals with fewer
methods were discussed in the issue.
Youenn: I think we agree that we should cover these issues, and
I am really hoping that we can do it. I don’t have a strong
opinion on whether we should expose JS encoders and plug them
into a peer connection; we can investigate it, and I think
there are use cases. But I think that belongs in a different
spec about how to expose encoders and packetizers. We could use
such APIs in totally orthogonal ways.
Philipp: Developers are already using transforms to solve
real-world problems, and the working group is not addressing
the use cases in a way that works better, so developers will
continue doing what they are doing.
Harald: For the JS encoder, that part is not my favorite API
either - we should make webrtc encoded frames constructible,
which is a completely different part - but no matter what we do
on how to construct the frames, we still need to negotiate SDP.
This is the simplest possible API for this. Jan-Ivar’s
suggestion has obvious deficiencies and would probably take
another 7 months to discuss. This is unreasonable. We are
better served by merging and iterating.
Jan-Ivar: I think my proposal is still an iteration on Harald’s
API. It’s just removing methods for use case 3, so I just
isolated it to the part that it should solve. I think it’s a
compromise we could work on.
Youenn: If the user agent does not know the encoder…
Henrik: It seems like there is a lot of concern about the API
being misused. But I think we need to do all of these things
anyway: if we’re transforming, then it is no longer the same PT
and we are breaking SDP rules, and if we want to forward frames,
then we need to say what the PT is, etc. If this API can be used
for something else, and in the future there is a better way to
do that, then that’s good. I really don’t understand why this is
so controversial if we have to do something like this anyway,
and it doesn’t make sense to me that we’re stalling for 6
months.
Jan-Ivar: I hear consensus around some of the use cases, but it
seems very hacky to add some of these things.
Youenn: I agree we should fix E2E encryption. We just need a
single attribute. But if we can’t agree on that, well that’s
life. Taking a frame from one peer connection and putting it in
another peer connection is very similar to a pluggable encoder,
so exposing an encoder might make more sense to solve this
problem.
Henrik: Is the core of the disagreement that we’re enabling
pushing frames without reading from the source when the app is
in more control?
Harald: No this is about the SDP negotiation specifically.
Harald: What is the next step? Call for consensus?
Bernard: We can do a call for consensus. But in the past it has
resulted in a bunch of issues that can take months to resolve.
Jan-Ivar: It doesn’t seem like we have consensus.
Bernard: But it may help clarify what we don’t have consensus
about.
Jan-Ivar: If we can untangle these things for use case 1, 2 or
3, then maybe we can make progress.
Conclusion: Try to solve use case 1 and 2 first, and then
revisit use case 3 separately.