[minutes] October 17 2023 meeting

Hi,

The minutes of the WebRTC WG meeting held on October 17 are available at:
   https://www.w3.org/2023/10/17-webrtc-minutes.html

Thanks Henrik for scribing!

Dom

               WebRTC WG Teleconference minutes 2023-10-17

    Slideset: [1]

       [1] https://lists.w3.org/Archives/Public/www-archive/2023Oct/att-0002/WEBRTCWG-2023-10-17.pdf

    Scribe: Henrik

Congestion control (Harald)

    Network may be limited. Sending too much causes discards and is
    bad manners. The browser will police the sender.

    What about EncodedTransform? Without it, the transport
    estimates the bandwidth for you, telling the encoders what the
    target is. But the transform changes things: frame sizes
    change. Therefore, the transform needs to know the target.

    Proposal: add cancellable events for significant changes in
    available BW.

    See slides for WebIDL and examples.
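
    Since the slides are not inlined here, a minimal TypeScript
    sketch of the proposal follows, assuming the
    onbandwidthestimate, bandwidthInfo and sendBandwidthEstimate
    names that come up in the discussion below; none of this is a
    shipped API:

      // Hypothetical API surface paraphrased from the slides.
      interface BandwidthInfo {
        availableOutgoingBitrate: number; // bits/s (assumed field name)
      }

      // Assumed surface on the encoded-transform side:
      declare const transformer: {
        bandwidthInfo: BandwidthInfo;
        sendBandwidthEstimate(info: BandwidthInfo): void;
        onbandwidthestimate: (() => void) | null;
      };

      // The event signals a significant change; the app reads the
      // current state off the object rather than off the event.
      transformer.onbandwidthestimate = () => {
        const { availableOutgoingBitrate } = transformer.bandwidthInfo;
        // Example: the transform will add ~1 KB of metadata per frame
        // at 30 fps, so ask the encoder to leave ~240 kbps of room.
        transformer.sendBandwidthEstimate({
          availableOutgoingBitrate: availableOutgoingBitrate - 8192 * 30,
        });
      };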

    Jan-Ivar: I don’t understand the use case. It seems to be that
    we’re expanding the role of the application from simply
    transforming to something more. The user agent should already
    see how much data was added by the transform, why would we need
    to add BW information and allow modifying that? Is it
    justified?

    Harald: Sending more data than there is room for is a bad
    thing, see previous slides. Letting the downstream decide what
    was added requires that the downstream can see both the
    incoming and outgoing sizes of the transform, and that the
    added outgoing information is consistent over time.

    Youenn: I can see that you would want to override the user
    agent. But I think the user agent already knows the overhead
    the transform is adding, so it can do something. What we need
    to understand is in which situations the user agent behavior
    is not good. A transform that is doing a lot of things can
    drop frames. I’m fine with letting the web page influence
    this, but I am not sure how, or whether this API will be easy
    for web developers to use. It’s not clear how practical
    BandwidthInfo is. But I think it is worth continuing the
    investigation.

    Harald: So you think there is a case for onkeyframerequest,
    and that we should bring it forward as a separate PR, correct?

    Youenn: I think so, we should discuss it, but I see the use
    case and it is a straightforward boolean. That’s much easier
    than the BW info, since that has multiple parameters.

    Harald: So we have a controversial and a non-controversial
    part, let’s separate them. But the case where the transform is
    not capable of doing the right thing is when the frame comes
    from another source. Because then the source might be in app
    control, but not under control of the encoder of the sender.
    So the sender might do everything it can with its encoder, but
    if the encoder is not the source of the frame, we’re in
    trouble. For that use case, we need something like this.

    Bernard: Question about the flow of the BW info. You get it via
    the onbandwidthestimate?

    Harald: We should fire events and let the user read the state
    of the event. You have to read BW info when you get the
    onbandwidthestimate.

    Bernard: So you make a copy and then call
    sendBandwidthEstimate, correct? So it’s not actually setting on
    the wire?

    Jan-Ivar: I’m trying to follow up on the sources. And part of
    my concern is that we don’t have consensus yet on whether this
    is the right API shape. But a question about the events: the
    network can change, but one of the use cases is that metadata
    can change. Is this meant to be a signal that the user agent
    can use punitively on JS that is producing too much data? Or
    is this plain data?

    Harald: The sender is allowed to drop frames, that’s something
    we already agreed on. But this is surfacing the information so
    that the app can adjust, with a high probability that
    downstream does not later drop the frame. The BW can change
    fast of course, so there is never a guarantee. But if the
    transform adds metadata, for example the silhouette or
    standstill information that you use for background
    replacement, and it knows that this information will add 1 KB
    to the frame, then the transform knows it is changing from not
    sending to sending this data, and it can proactively tell the
    encoder, because it will now add more stuff to the frames.
    This is why you might need to set this even when there is no
    signal from the transport.

    Jan-Ivar: It seems appropriate/useful to me to signal
    something about the caps to JS, but firing events at 60 frames
    per second seems unnecessary.

    Harald: A lot of the time there will not be any change, so I
    think firing events is appropriate since it will only fire some
    of the time. You could read it just before deciding what to do,
    and that is perfectly reasonable.

    Conclusion: Separate PR for key frame event. But maybe we don’t
    need an event for BW info, you can just read it. I can make
    those changes and come back in November.

Mediacapture-screenshare (Elad)

    Elad is not here. Skipping for now.

Mediacapture Extensions

    Henrik…

    We have video frame counters; we should similarly add audio
    frame counters for the same reasons, like calculating the
    percentage of frames lost (e.g. to detect audio glitches). But
    in the audio case it’s also very interesting for audio quality
    to know about capture delay, so we should also measure the
    delay between capture and the audio frames being delivered.
    (Arguably you might want this for video too, but so far nobody
    has asked for it.) So here is the PR. We might want to modify
    it to say totalFrames instead of droppedFrames (you can
    calculate drops from total by subtracting delivered) as this
    would be more consistent with the audio stats. But in general,
    can we move on and merge this PR, following up on the details
    in the editors’ meeting?
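
    As a minimal sketch of the counter arithmetic above, assuming
    a hypothetical stats object with totalFrames and
    deliveredFrames counters (names follow the video-counter
    precedent, not a final API):

      // Hypothetical audio frame stats; the arithmetic is the point.
      interface AudioFrameStats {
        totalFrames: number;     // frames produced by the capturer
        deliveredFrames: number; // frames handed off to the sinks
      }

      // droppedFrames is derivable: total = delivered + dropped.
      function droppedFrameFraction(stats: AudioFrameStats): number {
        const dropped = stats.totalFrames - stats.deliveredFrames;
        return stats.totalFrames > 0 ? dropped / stats.totalFrames : 0;
      }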

    Jan-Ivar: Paul’s not here but he put some comments on the issue
    that it would be great if you could look at. But overall I
    applaud the move and think this is good.

    Discussion around naming and clarifications around what
    “delivered” means. But the overall approach is not
    controversial.

    Henrik: Delivered is when the frames are being handed off to
    the sinks. This is the exact same definition as for the video
    frames.

    Jan-Ivar: But this isn’t observable.

    Henrik: No, but it covers the part of the pipeline up to the
    sink. For example, if there is a delay before that, and then
    you use a WebRTC peer connection, and the peer connection adds
    additional delay in encoding and sending, then WebRTC getStats
    would have to tell you about any additional delays there. So
    even though the exact delivery time is not observable, the
    capture delay is hopefully a quite well understood concept if
    we clarify it, and this is only an estimate anyway. I mean, if
    the user is experiencing 300 ms delay but the API says 50 ms,
    then that’s clearly a bad implementation and we should file a
    bug to make the capture delay more accurate.

    Youenn: In the webrtc stats we talk about samples; maybe it
    would make more sense to talk about samples here too, since
    audio and video are different. That may be what audio folks
    prefer.

    Henrik: Actually, audio frames is what I was asked to use
    based on Paul’s input, and it is consistent with other audio
    APIs. Also, on a historical note, the webrtc stats using
    samples was a mistake; there’s actually an existing note about
    this explaining how the samples are normalized by the number
    of audio channels. So the webrtc stats using samples is
    actually misleading and not what is actually measured there,
    so we should use frames.

    Conclusion: Overall approach makes sense; flesh out the
    details in the editors’ meeting.

Grab bag: Racy devicechange event design has poor interoperability
(Jan-Ivar)

    Problem: enumerating devices can take 100+ ms, so the
    devicechange event and enumeration can get out of sync. This
    makes it hard to reason about, and leads to trial-and-error
    coding that eventually passes QA, but may pass due to
    unintended side effects and fail in other browsers.

    Proposal: Include devices as a parameter to the devicechange
    event.
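
    A sketch of how a handler might consume such a parameter,
    assuming the attribute is called devices (the name is
    illustrative only):

      // With the device list carried on the event itself, the handler
      // no longer races a separate enumerateDevices() call.
      declare function updateDevicePicker(devices: MediaDeviceInfo[]): void;

      navigator.mediaDevices.addEventListener("devicechange", (event) => {
        const devices =
          (event as Event & { devices?: MediaDeviceInfo[] }).devices;
        if (devices !== undefined) {
          updateDevicePicker(devices); // app-defined UI refresh
        }
      });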

    Youenn: I’m wondering if we could try to deprecate
    enumerateDevices, but anyway, we have talked about trying to
    be more explicit about why the event fires (Jan-Ivar: that’s
    the next slide).

    Harald: So this means that when you fire the event, you have
    already done the work to enumerate all devices, so it would
    probably fire later than today?

    Jan-Ivar: I think with the way we have written the algorithm,
    that information should already have been acquired, but yeah,
    otherwise there would be a delay.

    Harald: I think that’s ok, it might lead to fewer events
    firing.

    Guido: I’m ok with the change too.

    Conclusion: No objection.

Grab bag: Should devicechange fire when the device info changes?
(Jan-Ivar)

    The spec says to fire when the set of devices available to the
    user agent has changed. But user agents already lie about this
    and fire when device info changes, or based on getUserMedia.
    So the question is, should we change the spec here or should
    we change Safari?

    Proposal A, B, C (see slides). I think we should go with
    proposal A which is no change.

    Youenn: You say it’s not web compatible, but somehow we
    shipped it, so it’s not clear it’s not web compatible. The
    spec is talking about the devices the user agent has access
    to, and you could see a world where the user agent does not
    have access to any devices until the OS has been prompted
    about wanting to use the devices, so I think in that sense
    Safari is following the spec.

    Jan-Ivar: Should we not have an event that makes auto switching
    easy?

    Youenn: Yes, that is interesting, and we could have an event
    that said why it fired; that might be much easier for web
    developers to understand. But if it can be solved with
    enumerateDevices, then that is fine as well. But I think
    Safari is following the spec.

    Jan-Ivar: Do you have a preference?

    Youenn: Hmm. (Needs to think)

    Guido: I’m more inclined to make the change more generic than
    devices available to the web page. But my main concern is the
    current wording about the set of devices available to the user
    agent changing. What if the label changes? Is that a change to
    the list of devices or not? I would like the event to fire if
    there is any change (such as a label, for the sake of
    argument), if the result changes. Anything that changes the
    result should fire an event. What do you think?

    Jan-Ivar: Is that proposal C?

    Guido: Well, not necessarily. Because you focus on the case
    where, when you call getUserMedia, the set of devices changes.
    I’m not against firing it in that case; I’m inclined to fire
    on any change available to the web page. But what needs to be
    clarified is what a change to the set of devices means.

    Jan-Ivar: OS level changes for example? In Safari’s case it is
    the user agent changing the labels.

    Guido: So what does “set of media devices has changed” mean?
    One interpretation is that anything in the devices changed: is
    it the number of elements, or is it any change in the elements
    that are already there? My main concern is that I want the
    event to fire if anything changes, not just the number of
    devices. Can we update the wording?

    Jan-Ivar: That might be OK, it’s probably quite rare, I’m not
    that concerned, but that would be OK. My main concern here is
    if Safari is right in firing the event.

    Henrik: Arguably the set of devices is not very relevant to
    the JS app; from the JS app’s perspective the only thing that
    matters is what the result is and whether that result changes.
    So if one browser changes the set of devices but another
    browser doesn’t, even though that is different behavior, it
    isn’t necessarily a web compat issue, since only in the
    browser where something changed does the app need to respond.
    As long as the event firing is consistent within the browser,
    it should hopefully make sense.

    Jan-Ivar: But the app may want to know if the user plugged in a
    device, and now it could fire in some browsers without that
    changing. Maybe prompt or not.

    Youenn: The user agent is in a good spot to think the user
    might want to switch. There are cases where you may or may not
    want to auto switch; for example, the AirPods might get
    automatically connected by being placed close to the MacBook.
    So maybe we could expose more information to the app instead.

    Jan-Ivar: Perhaps we could have a device plugged event?

    Conclusion: More discussion needed, but we seem to agree on the
    problems we need to solve.

Exposing decode errors (Philipp Hancke)

    This was discussed at TPAC, generally in favor of exposing on
    the RTCRtpSender/RTCRtpReceiver rather than on the peer
    connection. Makes sense? -Yes.

    A PR is presented adding RTCRtpSenderErrorEvent extending
    Error.
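
    A usage sketch, assuming the error surfaces as a per-sender
    event; the interface and field names below are placeholders,
    not the PR’s actual WebIDL:

      // Hypothetical per-sender error event, e.g. a HW encoder failure.
      interface RTCRtpSenderErrorEventLike extends Event {
        message: string;
        encodingIndex?: number; // see the index discussion below
      }

      declare const sender: RTCRtpSender & {
        addEventListener(
          type: "error",
          listener: (ev: RTCRtpSenderErrorEventLike) => void
        ): void;
      };

      sender.addEventListener("error", (ev) => {
        console.warn("RTP sender error:", ev.message, ev.encodingIndex);
        // An app might respond by lowering resolution or restarting.
      });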

    Henrik: Spatial index and encoding index are not the same
    thing. You could have one encoding with multiple spatial
    indices, e.g. L3T1, or you could have three encodings with
    L1T1; these are different things, as we have one or multiple
    encoders.

    Philipp: But spatial index is already used in other places.

    Henrik: If so that is a mistake, I have been cleaning up the
    code base with a lot of places where these two things get mixed
    up, so we definitely should not duplicate this confusion to new
    APIs.

    Bernard: [2]WebCodecs issue 669 describes the approach in
    WebCodecs, which is to use EncodingError for an issue with the
    data (e.g. the decoder can’t parse it) and OperationError for
    resource issues.

       [2] https://github.com/w3c/webcodecs/issues/669

    Philipp: We specifically want to know about SW fallback.

    Youenn: Possible privacy issue. I think we should have a
    fingerprinting annotation on the PR, and ask the PING people.

    Florent: You can imagine an application abusing the different
    settings already if it wanted to do fingerprinting, so this is
    not necessarily unique.

    Youenn: If so we should add a note if we missed something
    earlier.

    Jan-Ivar: I’m confused, when should this event fire? Only on
    SW fallback, or for other reasons?

    Philipp: There are multiple reasons for falling back to SW;
    that’s one of the main reasons for wanting to know about SW
    fallback. For example, the HW queue gets overloaded.

    Jan-Ivar: It would be good to express what the app is expected
    to do in response to the event.

    Henrik: Is it always SW fallback?

    Philipp: There can be cases where SW fallback occurs.

    Conclusion: More clarification and discussion needed.

setCodecPreferences vs unidirectional codecs (Philipp Hancke)

    Some codecs, H264 profiles in particular, are send-only or
    receive-only. The algorithm says to look at send and receive
    directions, but it does not take the transceiver direction into
    account.

    We need to take directionality into account. But if we do, we
    need to throw an exception in setCodecPreferences, or when
    setting the direction, if we get to an incompatible state.
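
    For context, a sketch of the directional capability surfaces
    that already exist today (as Henrik notes below); how
    setCodecPreferences should interact with transceiver.direction
    is the open question:

      // Capabilities are already exposed per direction:
      const sendCodecs = RTCRtpSender.getCapabilities("video")?.codecs ?? [];
      const recvCodecs = RTCRtpReceiver.getCapabilities("video")?.codecs ?? [];
      console.log(sendCodecs.length, "send codecs available");

      // A recvonly transceiver arguably only needs receive-capable
      // codecs, e.g. preferring H264 profiles this engine can decode:
      const pc = new RTCPeerConnection();
      const transceiver = pc.addTransceiver("video", { direction: "recvonly" });
      transceiver.setCodecPreferences(
        recvCodecs.filter((c) => c.mimeType === "video/H264")
      );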

    Youenn: Can it say “I want this codec but only for send”. Hm,
    no.

    Florent: Having the direction would make sense; I think a lot
    of issues come from codecs that cannot be decoded. I wonder if
    we should have more restrictions on how we handle codecs.
    Maybe have send codecs be a subset of receive codecs. It would
    fix some issues.

    Harald: I think we should take directionality into account. I
    encountered some of the same problems when specifying the SDP
    negotiation PR we’ll get to later. We’ll need to look more into
    how JS has access to send and receive codecs separately.

    Henrik: We already have that, with sender and receiver
    getCapabilities.

    Bernard: I think in general it is ready for PR, then we can
    look at the PR and evaluate it.

    Florent: You can still run into issues later on. (?)

    Conclusion: Philipp will provide a PR

SDP codec negotiation (Harald)

    The codec information needs to be presented before the
    negotiation. What I proposed was to make it possible to inject
    names that the platform does not understand. There is no need
    to change the SDP rules, that’s important.

    Before sending data we have to choose which encoder to use.
    And if we transform the data, we need to tell the transform
    what payload type (PT) to use, because it is different from
    the incoming PT. The new PT does not have to be understood by
    the platform (an app-level PT). Similarly, on the decoder side
    you need to tell the depacketizer and decoder what PT to use;
    this PT the platform does have to understand.
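
    As an abstract sketch of this PT bookkeeping (the names below
    are purely illustrative; the PR’s actual API shape is what is
    under debate):

      // One mapping between the PT of the codec the platform actually
      // runs and the app-level PT negotiated in SDP for the
      // transformed format.
      interface PtMapping {
        platformPt: number; // understood by the platform codec
        appPt: number;      // advertised in SDP for transformed frames
      }

      const m: PtMapping = { platformPt: 96, appPt: 100 }; // example PTs

      // Send side: the transform rewrites the encoder's PT to the
      // app-level PT before packetization.
      const toWirePt = (pt: number) => (pt === m.platformPt ? m.appPt : pt);

      // Receive side: rewrite back so the depacketizer/decoder sees a
      // PT the platform understands.
      const toPlatformPt = (pt: number) =>
        pt === m.appPt ? m.platformPt : pt;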

    Adding an API for this plugs a glaring hole. The issue we need
    to fix was presented in March; the first version of the PR was
    presented in June, with the conclusion to adopt the PR with
    details to be discussed. It was presented again at TPAC, where
    the summary said that there were arguments on both sides.
    Suddenly packetizers are up for discussion. So I have revised
    the PR based on my understanding of the TPAC discussion. Given
    the problems that need to be fixed, this approach seems to
    make sense.

    But then the editors’ team started discussing abandoning this
    approach altogether. I said no, that is not reasonable. We
    have discussed this for 6 months. There are a number of
    properties of this solution that are the way they are because
    of specific requirements and needs, which would not have been
    addressed by the alternatives discussed during the meeting.

    7 months and 3 presentations should be enough to get consensus
    to at least try it. This hampers progress. Can we move forward
    or do we need to abandon it? If we can’t agree on anything in
    any amount of time, then this working group is a failure. So
    can we move forward?

    Proposal: Instruct the editors team to merge the PR.

    Bernard: From a process point of view, the WG already said to
    move forward. It is the job of the editors team to implement
    what the WG decided.

    Jan-Ivar: But this hasn’t had a call for consensus. Some of
    this I support solving. But there appear to be 3 use cases:
    e2e encryption, adding metadata to existing frames, and third,
    codecs in JS. I think good arguments have been made that this
    is not the right API for number three. And other proposals
    with fewer methods were discussed in the issue.

    Youenn: I think we agree that we should cover these issues,
    and I am really hoping that we can do it. I don’t have a
    strong opinion on whether we should expose JS encoders and
    plug them into a peer connection; we can investigate it, and I
    think there are use cases. But I think that belongs in a
    different spec about how to expose encoders and packetizers.
    We could use such APIs in totally orthogonal ways.

    Philipp: Developers are already using transforms to solve
    real-world problems, and the working group is not addressing
    the use cases in a way that works better, so developers will
    continue doing what they are doing.

    Harald: For the JS encoder, that part is not my favorite API
    either - we should make webrtc encoded frames constructible,
    which is a completely different part - but no matter what we
    do on how to construct the frames, we still need to SDP
    negotiate. This is the simplest possible API for this.
    Jan-Ivar’s suggestion has obvious deficiencies and would
    probably take another 7 months to discuss. This is
    unreasonable. We are better served by merging and iterating.

    Jan-Ivar: I think my proposal is still an iteration on Harald’s
    API. It’s just removing methods for use case 3, so I just
    isolated it to the part that it should solve. I think it’s a
    compromise we could work on.

    Youenn: If the user agent does not know the encoder…

    Henrik: It seems like there is a lot of concern about the API
    being misused. But I think we need to do all of these things
    anyway: if we’re transforming, then it is no longer the same
    PT and we are breaking SDP rules, and if we want to forward
    frames, then we need to say what the PT is, etc. If this API
    can be used for something else, and in the future there is a
    better way to do that, then that’s good. I really don’t
    understand why this is so controversial if we have to do
    something like this anyway, and it doesn’t make sense to me
    that we’re stalling for 6 months.

    Jan-Ivar: I do hear consensus around some of the use cases,
    but it seems very hacky to add some of these things.

    Youenn: I agree we should fix E2E encryption. We just need a
    single attribute. But if we can’t agree on that, well that’s
    life. Taking a frame from one peer connection and putting it in
    another peer connection is very similar to a pluggable encoder,
    so exposing an encoder might make more sense to solve this
    problem.

    Henrik: Is the core of the disagreement that we’re enabling
    pushing frames without reading from the source when the app is
    in more control?

    Harald: No this is about the SDP negotiation specifically.

    Harald: What is the next step? Call for consensus?

    Bernard: We can do a call for consensus. But in the past it has
    resulted in a bunch of issues that can take months to resolve.

    Jan-Ivar: It doesn’t seem like we have consensus.

    Bernard: But it may help clarify what we don’t have consensus
    about.

    Jan-Ivar: If we can disentangle these things into use case 1,
    2, or 3, then maybe we can make progress.

    Conclusion: Try to solve use case 1 and 2 first, and then
    revisit use case 3 separately.
