[minutes] April 18 2023 meeting

Hi,

The minutes of our meeting on April 18 2023 are available at:
   https://www.w3.org/2023/04/18-webrtc-minutes
with the YouTube recording at https://www.youtube.com/watch?v=VkBeQdbVjWs

Thanks Henrik for scribing!

Dom


                           WebRTC WG 2023-04-18

    [2]Agenda.

       [2] https://www.w3.org/2011/04/webrtc/wiki/April_18_2023

Attendees

    Present
           -

    Regrets
           Dom, Harald

    Chair
           Bernard, Jan-Ivar

    Scribe
           Henrik

Contents

     1. [3]PR 147: Add RTCRtpEncodingParameters.codec to change the
        active codec (Florent)
     2. [4]PR Stats/751: RTX and FEC stats are incomplete (Fippo)
     3. [5]Issue 146: Exposing decode errors / SW fallback as an
        Event (Bernard)
     4. [6]Issue 170: Incompatible SVC Metadata (Bernard)
     5. [7]Issue 93: MediaStreamTrack audio delay/glitch capture
        stats (Henrik)
     6. [8]PR 173 adding presentationTime to
        RTCEncodedVideoFrameMetadata (Tony)
     7. [9]WebRTC Combines Media and Transport
     8. [10]WebRTC ICE improvements
     9. [11]playoutDelay (Jan-Ivar)
    10. [12]Issue 39: Solve user agent camera/microphone
        double-mute
    11. [13]Summary of resolutions

Meeting minutes

    Recording: [14]https://www.youtube.com/watch?v=VkBeQdbVjWs

      [14] https://www.youtube.com/watch?v=VkBeQdbVjWs

    Date: April 18, 2023

    Slideset: [16]https://lists.w3.org/Archives/Public/www-archive/
    2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf

      [16] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf

   [17]PR 147: Add RTCRtpEncodingParameters.codec to change the active
   codec (Florent)

      [17] https://github.com/w3c/webrtc-extensions/pull/147

    [18][Slide 12]

      [18] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=12

    Florent: We failed to mention that this API could also be used
    for audio. We intend this to apply to audio as well. If you have
    any objections, please mention them now.

    Jan-Ivar: Simulcast on audio is probably not on anyone’s table.
    The use case for audio here is avoiding renegotiation; I just
    want to clarify that.

    If people use this API, what codecs can they choose from? Can
    they go outside what was negotiated?

    Florent: No, only what is negotiated. Any negotiated codec
    supported by the user agent.

    Peter: If someone asks for something that is not negotiated,
    what happens?

    Florent: An exception is thrown, it’s part of the PR.

    RESOLUTION: No objection, let’s include audio.
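
    A minimal usage sketch under the shape discussed in PR 147 (the
    RTCPeerConnection `pc` is assumed, and the attribute may still
    change):

      // Pick a negotiated codec and apply it to one encoding. Per the
      // PR, selecting a codec that was not negotiated throws/rejects.
      const sender = pc.getSenders().find((s) => s.track?.kind === 'audio');
      const params = sender.getParameters();
      params.encodings[0].codec =
          params.codecs.find((c) => c.mimeType === 'audio/opus');
      try {
        await sender.setParameters(params);
      } catch (e) {
        console.error('codec was not negotiated:', e);
      }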

   [19]PR Stats/751: RTX and FEC stats are incomplete (Fippo)

      [19] https://github.com/w3c/webrtc-stats/issues/751

    [20][Slide 13]

      [20] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=13

    Fippo: Outbound stats have RTX metrics but inbound stats do not;
    inbound stats have FEC metrics but outbound stats do not, except
    in the provisional spec. Proposal: add the missing metrics on
    both sides and merge once there is an implementation.

    Jan-Ivar: No objection, but the slide is asking for more than
    what is discussed in the issue. Is that correct? WebRTC Stats is
    in Candidate Recommendation; if metrics are not implemented we
    put them in the provisional spec. Are you providing
    implementations?

    Fippo: Yes I’m providing implementations.

    Jan-Ivar: No objection.

    RESOLUTION: No objection, Fippo to make PRs: merge on main spec
    if implementation is provided or provisional spec if no
    implementation is provided.
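
    For reference, a sketch of where the counters live today, per
    the slide (property names from the published webrtc-stats spec):

      const report = await pc.getStats();
      for (const s of report.values()) {
        if (s.type === 'outbound-rtp') {
          console.log('RTX packets sent:', s.retransmittedPacketsSent);
        } else if (s.type === 'inbound-rtp') {
          console.log('FEC packets received:', s.fecPacketsReceived);
        }
      }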

   [21]Issue 146: Exposing decode errors / SW fallback as an Event
   (Bernard)

      [21] https://github.com/w3c/webrtc-extensions/issues/146

    [22][Slide 14]

      [22] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=14

    Bernard: In the Media WG a useful distinction was made between
    data errors and resource problems. HW encoders are more
    restrictive than SW. For a data error there is nothing to show
    the user, but the developer has to debug the bitstream.

    If there is a resource problem we may want to prompt the user.
    The application might re-acquire resources.

    That’s my recommendation here. What do people think about
    distinguishing these two?

    Youenn: We already support SW fallback today, so that seems
    good as-is. I don’t see any reason to notify the app. The
    resource problem seems similar to “intense CPU usage”. In
    Safari we have a UI to show this, so again there I don’t see
    what the application can do to help the user that the browser
    cannot already do instead.

    Jan-Ivar: Thumbs up.

    Florent: There is also the case of the SW decoder not being
    able to decode the data. That also needs to be handled.

    On HW to SW fallback: what if the application wants to change
    codec instead?

    Jan-Ivar: It’s unclear to me if the only option is to fall back
    to SW. We should be clear about what problem we’re trying to
    solve.

    Henrik: The distinction makes sense, but I don’t see how this
    slide solves the original problem of letting the app change
    codec if it doesn’t get HW. If you’re saying SW fallback solves
    the problem, I don’t agree.

    Bernard: The proposal is an app-observable event that
    distinguishes between the two. In the Media Working Group we
    want to get this information out of WebCodecs. Just to be
    clear, this is under investigation; it’s not perfect, but
    Eugene is trying to address it.

    Jan-Ivar: Because of the privacy issue I think we [are
    hesitant]. (Scribe: not sure what was said here.)

    Bernard: The benefit of this approach is that we don’t need
    more info than this. It’s two bits of information.

    Youenn: The privacy concern is real.

    RESOLUTION: Waiting for Media Working Group.
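
    A purely illustrative sketch of what the proposed event could
    look like; nothing here is specced, and the event name and
    fields are placeholders pending the Media WG work:

      // `receiver` stands in for whatever object would surface this.
      receiver.addEventListener('decodeerror', (e) => {  // placeholder name
        if (e.kind === 'resource') {
          // Resource problem: maybe re-acquire resources or prompt the user.
        } else {
          // Data error: the developer debugs the bitstream.
        }
      });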

   [23]Issue 170: Incompatible SVC Metadata (Bernard)

      [23] https://github.com/w3c/webrtc-encoded-transform/issues/170

    [24][Slide 15]

      [24] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=15

    [25][Slide 16]

      [25] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=16

    [26][Slide 17]

      [26] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=17

    [27][Slide 18]

      [27] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=18

    Bernard: What metadata do you need to handle things? Dropping
    spatial layers is easy, but adding them back is not, because of
    the way SVC works. Let me show a slide: a diagram of how a frame
    may not be decodable if a frame it depends on was not received
    for some reason. The same problem can happen if there is a
    dependency across multiple frames; the first frame could be
    lost. You need an unbroken chain of dependencies.

    Bernard: Another problem: a mobile device may receive all the
    frames but not be able to decode them because the frames are
    too big for it to handle.

    Bernard: The receiver needs to be able to quickly determine
    whether frames are decodable, and whether a received frame is
    necessary for a desired resolution.

    [28][Slide 19]

      [28] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=19

    [29][Slide 20]

      [29] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=20

    Bernard: We have a PR for EncodedVideoFrameMetadata for SVC
    (see slide and link to the detailed proposal).
    … Peter investigated the state of this in Chromium for
    WebCodecs.

    Peter: Not all codecs implement all of this. I don’t think
    there was anything in there for chains.

    Bernard: The question is why this stuff is in encoded-transform
    if it has not been implemented. Should we remove it?

    Florent: Temporal and spatial indices are implemented, but it
    might depend on the codec implementation; e.g. it could be in
    the VP8 frame descriptor but not in the AV1 bitstream. If you
    provide it you should be able to get most of this. That doesn’t
    mean there are no bugs. I am working on some tests for SVC in
    Chrome.

    Bernard: So we have this incompatibility between WebCodecs and
    WebRTC metadata. How do we make these compatible?

    SUMMARY: Needs further investigation?
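
    For context, a sketch of reading the SVC metadata discussed
    here inside an encoded transform (field names are from
    webrtc-encoded-transform; per the discussion, actual support
    varies by codec and chains are not exposed):

      // e.g. inside an RTCRtpScriptTransformer's worker.
      const transform = new TransformStream({
        transform(frame, controller) {
          const {frameId, dependencies, spatialIndex, temporalIndex} =
              frame.getMetadata();
          // A frame is decodable only if every frameId listed in
          // `dependencies` was itself received and decodable.
          controller.enqueue(frame);
        },
      });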

   [30]Issue 93: MediaStreamTrack audio delay/glitch capture stats
   (Henrik)

      [30] https://github.com/w3c/mediacapture-extensions/issues/93

    [31][Slide 21]

      [31] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=21

    Henrik: The audio capture metrics on the slide were recently
    added to getStats(), but issues were filed and the metrics have
    been marked as a feature at risk. The metrics are only
    applicable if the media source represents an audio capture
    device. They allow calculating important quality metrics not
    available elsewhere, such as audio glitches or average capture
    delay.

    [32][Slide 22]

      [32] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=22

    Henrik: The point of [33]w3c/webrtc-stats#741 is that
    getUserMedia is frequently used without WebRTC, so we should
    talk about audio frames rather than audio samples; the stats
    spec actually means audio frames, as clarified in its
    terminology section. And lastly, it was not clear why audio may
    be dropped, so we need to clarify that this happens when audio
    is not processed in a timely manner from the real-time audio
    device.

      [33] https://github.com/w3c/webrtc-stats/issues/741

    Henrik: So the main issue, other than clarifications, is that
    the metrics are in getStats() but may be useful outside of
    WebRTC. The proposal is to move them to MediaStreamTrack. This
    is similar to how we recently added video capture stats to
    track.getFrameStats(). The second point is name bikeshedding:
    should we reuse getFrameStats, rename it, or add a new method
    specific to audio capture stats?

    Youenn: Makes sense. No strong opinion on names.

    Jan-Ivar: Correct direction. I’m not sure why we didn’t just
    call this track.getStats().

    RESOLUTION: Moving them to MediaStreamTrack makes sense, Henrik
    to write a PR.
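
    A hedged sketch of the proposal; the method name is explicitly
    up for bikeshedding, so getAudioFrameStats() and the field
    names below are placeholders, not specced names:

      const stream = await navigator.mediaDevices.getUserMedia({audio: true});
      const [track] = stream.getAudioTracks();
      const stats = await track.getAudioFrameStats(); // placeholder name
      // Derive quality metrics such as average capture delay:
      const avgDelayMs = 1000 * stats.totalCaptureDelay / stats.deliveredFrames;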

   [34]PR 173 adding presentationTime to RTCEncodedVideoFrameMetadata
   (Tony)

      [34] https://github.com/w3c/webrtc-encoded-transform/pull/173

    [35][Slide 23]

      [35] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=23

    Tony: The PR suggests adding a presentationTime to
    RTCEncodedVideoFrameMetadata, since the two timestamps don’t
    match.

    Bernard: Any objections?

    Jan-Ivar: We’re overall supportive, but there is a comment
    about naming bikeshedding.

    Bernard: We’re heading down a road where we’re recreating
    WebCodecs.

    Jan-Ivar: No objection that we need it but name needs to be
    figured out.

    Tony: Let’s continue on the PR.
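
    A sketch of the PR’s intent (the field name is part of the
    naming discussion, so treat presentationTime as provisional):

      // `encodedFrame` is an RTCEncodedVideoFrame in an encoded transform.
      const metadata = encodedFrame.getMetadata();
      console.log('RTP timestamp:', encodedFrame.timestamp);
      console.log('presentation time:', metadata.presentationTime); // proposed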

   WebRTC Combines Media and Transport

    [36][Slide 26]

      [36] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=26

    [37][Slide 27]

      [37] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=27

    Bernard: WebRTC combines media and transport. Good and bad.
    Good if it does what you want, but bad if you’re trying to do
    different things like:
    … * Some things can’t be expressed in SDP.
    … * Difficult to support bleeding edge codecs
    … What if we could separate media and transport?

    [38][Slide 28]

      [38] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=28

    Peter: Tightly coupled, but what if we could cut in two? Then
    the app can sit between “media” and “transport” parts and
    tweak. See slide.
    … The left part looks a lot like WebCodecs. The right part:
    RtpTransport?
    … RTP sender API is “stream of packets in; stream of packets
    out on wire”. RTP receiver is “stream of packets received on
    the wire, stream of packets out”. RtpTransport.
    … Here is how we could allow constructing these independently
    of a PeerConnection: (slide).
    … This would allow the app to bring its own packetization,
    jitter buffer and rate adaptation.
    … This is similar to WebTransport with datagrams, but it can
    also be P2P with latency-sensitive congestion control.
    RtpTransport would solve the following use cases:
    … NV07, NV08, NV09, NV15, and (non-consensus) NV38, NV39, NV40.
    … See slides with WebIDL.
    … If we wanted to do jitter buffers there is more we could do,
    but focus right now is RtpTransport. Is this a good direction
    to go, giving the application more controls for transport?
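
    A purely illustrative sketch of the shape Peter describes (all
    names here are hypothetical stand-ins for the WebIDL on the
    slides): the sender side is “stream of packets in, packets out
    on the wire”, the receiver side is the reverse, and the app
    owns packetization and jitter buffering:

      const transport = new RtpTransport(iceTransport); // hypothetical
      const writer = transport.writablePackets.getWriter();
      await writer.write(appBuiltRtpPacket); // app does its own packetization
      const reader = transport.readablePackets.getReader();
      const {value: packet} = await reader.read(); // app runs its own jitter buffer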

    Jan-Ivar: The NV use cases kind of pre-date WebTransport. I
    think a lot of these use cases have already been met. Some of
    these are not actually use cases but requirements for use
    cases. I feel like I’ve already objected to this.

    Bernard: No, the NV use cases labeled “non-consensus” don’t
    have consensus, but the other ones do have WG consensus.

    Jan-Ivar: There are now two implementations of WebTransport.
    I’d like to push back. It seems a bit early to discuss WebIDL.

    Youenn: We see a lot of people trying to push a lot of stuff.
    There is a desire to have an ICE transport. I think it is the
    right approach. Are there sufficient use cases to warrant this?
    That is a good question, and we should spend some time there.
    But if we can gather enough use cases…

    Florent: Regarding overlap with WebTransport, I don’t believe
    it will work in most cases that involve peer to peer, since
    WebTransport only works between browser and server, not browser
    and browser.

    Randell: I think this is interesting. I have concerns similar
    but not identical to Jan-Ivar’s. It does seem easy for
    something to violate bandwidth constraints; perhaps there would
    be mechanisms in place to block that somehow. It strikes me as
    being very low level, and it invites applications to implement
    their own version of WebRTC. Doing it is one thing, but doing
    it well is not so easy. I am a little concerned about that, so
    I think it would be worth discussing a little more what use
    cases this would enable and what alternatives there might be to
    resolve those use cases. I have some reservations about the
    details here. There are two issues with WebTransport: one, as
    Florent mentioned, it is only client to server. That’s not to
    say it couldn’t be extended, but that would be additional work.
    The second issue is that the congestion control currently
    implemented for WebTransport is not real-time friendly, but
    that could be fixed with an “I want real-time congestion
    control” flag.

    Peter: I’m a fan of WebTransport and I would like to see it
    solve both P2P and congestion control. But it will never have
    the same interop with endpoints. So even if WebTransport
    becomes everything we want, I think there will still be a need
    for web apps to have RTP controls. The reason I mentioned
    jitter buffers and packetization is that these are the main
    things for somebody to implement; a high quality jitter buffer
    is not an easy task. I actually have a proposal for that. Part
    of the detailed design I have for this addresses how the
    browser could stop the application from screwing things up.
    Related to that, sequence number reuse: I have a solution for
    that too. But the overall question is whether we want to have
    these discussions.

    Randell: All things being equal, if I had a choice, I would
    prefer to solve this via WebTransport. Perhaps there are some
    cases where we can’t… but I’d rather solve it there if we can.
    I want clarification on whether this can be solved via
    WebTransport for the use cases we decide we care about.

    Bernard: The IETF will not use WebTransport for peer-to-peer.
    This working group decided not to work on QUIC. Anyway the
    protocol question is for IETF and it is not clear that they
    will solve this.

    Jan-Ivar: This would only provide a data channel, which we
    already have. So are there peer-to-peer use cases that are not
    already solved by existing technology? You can already use
    VideoTrackGenerator to send data this way. If there are not
    enough use cases, this sounds like a premature optimization.

    Peter: Regarding use cases - we’ve been talking about them
    forever. And we’ve never gotten around to making a solid
    proposal. It has been far too long; this is a shame. We’re just
    talking about talking about talking about use cases. What I’m
    proposing is a solution to all of these and more. We’re not
    getting anywhere talking about use cases.

    Youenn: There was talk about being able to replace WebRTC. It
    would be great to collect exactly what people are trying to
    solve. If we have enough RTP use cases we should solve them
    instead of trying to solve them via other APIs. That should not
    be the path forward.

    Bernard: This would need some time. We might want to consider
    this for TPAC or future meetings.

   WebRTC ICE improvements

    Sameer: Continuing our discussions on ICE improvements.
    … Last time we had 3 proposals. Feedback: can we split this
    into increments? This is what Peter and I have been trying to
    do: harmonize into a single proposal with a lot of common
    ground. It meets all the NV requirements.
    … Order of incremental improvements…
    … 1-3: candidate pair controls
    … 4-5: low latency
    … 6-9: ICE connectivity checks, controlling timing and
    observing checks
    … 10-11: candidate gathering
    … 12-13: prevent premature pruning or custom pruning
    … 14-15: ICE without peer connection and forking
    … Peter will talk about the API shapes.

    Peter: Cancellable events versus direct control: A and B on the
    slides. For today, just pick the option you like and judge the
    API as a whole based on it; we can decide between them later.
    … Lots of WebIDL. See slides!

    Bernard: If you’re passing an ICE gatherer, I guess you’re
    max-bundle?

    Peter: If you’re willing to do ICE forking, you’re probably
    willing to do max-bundle.

    Jan-Ivar: I do think we want to go in this direction. When it
    comes down to cancellable events, I think their semantics may
    make sense; there is a valuable pattern there. On the other
    hand, if the application wants to remove a pair not in response
    to something, there is value in that too. So hopefully we can
    have both.
    … It’s hard to imagine how the JavaScript would look when using
    all of these, but it does seem to me like a lot of defining
    custom events. I imagine we only need custom events in certain
    cases.

    Peter: It might be possible to instead have attributes or
    something.

    Jan-Ivar: But overall it’s a good direction.

    Youenn: It’s a long list of interfaces, so I cannot sign off on
    all of them. But it is good to have a path forward that seems
    pretty clear. I am hoping that the first API bits are the ones
    that developers will use first, because then we can start
    prototyping and shipping more. If the first three add value we
    have more motivation to implement more. So let’s dig into that
    and start being nit-picky about the design and so on.

    Peter: We made the list in the order that we think the
    developers are asking for the most. One option is to really
    nail down those three and then work from there.

    Sameer: Regarding how the JS looks: for the new proposal I
    don’t have a full example yet, but on my GitHub I do have an
    example of the old API being used. It is slightly different of
    course, but it should give some idea of how this might look in
    general.
    … Regarding cancellable events versus the other approach: we
    would still have explicit methods for specific actions. The
    difference is only in how you affect the default behavior: do
    you decide ahead of time, or do you decide when the event
    fires?

    Jan-Ivar: Reacting to events versus calling methods seem like
    different uses.

    Peter: The method to remove a candidate pair, for example,
    exists in both proposals. The difference is just in how to
    prevent the default action.
    … Examples. Get down and dirty.

    RESOLUTION: The working group supports the direction this is
    taking. Sameer/Peter to create PRs to solve things piece by
    piece, starting with the most important use cases (first few
    bullet points). As we progress we expect to see reason to
    continue down the list and spec + implement more.
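
    An illustrative contrast of the two API shapes discussed (all
    names hypothetical):

      // (A) Cancellable event: veto the default behavior as it happens.
      iceTransport.addEventListener('candidatepairprune', (e) => {
        e.preventDefault(); // keep the pair the browser would have pruned
      });
      // (B) Direct control: an explicit method the app calls at any time.
      iceTransport.removeCandidatePair(pair);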

   [39]playoutDelay (Jan-Ivar)

      [39] https://github.com/w3c/webrtc-extensions/issues/156

    [40][Slide 81]

      [40] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=81

    Jan-Ivar: Chrome has implemented this as “playoutDelayHint”.
    Multiple issues have been raised. Firefox is now trying to
    implement it and address the issues. Questions:
    … * Jitter buffer delay, OR jitter buffer plus playout delay?
    … * Milliseconds versus seconds?
    … * How to test?
    … * The positive goal should be jitter-free media.
    … Delay is a measure of a negative side effect; it is vague,
    which makes it hard to test and confusing for implementers.
    Chrome is inconsistent.

    [41][Slide 82]

      [41] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=82

    Jan-Ivar: Proposal: jitterBufferDelay = value in milliseconds.
    Let the application compare this to getStats.

    Henrik: I like everything you are presenting and think this is
    how we should have done it in the first place. But I just want
    to point out that I think the jitter buffer issue in Chrome is
    a bug in getStats() rather than a bug in playoutDelay. So if
    this is essentially already implemented under a different name,
    is it really worth the migration effort to change the name,
    versus changing the spec?

    Jan-Ivar: It also says to throw an exception which Chrome
    doesn’t do. If you set this to 20 seconds that would change
    what this API can be used for. WebRTC is used for real-time and
    20 seconds is not real time.

    Henrik: My understanding is that Chrome clamps. So if you set
    it to 20 it might not throw, but you still only get 4 seconds,
    or whatever the max value is. I think. I’m not saying it’s
    better, but overall the difference between the spec and the
    implementation seems rather nit-picky, and I’m wondering if it
    is worth having to migrate.

    Jan-Ivar: But hint sounds like this is optional. We want to
    have control surfaces that we can test.

    Henrik: I agree; it’s just a question of whether it’s worth the
    extra effort. I mean, I’m not going to object if we go down
    this route. But I have a feeling this will come back onto my
    table. Heh.

    Jan-Ivar: Anyone else? So is that it?

    RESOLUTION: Let’s go in Jan-Ivar’s direction. Jan-Ivar to make
    PRs.
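
    A sketch of the proposal (attribute name as written on the
    slide; the stats names are from the published webrtc-stats
    spec):

      // `receiver` is an RTCRtpReceiver for an audio track.
      receiver.jitterBufferDelay = 500; // milliseconds; out-of-range throws
      // Compare the target against observed delay from getStats():
      const report = await pc.getStats();
      for (const s of report.values()) {
        if (s.type === 'inbound-rtp' && s.kind === 'audio') {
          // Cumulative seconds / emitted count = average delay per sample.
          const avgMs = 1000 * s.jitterBufferDelay / s.jitterBufferEmittedCount;
          console.log('average jitter buffer delay (ms):', avgMs);
        }
      }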

   [42]Issue 39: Solve user agent camera/microphone double-mute

      [42] https://github.com/w3c/mediacapture-extensions/issues/39

    [43][Slide 85]

      [43] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=85

    Youenn: The user agent can pause the camera in Safari. This
    affects the muted state. But applications tend to implement
    their own mute function, so it would be good if there was a way
    for the website and UA to sync.

    [44][Slide 86]

      [44] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=86

    Youenn: The OS-side indicator is widely deployed in the UX. So
    when websites really want to mute, they tend to stop() the
    track rather than mute it.
    … With OS-level microphone mute, the application tends not to
    stop() the track, to keep speech detection (the “are you
    speaking?” hint) working.
    … Proposal: allow the application to request muting/unmuting of
    capture.

    [45][Slide 87]

      [45] 
https://lists.w3.org/Archives/Public/www-archive/2023Apr/att-0001/WEBRTCWG-2023-04-18.pdf#page=87

    Youenn: It could be placed on MediaStreamTrack,
    InputDeviceInfo, or Navigator. I would personally prefer
    MediaStreamTrack or InputDeviceInfo.
    … Thoughts?

    Jan-Ivar: I’m supportive, but there are some privacy concerns
    that I think need to be met, which means requiring user
    activation. My preference would be to put it on the track. My
    bikeshed preference would be to simply call it mute() instead
    of requestMute().

    Youenn: Maybe muting happened a day ago and the user may need
    to accept a prompt again; this is why it returns a promise.

    Jan-Ivar: What is the use case of mute?

    Youenn: People tend to clone tracks; what you actually want to
    do is mute the source, and it is error-prone to hunt down all
    the tracks.

    Jan-Ivar: If I’m transferring a track and there is a clone, I
    could also request to mute?

    Youenn: Muting one could mute all clones. If a website wants to
    mute, it would be good to do that roughly at the source level;
    hence the InputDeviceInfo proposal.

    Jan-Ivar: I want it on the track, and to focus on mute. I’m not
    as clear about unmute.

    RESOLUTION: No objections.
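
    A hedged sketch of the API under discussion (requestMute() on
    MediaStreamTrack is not specced; per the discussion it may
    require user activation, and muting acts at the source so
    clones mute too; `muteButton` and `track` are assumed):

      muteButton.onclick = async () => {
        try {
          await track.requestMute(); // or simply mute(), per the bikeshed
        } catch (e) {
          console.warn('user agent declined to mute:', e);
        }
      };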

Summary of resolutions

     1. [46]No objection, let’s include audio.
     2. [47]No objection, Fippo to make PRs: merge on main spec if
        implementation is provided or provisional spec if no
        implementation is provided.
     3. [48]Waiting for Media Working Group.
     4. [49]Moving them to MediaStreamTrack makes sense, Henrik to
        write a PR.
     5. [50]The working group supports the direction this is
        taking. Sameer/Peter to create PRs to solve things piece by
        piece, starting with the most important use cases (first
        few bullet points). As we progress we expect to see reason
        to continue down the list and spec + implement more.
     6. [51]Let’s go in Jan-Ivar’s direction. Jan-Ivar to make PRs.
     7. [52]No objections.

Received on Tuesday, 25 April 2023 15:13:45 UTC