[minutes] WebRTC WG August 2024 meeting from Dominique Hazael-Massieux on 2024-08-28 (public-webrtc@w3.org from August 2024)

From: Dominique Hazael-Massieux <dom@w3.org>
Date: Wed, 28 Aug 2024 11:15:41 +0200
To: public-webrtc@w3.org
Message-ID: <5a8544d3-134c-4e4e-bff1-692f7d043000@w3.org>
Hi,

The minutes of our August meeting held yesterday (Aug 27) are available at:
   https://www.w3.org/2024/08/27-webrtc-minutes.html

and copied as text below.

Dom


                      WebRTC August 27 2024 meeting

27 August 2024

    [2]Agenda. [3]IRC log.

       [2] https://www.w3.org/2011/04/webrtc/wiki/August_27_2024
       [3] https://www.w3.org/2024/08/27-webrtc-irc

Attendees

    Present
           Alfred_Heggestad, Bernard, Carine, Dom, Elad, Florent,
           Frederick_Google, Guido, Harald, Henrik, Jan-Ivar,
           JohannesKron, Lucia_Google, Markus_Handell,
           PatrickRockhill, PeterT, Sameer, SunShin, TimP, Tove,
           Varun_Singh, Youenn

    Regrets
           -

    Chair
           Bernard, HTA, Jan-Ivar

    Scribe
           dom

Contents

     1. [4]Captured Surface Control
     2. [5]Moving Forward with Mute
     3. [6]Speaker selection
          1. [7]Issue #142 / PR #143 Why prompt for a subset of
             stored speakers or speakers setSinkId already accepts?
          2. [8]Issue #133: The first "audiooutput" MediaDeviceInfo
             returned from enumerateDevices() is not the default
             device when the default device is not exposed
     4. [9]RTCRtpEncodingParameters: scaleResolutionTo
     5. [10]RTCRtpParameters.codec matching is probably too strict
     6. [11]Summary of resolutions

Meeting minutes

    Slideset: [12]https://lists.w3.org/Archives/Public/www-archive/
    2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf

      [12] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf

    Bernard: TPAC is ahead of us - please send requests for agenda
    time, taking advantage of the longer meetings we'll have there

   [13]Captured Surface Control

      [13] https://github.com/screen-share/captured-surface-control

    [14][Slide 11]

      [14] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=11

    [15][Slide 12]

      [15] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=12

    [16][Slide 13]

      [16] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=13

    [17][Slide 14]

      [17] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=14

    [18][Slide 15]

      [18] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=15

    [19][Slide 16]

      [19] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=16

    [20][Slide 17]

      [20] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=17

    Jan-Ivar: the capture wheel solution looks promising, I'm
    supportive; couldn't we use it for zoom as well, through the
    preview tile with some browser controls?
    … re zoom level, would there be an opportunity to give feedback
    on the API shape? e.g. use an attribute instead of a method
    … re transient activation, would it be consumed? would this be
    through a button?

    Elad: for instance, but it would vary across apps

    Jan-Ivar: why 0-100 integers rather than floating point?

    Elad: it matches what browsers show in their UI; also helps
    with other UI (e.g. dropdown, slider, radio buttons) which is
    also why we want to leave the UI to the app

    Jan-Ivar: I still would prefer to use the same solution for
    zoom; does the zoom affect only the capture or also the
    original doc?

    Elad: also the original document

    Youenn: I discussed this internally; being able to send
    commands to another app breaks a pretty high security boundary,
    which got pushback
    … +1 on consuming user activation
    … re scrolling - how should this work on touch devices (e.g.
    ipad)? limiting this to "wheels" isn't ideal

    Guido: scrolling might be a better name indeed
    … we could limit this to a browser surface for the time being
    and leave window to a later iteration

    youenn: in terms of UX, either you embed everything in the
    capturing app, or you leave the capturing app aside
    … in the latter case, managing scrolling is of less interest

    Elad: yes, but that pattern doesn't work across all apps/UXes
    … there is finite real estate on the screen to make use of

    youenn: this is an area of experimentation, e.g. macos provides
    new options in this space
    … but in general, having inconsistent behavior across
    browser/non-browser apps would be suboptimal
    … conversely, if the plan is to integrate both, we need to
    understand how that would work and if that could work

    Guido: how about starting with tab?

    Youenn: tab is interesting, but if we limit ourselves to tab,
    this isn't necessarily the best API

    Elad: but shipping tab would be a good way to validate the
    interest before we invest in the more complicated space for
    "window" (which requires different OS adaptation and different
    security barriers)

    Jan-Ivar: re transient activation, it doesn't resolve the
    remote attack - e.g. setting a very high zoom would confuse the
    user
    … hence why I would prefer the wheel approach
    … the PiP button in the media element in FF could serve as an
    example of a browser-provided UI

    Elad: so I hear support for send wheel from Jan-Ivar

    Youenn: on our end, feedback is negative at the moment - having
    something that keeps more control under the user agent would be
    preferable

    Elad: does that apply if we only do tabs?

    Youenn: not currently opposed to tabs, but it remains that the
    more control left in the UA, the better

   [21]Moving Forward with Mute

      [21] https://github.com/w3c/mediacapture-main/issues/982

    [22][Slide 20]

      [22] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=20

    [23][Slide 21]

      [23] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=21

    [24][Slide 22]

      [24] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=22

    [25][Slide 23]

      [25] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=23

    [26][Slide 24]

      [26] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=24

    [27][Slide 25]

      [27] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=25

    [28][Slide 26]

      [28] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=26

    Youenn: track.muted means no frame, not black frame - we should
    decide first what to do with black frames
    … this is JS-observable
    … in Safari, there will be no rvfc (requestVideoFrameCallback)
    callback from a muted track
    … we should have a consistent implementation

    guido: not opposed to that, but the spec currently supports
    including black frames in muted

    youenn: so let's try to converge on muted = no frame

    guido: the goal would be to transition existing apps to the new
    attribute, and then frame counter

    youenn: I'm not sure Safari would implement this, but this may
    not impact compat
    … re "isSendingFrames = false", it would be best to use
    "isNotSendingFrames" for compat with UA that wouldn't implement
    it
    … if the source is generating black frames, I'm happy for them
    to have a counter

    Bernard: I share some of Youenn's concerns
    … originally, we did say that black frames would be sent on
    muted, but I don't think we thought this through the whole
    system
    … inferring muted from seeing black frames feels like it may
    generate many interop issues across many APIs
    … Why did we decide to send black frames (vs not sending)?

    HTA: sending a single black frame to replace the content of a
    muted stream would be sufficient, but the spec allows
    continuing to send black frames

    Jan-Ivar: I appreciate the migration path you've identified; +1
    to using the negative form, and maybe not "sending", but e.g.
    "producing"

    Guido: happy to bikeshed if there is interest in the direction

    Jan-Ivar: adding 3 stats feels a bit excessive; maybe we can
    count which of the frames are black

    Youenn: Safari only sends black frames on a peerconnection
    (maybe mediarecorder)
    … it's on a per-consumer basis

    [29]JSFiddle exploring what happens on mute

      [29] https://jsfiddle.net/jib1/cfcoqdwz/12/

    Guido: the goal is to simplify the spec by removing the
    flexibility the spec currently allows
    … so that mute becomes more useful with better interop

    TimP: if you use stats, everyone is already using polling

    Guido: the goal is to have a smooth migration path, with
    clarity that it will be deprecated later

    Henrik: I think the boolean is needed for the migration path;
    isMuted stops the counter increment in Chrome IIRC

    Guido: I'll start a PR to iterate on this

    Youenn: I'll file an issue to get us to converge on muted=no
    frame
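
    A minimal sketch of the alternative to inferring mute from black
    frames: reacting to the track's own muted flag and its mute and
    unmute events. MutableTrackLike and watchMuted are hypothetical
    names introduced for illustration; the interface mirrors only the
    muted attribute and events that MediaStreamTrack exposes, so the
    same function could be handed a real track in a browser.

```typescript
// Minimal shape shared with MediaStreamTrack: a `muted` flag plus events.
// (Hypothetical interface, introduced only for this sketch.)
interface MutableTrackLike extends EventTarget {
  readonly muted: boolean;
}

// Reports the current muted state once, then again on every mute/unmute
// event. Returns a function that removes the listeners.
function watchMuted(
  track: MutableTrackLike,
  onChange: (muted: boolean) => void
): () => void {
  const handler = (): void => onChange(track.muted);
  track.addEventListener("mute", handler);
  track.addEventListener("unmute", handler);
  onChange(track.muted);
  return () => {
    track.removeEventListener("mute", handler);
    track.removeEventListener("unmute", handler);
  };
}
```

    This avoids polling, but it is only as useful as the consistency
    of muted across user agents, which is the point under discussion.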

   [30]Speaker selection

      [30] https://github.com/w3c/mediacapture-output/

     Issue [31]#142 / PR [32]#143 Why prompt for a subset of stored
     speakers or speakers setSinkId already accepts?

      [31] https://github.com/w3c/mediacapture-output/issues/142
      [32] https://github.com/w3c/mediacapture-output/issues/143

    [33][Slide 30]

      [33] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=30

    youenn: this seems fine to me; small caveat: all output devices
    exposed in enumerateDevices vs only output speaker associated
    with a microphone in getUserMedia
    … PR [34]#143 is fuzzy about that - not sure if you mean the
    restricted or broader scope for getUserMedia
    … maybe add a note to be explicit that this is only for the
    speaker tied to a microphone exposed via gUM

      [34] https://github.com/w3c/mediacapture-output/issues/143

     Issue [35]#133: The first "audiooutput" MediaDeviceInfo returned
     from enumerateDevices() is not the default device when the default
     device is not exposed

      [35] https://github.com/w3c/mediacapture-output/issues/133

    [36][Slide 31]

      [36] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=31

    [37][Slide 32]

      [37] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=32

    Youenn: if we already have an audio output entry, it means
    we're already out of passive fingerprinting - we could expose
    the "real" deviceId of the default?

    Jan-Ivar: setSinkId("") has different semantics from
    setSinkId("the-actual-deviceid-of-the-default")

    Youenn: indeed, the latter wouldn't change if the default
    changes
    … OK, I'm fine with either proposal, with a bit of a
    preference for the non-empty string solution

    Guido: UA & System defaults aren't the same
    … system default maps to what the underlying platform calls
    system default
    … default is different semantically from the specific deviceid
    currently the default
    … the UA might have a different default than the OS, that would
    track a different device than the system
    … I think we need to be more specific about what we mean by
    system-default device (the one we use "default" for in
    Chromium)
    … I'm partial to proposal B to avoid overloading the meaning of
    empty string

    Jan-Ivar: the spec only talks about system-default, not about
    UA-default; I'm not aware of any UA with a default speaker

    Youenn: I agree with Guido there is a difference

    Harald: "default" is a tricky concept; windows had two default
    devices (one for telephony, the other for general audio)
    … referring to a UA default might make more sense since
    system-default isn't a well-defined term

    Jan-Ivar: the empty string is already identified as dynamically
    following the system-default
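
    The difference between the two sink ids can be sketched as a
    small model, assuming a single system default that may change
    over time. OutputDevice and resolveSink are hypothetical names,
    and the model deliberately ignores the UA-default question raised
    above.

```typescript
// Hypothetical, simplified stand-in for an "audiooutput" device entry.
interface OutputDevice {
  deviceId: string;
  label: string;
}

// Resolves a sink id to a device. The empty string follows whatever the
// system default currently is; any other id stays pinned to that device.
function resolveSink(
  sinkId: string,
  devices: readonly OutputDevice[],
  systemDefaultId: string
): OutputDevice | undefined {
  const wanted = sinkId === "" ? systemDefaultId : sinkId;
  return devices.find((d) => d.deviceId === wanted);
}
```

    With devices "a" and "b" and the system default moving from "a"
    to "b", resolveSink("", ...) moves with it while
    resolveSink("a", ...) keeps returning "a".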

   [38]RTCRtpEncodingParameters: scaleResolutionTo

      [38] https://github.com/w3c/webrtc-extensions/issues/159

    [39][Slide 35]

      [39] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=35

    [40][Slide 36]

      [40] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=36

    Jan-Ivar: this SGTM; I would use our own dictionary, and find a
    better name than rect

    Henrik: e.g. resolution

    Jan-Ivar: re aspect ratio, what you propose seems to match what
    we do for constraints, I like that
    … my only question is if the UA could do it on its own without
    new API

    Henrik: I don't think it's possible, it's inherently racy and
    buffers make it even more uncertain

    Youenn: this is maxWidth and maxHeight really?

    Henrik: yes, we can call it that

    Jan-Ivar: what happens if the aspect ratio set by width &
    height is different from the source?

    Henrik: it will make it fit in the specified width & height

    Florent: what happens if either width or height isn't
    specified?

    Henrik: I think we should require them both

    Florent: that might help deal with aspect ratio issues

    Henrik: but that breaks the orientation agnostic approach

    Florent: if you only care about maxHeight (as typical e.g. for
    a presentation)...

    Elad: windows or tabs can be resized, so we should probably
    expect that API to be called more than once

    Henrik: the point of the API is to avoid reconfiguration as
    much as possible, not in all cases

    Florent: scaleResolutionDownBy would be a better fit for that
    situation

    Henrik: this is mostly about optimizing processing when
    dropping layers in simulcast

    jan-ivar: what happens when setting both?

    Henrik: we throw an exception

    RESOLUTION: proceed with a PR for [41]#159 with revised names

      [41] https://github.com/w3c/webrtc-extensions/issues/159
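
    A minimal sketch of the behavior discussed above, not the text of
    the eventual PR. It assumes both maxWidth and maxHeight are
    required, that the bounds are swapped when the source orientation
    differs (one reading of "orientation agnostic"), that frames are
    never upscaled, and that a generic Error stands in for whatever
    exception the spec ends up using. Size, MaxResolution,
    EncodingSketch, validateEncoding and fitWithin are hypothetical
    names.

```typescript
interface Size {
  width: number;
  height: number;
}

// Hypothetical shape for the renamed "rect" dictionary.
interface MaxResolution {
  maxWidth: number;
  maxHeight: number;
}

// Hypothetical subset of encoding parameters relevant to this sketch.
interface EncodingSketch {
  scaleResolutionDownBy?: number;
  scaleResolutionTo?: MaxResolution;
}

// Setting both scaling controls at once is rejected.
function validateEncoding(e: EncodingSketch): void {
  if (e.scaleResolutionDownBy !== undefined && e.scaleResolutionTo !== undefined) {
    throw new Error("scaleResolutionDownBy and scaleResolutionTo are mutually exclusive");
  }
}

// Scales the source down to fit inside the bounds, keeping its aspect ratio.
function fitWithin(source: Size, max: MaxResolution): Size {
  // Assumption: swap the bounds when source and bounds differ in orientation.
  const sourcePortrait = source.height > source.width;
  const maxPortrait = max.maxHeight > max.maxWidth;
  const swap = sourcePortrait !== maxPortrait;
  const boundW = swap ? max.maxHeight : max.maxWidth;
  const boundH = swap ? max.maxWidth : max.maxHeight;
  // Assumption: never upscale, so the factor is capped at 1.
  const scale = Math.min(1, boundW / source.width, boundH / source.height);
  return {
    width: Math.round(source.width * scale),
    height: Math.round(source.height * scale),
  };
}
```

    Under these assumptions a 1024x768 source with bounds of 640x360
    comes out as 480x360: the tighter of the two bounds decides the
    scale factor, so the aspect ratio of the source is preserved.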

   [42]RTCRtpParameters.codec matching is probably too strict

      [42] https://github.com/w3c/webrtc-pc/issues/2987

    [43][Slide 39]

      [43] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=39

    Florent: there is a provision in the spec about unsetting a
    codec (pointed to the relevant step in the github issue)
    … hidden in the long "apply a description" algorithm
    … using the "codec dictionary match" algorithm (which may need
    to be improved)
    … maybe we need to focus it on what the other side wants to
    receive, which, as we've become aware, has a lot of subtleties

    [44][Slide 40]

      [44] 
https://lists.w3.org/Archives/Public/www-archive/2024Aug/att-0003/WEBRTCWG-2024-08-27__1_.pdf#page=40

    Harald: the two codecs in the slide can't match, since one of
    them says it can only deal with 30 fps
    … codec matching is defined by SDP O/A, on a per-codec basis

    Jan-Ivar: but there are other examples of fmtp that would be
    compatible, right?

    Harald: yes, e.g. most h264 profiles would accept baseline
    … but main and high are different supersets of baseline, so
    shouldn't match
    … illustrating again this is codec dependent

    Bernard: a non-match should only occur in situations where you
    need symmetry (which most codecs don't require)

    HTA: that's about negotiation - what we're discussing is what
    we want to send

    Bernard: I thought the original issue was about negotiation; in
    this particular example, this is about receiver
    capabilities, which aren't incompatible as a result, since no
    symmetry is required

    HTA: we need a matching algorithm for negotiation, and a
    different one for setParameters

    Jan-Ivar: codec-dict-match shouldn't be confused with the
    negotiation algorithm
    … we should specify a selection algorithm
    … the spec allows clearing the codec parameter after
    negotiation - the UA might still use it as a hint (in which
    case we should specify it for interop)

    Florent: the order in the SDP expresses a preference, but not a
    requirement

    Bernard: +1 - the sender can change to a different negotiated
    codec at any time (e.g. in case of a hardware codec failure)

    HTA: we could argue that if the codec parameter specifies a
    codec description within the parameters of the negotiated
    codecs, then it should use that one
    … if it's a superset, it needs clearing
    … I don't want our spec to be dealing with codec match across
    all codecs, but we could have a note of the acceptability
    … this is usually covered in offer/answer considerations in the
    relevant RFCs

    Florent: if the developer really wants a codec, they can call
    setParameters again
    … we will probably learn of additional needs from developers as
    they adopt it
    … Clearing the codec parameter already signals that it has been
    ignored, and stats expose what codec is in use

    HTA: let's continue these clarifications in the issue
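
    To illustrate why strict matching can be too strict, here is a
    sketch of a field-by-field comparison in the spirit of the "codec
    dictionary match" mentioned above; it is not the spec's algorithm
    text. CodecSketch, strictCodecMatch and selectCodec are
    hypothetical names, and the fmtp value in the note below is made
    up for illustration. A request that matches no negotiated codec
    resolves to undefined, modelling a cleared codec parameter.

```typescript
// Hypothetical subset of the codec dictionary fields used for matching.
interface CodecSketch {
  mimeType: string;
  clockRate: number;
  channels?: number;
  sdpFmtpLine?: string;
}

// Strict comparison: every field must be identical (mimeType ignoring case).
function strictCodecMatch(a: CodecSketch, b: CodecSketch): boolean {
  return (
    a.mimeType.toLowerCase() === b.mimeType.toLowerCase() &&
    a.clockRate === b.clockRate &&
    a.channels === b.channels &&
    a.sdpFmtpLine === b.sdpFmtpLine
  );
}

// Returns the negotiated codec matching the request, or undefined when
// nothing matches (the requested codec is then treated as cleared).
function selectCodec(
  requested: CodecSketch | undefined,
  negotiated: readonly CodecSketch[]
): CodecSketch | undefined {
  if (requested === undefined) return undefined;
  return negotiated.find((c) => strictCodecMatch(requested, c));
}
```

    With this comparison, a request for video/VP8 carrying an fmtp
    line such as "max-fr=30" does not match a negotiated video/VP8
    entry without one, even though a codec-aware rule might consider
    the two compatible for sending.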

Summary of resolutions

     1. [45]proceed with a PR for #159 with revised names


     Minutes manually created (not a transcript), formatted by
     [46]scribe.perl version 229 (Thu Jul 25 08:38:54 2024 UTC).

      [46] https://w3c.github.io/scribe2/scribedoc.html
Received on Wednesday, 28 August 2024 09:15:43 UTC