[minutes] March 15 meeting

From: Dominique Hazael-Massieux <dom@w3.org>
Date: Fri, 18 Mar 2022 07:38:51 +0100
Message-ID: <d957f6b6-0f31-6cd6-0e2f-5f3a43d060f7@w3.org>
To: "public-webrtc@w3.org" <public-webrtc@w3.org>

The minutes of our meeting last Tuesday (March 15) are available at:
including the YouTube recording at https://youtu.be/GM56xH-jF8Q

They're also copied as text below.


                        WebRTC WG March 2022 call

15 March 2022

    [2]Agenda. [3]IRC log.

       [2] https://www.w3.org/2011/04/webrtc/wiki/March_15_2022
       [3] https://www.w3.org/2022/03/15-webrtc-irc


           BenWagner, Bernard, Dom, Eero, Elad, Guido, Harald,
           Jan-Ivar, JohannesKron, Riju, Tuukka, Varun, Youenn


           Bernard, Harald, Jan-Ivar



     1. [4]TPAC 2022
     2. [5]WebRTC-SVC
     3. [6]WebRTC-Extensions
     4. [7]Avoiding the “Hall of Mirrors”
     5. [8]Display Surface Hints
     6. [9]getViewportMedia update
     7. [10]MediaCapture Extensions proposals
     8. [11]Summary of resolutions

Meeting minutes

    Recording: [12]https://youtu.be/GM56xH-jF8Q

      [12] https://youtu.be/GM56xH-jF8Q



    Slideset: [14]https://lists.w3.org/Archives/Public/www-archive/


    [15][Slide 1]


    [16][Slide 3]


   TPAC 2022 [17]🎞︎

      [17] https://www.youtube.com/watch?v=GM56xH-jF8Q#t=164

    [18][Slide 8]


    Dom: TPAC being considered as a hybrid event this year - please
    indicate whether you think you might join physically such an

    [from online poll: 3 Yes, 4 No, 4 don't know]

   [19]WebRTC-SVC [20]🎞︎

      [19] https://github.com/w3c/webrtc-svc/
      [20] https://www.youtube.com/watch?v=GM56xH-jF8Q#t=362

    [21][Slide 11]


    Bernard: [22]issue #68 relates to behavior of getParameters() -
    unclear about re-negotiation (vs before/after negotiation)
    … [23]PR #69 has proposed text that clarifies that we're
    talking about **initial** negotiation (before/after)
    … if you re-negotiate, you'll still get the currently
    configured scalability mode

      [22] https://github.com/w3c/webrtc-svc/issues/68
      [23] https://github.com/w3c/webrtc-svc/pull/69

    Harald: wfm

    Jan-Ivar: is this correct? getParameters() algos are very
    explicit about what you get based e.g. on localDescription
    … some come from pending, others from current

    Bernard: let's say you change preference order for codecs, and
    you renegotiate (e.g. from VP8 with L1T2 to H264 that doesn't
    support scalability) - what happens then?
    … at what point do things change?

    JIB: even without setCodecPreferences, getParameters() may
    return different values depending on whether re-negotiation is
    happening or not
    … e.g. if you have a local offer, it might affect the results

    Bernard: looking at the VP8→H264 case, what should happen?

    HTA: as long as you're sending VP8, you should get L1T2 back
    … when you switch to H264, you get L1T1 back

    Bernard: that's what I would expect and what the text tries to
    … nothing changes until the new codec starts being used
    … JIB, could you write up your concern in [24]#68 ?

      [24] https://github.com/w3c/webrtc-svc/issues/68

    RESOLUTION: Continue discussion in [25]issue #68

      [25] https://github.com/w3c/webrtc-svc/issues/68
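
    For illustration only: one reading of what HTA and Bernard
    describe above is that the reported scalability mode follows the
    codec actually being sent. The sketch below models that reading.
    SenderState, reportedScalabilityMode and the per-codec support
    table are hypothetical names and data, not the getParameters()
    algorithm, and issue #68 remained open after the call.

```typescript
// Hypothetical model of the discussion, not the spec algorithm.
type ScalabilityMode = string; // e.g. "L1T1", "L1T2"

interface SenderState {
  codecInUse: string; // codec currently being sent
  configuredMode: ScalabilityMode; // mode the application asked for
  supportedModes: { [codec: string]: ScalabilityMode[] }; // assumed data
}

// Report the configured mode while the codec in use supports it;
// otherwise report "L1T1" (no scalability).
function reportedScalabilityMode(state: SenderState): ScalabilityMode {
  const supported = state.supportedModes[state.codecInUse] || [];
  return supported.indexOf(state.configuredMode) !== -1
    ? state.configuredMode
    : "L1T1";
}

// Assumed support table matching the VP8 -> H264 example from the call.
const support = { VP8: ["L1T1", "L1T2"], H264: ["L1T1"] };

const whileSendingVp8 = reportedScalabilityMode({
  codecInUse: "VP8",
  configuredMode: "L1T2",
  supportedModes: support,
});
const afterSwitchToH264 = reportedScalabilityMode({
  codecInUse: "H264",
  configuredMode: "L1T2",
  supportedModes: support,
});
```

    In this model nothing changes at renegotiation time itself; the
    reported value only changes once codecInUse changes.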

   [26]WebRTC-Extensions [27]🎞︎

      [26] https://github.com/w3c/webrtc-extensions/
      [27] https://www.youtube.com/watch?v=GM56xH-jF8Q#t=791

    [28][Slide 16]


    Bernard: Fippo gathered a list of hardware acceleration bugs
    that have been encountered
    … which raises the question of allowing to disable hardware
    … WebCodecs provides an enum to hint whether or not to use
    hardware acceleration

    [29][Slide 17]


    Bernard: I looked into 2 approaches: setParameters,
    … the first one doesn't really work since the envelope of
    changes may not include hardware alternatives
    … it also only makes sense if mid-stream switch is necessary
    … the second approach goes through re-negotiation via
    … How would you discover this?
    … Media capabilities may need amendment [30]https://github.com/

      [30] https://github.com/w3c/media-capabilities/issues/185

    Dom: should this be managed by the browser rather than left for
    developers to detect and manage?

    Bernard: this would be useful *when* developers detect a
    problem so that they don't need to wait for browsers to react
    to it

    Florent: there are also cases where a decoder interacts badly
    with a specific encoder

    JIB: for setParameters, there are read-only properties
    … putting it in codeccapability (which is returned to
    developers) means doubling the number of entries

    Bernard: you may not have to return it from Capability

    JIB: but then it doesn't fit very well with a notion of codec
    … we've also moved fingerprinting surface to media capabilities
    … I wouldn't want to reintroduce concerns without good reasons
    … it doesn't seem necessary to include that info if it is
    tackled as a preference

    Johannes: I understand this as developers wanting to disable
    hardware encoding as a short-term patch until the browser gets
    it fixed
    … it sounds like a recovery mode, more than a capability
    … also agree it's hard for developers to use it, but that it
    would have its uses

    Harald: routing around bugs is for specific implementations of
    the codec, which requires they know the specific implementation
    … does that point toward media capability as the right way to

    Bernard: that's where you'd find out if it's "smooth", "power
    efficient", "supported"

    Harald: if it's X's hardware encoder with software version Y,
    that may be the information you need to know whether or not to
    use it
    … not sure that fits with the Media Capabilities model

    Johannes: it would seem challenging
    … Also, the bugs that have been identified seem to be
    … there are block-lists for this or that hardware; it may be
    worth investigating the possibility of moving towards dynamic
    blocklists from browsers

    Riju: we share the GPU blocklist defined in Chrome with our
    driver team to get them fixed platform by platform

    Harald: no clear resolution, but some suggested paths worth

    [31][Slide 18]


    Harald: [32]issue #99 about RTP header extension
    … if an implementation supports an extension, it doesn't show
    up in Capabilities at the moment
    … is this problematic? if not, no change needed; if it is, we
    may need to surface that it exists but is disabled by default
    … you can get the information by inspecting the offer, so this
    may not be needed

      [32] https://github.com/w3c/webrtc-extensions/issues/99

    Bernard: it's a convenience in the use case; there will be
    scenarios where you don't want to set it on by default

    Dom: is anyone asking for it?

    JIB: if this is for debugging, looking at the SDP is fine; if
    it's to control running code, it should be an API

    Harald: the most likely example would be if transport-cc is not
    supported, I fall back to another congestion control
    … I think it can be shimmed by creating an offer and dancing
    with a throw-away peer connection

    Dom: not hearing a lot of pushback, nor a lot of demand either;
    maybe wait until we have more demand if it can be designed in a
    way that is backwards compatible

    Harald: yes, it can be done later in a backwards compatible

    RESOLUTION: close [33]#99 with no change

      [33] https://github.com/w3c/webrtc-extensions/issues/99
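
    For illustration only: Harald's point that the information can be
    obtained by inspecting the offer could look like the sketch
    below, which collects the header extension URIs from the
    a=extmap lines of an SDP string. headerExtensionUris and the
    sample SDP are hypothetical; in a browser the string would come
    from an offer created on a throw-away peer connection.

```typescript
// Collect the distinct RTP header extension URIs listed in an SDP blob.
function headerExtensionUris(sdp: string): string[] {
  const uris: string[] = [];
  sdp.split(/\r?\n/).forEach((line) => {
    // a=extmap:<id>[/<direction>] <uri> [<attributes>]
    const match = /^a=extmap:\d+(?:\/\w+)? (\S+)/.exec(line);
    if (match && uris.indexOf(match[1]) === -1) {
      uris.push(match[1]);
    }
  });
  return uris;
}

// Sample data for illustration; not taken from a real offer.
const sampleOffer = [
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid",
  "m=video 9 UDP/TLS/RTP/SAVPF 96",
  "a=extmap:1 urn:ietf:params:rtp-hdrext:sdes:mid",
  "a=extmap:2/sendonly urn:ietf:params:rtp-hdrext:toffset",
].join("\r\n");

const offered = headerExtensionUris(sampleOffer);
const offersToffset =
  offered.indexOf("urn:ietf:params:rtp-hdrext:toffset") !== -1;
```

    An application could use a check like offersToffset to decide
    whether to fall back to another mechanism, as in Harald's
    transport-cc example.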

   [34]Avoiding the “Hall of Mirrors” [35]🎞︎

      [34] https://github.com/w3c/mediacapture-screen-share/issues/209
      [35] https://www.youtube.com/watch?v=GM56xH-jF8Q#t=1970

    [36][Slide 21]


    [37][Slide 22]


    [38][Slide 23]


    [39][Slide 24]


    Elad: the proposal would be to add a new member to the
    DisplayMediaStreamConstraints à la includeCurrentTab to hint to
    the UA whether or not to include the current tab

    [40][Slide 25]


    Elad: influencing the user decision in picking display surfaces
    has security implications
    … but I argue that in this case, it is not problematic: the
    risks of selection are of two kinds:
    … - the attacker influences the user to share a surface under
    the attacker's control
    … - the attacker influences the user to share a tab with
    sensitive content (e.g. their bank account)
    … but excluding-self is orthogonal to these

    [41][Slide 26]


    Elad: if we agree this is worth solving, the question becomes
    what the default value should be
    … if we make it optional, this could be left as a UA dependent

    [42][Slide 27]


    Elad: a potential expansion would cover additional surfaces
    (e.g. screen)

    JIB: [43]#209 has the detailed discussion - what is the
    proposal we're reviewing?

      [43] https://github.com/w3c/mediacapture-screen-share/issues/209

    Elad: I suggest adding a dictionary member (either include or
    exclude) that serves as a hint, with no change to current

    JIB: I like this API, but would want the default to be "false"
    … I don't think this is so much about hall of mirrors - a
    symptom that the UA could address either way
    … the real issue is that in many cases, self-capture is NOT the
    … long term, self-capture would be getViewportMedia
    … some sites want self-capture to be part of the selection
    - they would need to opt in
    … also, TAG guidance is that undefined maps to false

    Elad: re default true - agree
    … re alternative approaches Youenn suggested, I don't think it
    works for current tab (it would work for current screen)
    … I agree with your characterization that the root cause is if
    you're not ready to self capture
    … I suggest we don't take getViewportMedia into account since
    there is little visibility in terms of its adoption
    … I think we should avoid breaking apps, even if briefly

    JIB: I think we should keep that separate from what
    implementations do
    … here the question is what's the most frequent case, most
    sites wouldn't want it

    Elad: lots of self-capture happening every year; assume a lot of
    it not accidental

    Youenn: re security, the current spec doesn't deal much with
    tab capture in that regard
    … we're bringing more and more control to what UAs will show,
    and that means we need to strengthen the guidance to UAs
    … Chrome has some mitigations in this space that might serve as
    a starting point
    … If this is a hint, this is fine
    … Some implementations might remove entirely the possibility to
    select the tab, that's something new
    … hints allow pushing users towards the more meaningful choice,
    but leave the user in charge of the final choice
    … re hall of mirrors - I don't think this is solving it
    … some native apps have implemented current-app blurring to
    solve the issue
    … cropping would be another way to solve the issue
    … if it's only a hint, it's fine; but if it brings a required
    behavior, I don't think we should go there
    … also want more security guidance
    … and keep issue open on addressing other aspects of hall of

    Elad: could you help with the security guidance?

    Youenn: Ideally would like to get the work that Chrome has done

    Dom: +1 on a hint; if boolean is problematic, we can use an
    enum to avoid the default value fallback

    Elad: happy to help with getting the security considerations
    with guidance from Youenn on what he wants to see

    Harald: hearing overall support to continue in that direction,
    towards a hint
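
    For illustration only: Dom's suggestion of an enum instead of a
    boolean can be sketched as below. With an enum, an omitted hint
    stays distinguishable from an explicit choice, so it can defer to
    the UA's own default instead of mapping to false. SelfCaptureHint
    and offersCurrentTab are hypothetical names; the call did not
    settle on a member name or shape, and a UA remains free to ignore
    the hint.

```typescript
// Hypothetical hint values; not an agreed API.
type SelfCaptureHint = "include" | "exclude";

// Model of a UA that chooses to follow the hint: decide whether the
// picker offers the current tab. An omitted hint defers to uaDefault.
function offersCurrentTab(
  hint: SelfCaptureHint | undefined,
  uaDefault: boolean
): boolean {
  if (hint === "include") return true;
  if (hint === "exclude") return false;
  return uaDefault;
}

const omittedHint = offersCurrentTab(undefined, true);
const excluded = offersCurrentTab("exclude", true);
const included = offersCurrentTab("include", false);
```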

   [44]Display Surface Hints [45]🎞︎

      [44] https://github.com/w3c/mediacapture-screen-share/issues/184
      [45] https://www.youtube.com/watch?v=GM56xH-jF8Q#t=3236

    [46][Slide 30]


    Elad: similar to previous issue, but distinct
    … some apps want to hint to the UA that they are geared
    toward a particular display surface type
    … I think there is agreement that this is worth supporting
    … but we've struggled to find an approach that everyone likes
    … I'm suggesting a compromise based on the discussion which
    would be:
    … - use constraints as a mechanism
    … - make it a hint with UA dependent behavior

    Youenn: hint is fine; it could be a constraint as a model, but
    with an improved simpler WebIDL surface

    Elad: reject on "exact"?

    Youenn: "exact" would be ignored

    Harald: -1 in integrating this in the proposal - I hate

    JIB: +1 to Harald; "exact" is already a type error in
    getDisplayMedia which already narrows down the constraint
    … agree with reusing displaySurface
    … I have concerns with an app asking for a monitor - I don't
    think we should provide this level of control
    … I proposed text to steer away users from monitor capture

    Elad: this is a hint - UAs can decide not to follow it

    Dom: with a hint, UAs can provide the best experience they can
    … not sure the SHOULD would achieve much if the main target
    isn't interested in SHOULD

    Youenn: the SHOULD would be useful for new implementors

    Elad: there is merit to that
    … non-normative language pointing to the risk would be good

    JIB: the SHOULD already allows for this; given Chrome has a
    good motivation, this feels like an exact reason why SHOULD
    would be used

    RESOLUTION: modulo discussion on SHOULD guidance, we adopt the
    displaySurface constraint proposal to manage Surface Hints
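
    For illustration only: under this resolution an application
    would express its preferred surface type as a bare displaySurface
    value, not as "exact" (which JIB notes is already a type error in
    getDisplayMedia). The sketch below only builds the options
    object; displaySurfaceHint and SurfaceHintOptions are
    hypothetical names, and the UA decides whether to follow the
    hint.

```typescript
// Display surface types as named in the screen capture spec.
type DisplaySurface = "monitor" | "window" | "browser";

// Hypothetical shape of the options an app would pass along.
interface SurfaceHintOptions {
  video: { displaySurface: DisplaySurface };
  audio: boolean;
}

// Build options carrying a bare (non-"exact") displaySurface value.
function displaySurfaceHint(surface: DisplaySurface): SurfaceHintOptions {
  return { video: { displaySurface: surface }, audio: false };
}

const windowHint = displaySurfaceHint("window");
```

    In a browser, windowHint would be passed to
    navigator.mediaDevices.getDisplayMedia(); the user stays in
    charge of the final choice.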

   [47]getViewportMedia update [48]🎞︎

      [47] https://github.com/w3c/mediacapture-viewport
      [48] https://www.youtube.com/watch?v=GM56xH-jF8Q#t=3906

    [49][Slide 31]


    JIB: FYI, there is a PR up to describe getViewportMedia which
    we hope to bring to a call for adoption soon

    [50]Viewport Capture Unofficial Draft

      [50] https://w3c.github.io/mediacapture-viewport/

    Youenn: we probably need a different set of constraints than
    the ones for getDisplayMedia
    … re audio, we need to think about whether to include system
    level audio or just current tab

    JIB: currently restricted to current tab

    Harald: if it can't be isolated, no audio should be captured

    JIB: there are pending PRs that I hope will be merged before we
    start the call for adoption

    Elad: the general intent of this work is awesome; looking
    forward to seeing it implemented
    … that said, until we see it adopted, we need to be careful in
    basing our decisions on this work, or consider relaxing some of
    the restrictions

    Youenn: has there been any outreach to web developers re
    x-origin isolation?

    Elad: the feedback I got from developers was this was a blocker
    for them

    Bernard: ditto

    JIB: I agree this is taking the long view here
    … hence the flexibility we're showing on getDisplayMedia
    … re using different constraints, we can change it when it
    shows as needed

    Youenn: displaySurface would be one case where this is needed

   [51]MediaCapture Extensions proposals [52]🎞︎

      [51] https://github.com/w3c/mediacapture-extensions/
      [52] https://www.youtube.com/watch?v=GM56xH-jF8Q#t=4238

    [53][Slide 34]


    Riju: this is follow up from a conversation that started at

    [54][Slide 35]


    Riju: [55]PR #48 is allowing in-browser face detection
    … when we showed this last time, the feedback included:
    … - tie it to VideoFrame rather than MediaStreamTrack, which
    the PR reflects
    … - future-proofing the bounding box approach - this is
    addressed with the Contour described in the PR, with a way for
    the developer to request something other than the default 4
    … - another request was to have a face mesh - which is now
    exposed as an additional property (although there is no native
    support for it today)
    … - face expression was raised as a concern, so we removed it
    … - making face detection work with transform stream

      [55] https://github.com/w3c/mediacapture-extensions/pull/48

    [56][Slide 36]


    Riju: we've put up an example to show how they would work
    … we've done early testing that shows improved power
    consumption - more specific numbers to be shared soon

    Youenn: good to expose it on VideoFrame; but would also be good
    to expose in requestVideoFrame callback e.g. for use with
    … re using "exact" constraints - I would expect "exact" not to
    be allowed in this
    … There seems to be switches to give hints to cameras - do we
    need several switches to allow per-algo enabling, or could we
    have a single "face detection" switch?

    Riju: e.g. "is face detection supported"?

    Youenn: why multiple switches if a single one is good enough,
    leaving it to the Web app to deal with what they're obtaining

    Riju: for instance, contour points would allow future support
    for additional more detailed contours

    Youenn: since the camera is doing the work, not clear we need
    to give more hints to the driver

    Riju: contour/mesh were added for extensibility

    Youenn: maybe reduce to what's implementable, while
    future-proofing it

    Bernard: high level questions about the API surface
    … I understand the supported constraints & capabilities are used
    to provide the basic parameters for the algorithm in the driver
    … videoFrame.detectedFaces is already done by the driver
    … as opposed to having a promise-based method to which the
    parameters would be given
    … if your camera driver doesn't support it, you wouldn't have

    Riju: going through promises, this would impact performance and
    redo work the driver has already done
    … OS level face analysis would duplicate computation already
    done in the driver

    JIB: so, it's a camera API - only available to sources that are

    Riju: right

    JIB: my concern is that there is another effort in the WICG,
    the shape detection API - how does it relate to it?
    … would be unfortunate to have to deal with face detection
    differently depending on the source

    Riju: shape detection works on images, can be called multiple
    … no face tracking available, which helps detecting faces across
    frames efficiently
    … face detection is based on OS level face analysis, which
    duplicates the driver work and is less power efficient / robust
    … we started from that API in our effort in this space - we
    feel this new approach gives much better results
    … FaceDetector is only supported in Windows atm; the work has
    stopped afaict

    Bernard: so you're saying the WICG work is not going ahead?

    Riju: I can check the status with Reilly (but my team was the
    one behind the implementation)

    Harald: I share some of JIB's worries
    … we have functions today that depend on high quality face
    detection e.g. background blur
    … I'm worried about having these different interfaces to solve
    the same problem
    … esp if some interfaces end up proprietary
    … if the proprietary interfaces provide much higher quality
    than what standard interfaces can provide
    … hence my pushback on making contours and meshes available in
    the API
    … I'm still not happy with the design that seems to be totally
    focused on centering this on hardware/driver resources rather than
    a representation API
    … it has a bit of that flavor, but there is still a lot of a
    sense of configuring the camera
    … also I'm surprised this only gives a 50% factor over media
    … but in general, this feels like a major new way of treating
    media information
    … I'd like to see it brought forward as a proposal, not as a set
    of API patches
    … with an explainer, use cases, examples - that we typically
    put together before agreeing to take it up

    Riju: no need to configure the driver
    … the PR includes examples

    Harald: I'm thinking of what applications would use this for,
    what problems to solve

    Dom: what an explainer would cover

    Riju: I can come up with that

    Dom: happy to help with the logistics of making it happen

    Riju: is the question about whether this is useful or not?

    Harald: yes

    Bernard: or rather whether it handles all the use cases people

    Jan-Ivar: e.g. tying this with camera may become obsolete or
    too limiting
    … having an API that isn't as strongly tied to hardware

    Harald: I'd like to have a better understanding of which apps
    want a rectangle around a face

    Youenn: encoders actually optimize around faces if such
    metadata are available
    … +1 on defining API that can obtain metadata from the hardware
    or a TransformStream

    JIB: among other things, having less hardware-dependency allows
    UAs to step in

    [57][Slide 37]


    Riju: backgroundBlur has more platform API support than

    Youenn: iOS has the ability to switch on & off background blur,
    fully outside of the Web app, and fully dynamic
    … the Web app could not unblur if the user has set this up at
    the OS level
    … (but not vice versa)
    … that situation is not well supported by constraints
    … we may need a way to surface whether a constraint *can* be
    changed (and to signal when it can no longer be changed)

    JIB: this is a case where constraints work very well - the app
    states its ideal
    … background blur is popular, would be good to support it

    Youenn: I don't think "ideal" suffices to expose the situation
    … re backgroundBlur level - it's not settable on iOS; are there
    platforms that would benefit from it?

    Riju: no platform API supports this, but some software models
    have that parameter
    … but I understand some platforms are working towards making it

    Youenn: but without knowing the algorithm, setting a particular
    value would be hard for developers
    … we may need a boolean instead

    JIB: part of the question is whether this needs to be
    controllable by apps vs the UA

    Harald: in audio, we've encountered cases where it's valuable to
    tell when manipulating settings that are supposed to be useful
    in the driver actually creates issues
    … e.g. double echo cancellation control
    … the most important control we have is to turn platform
    effects off; the second was to detect the situation to ask the
    user to turn it off

    Riju: on the last three proposals (lighting correction, face
    framing, eye gaze correction), any sense of interest?
    … the goal is to give options to developers on whether or not
    to use hardware capabilities

    Bernard: should we get back to this in April?

    JIB: from Mozilla's perspective, we don't have strong interest
    in this approach given possible interop cross-OS issues
    … we don't see any urgency

    Harald: for face detection, we have a pretty solid way forward
    via the explainer with use cases and justifications to support
    … some of these additional camera controls may fit into that
    new document
    … if we accept constraints as a way to control camera drivers,
    grouping them together makes sense

    JIB: but adding individual constraints is something we've used
    mediacapture-extensions for in the past

    Youenn: the complexity of a boolean constraint is very
    different from the more complex Face API detection

    Dom: I'll work with the chairs to agree on a clearer path
    forward then :)

Summary of resolutions

     1. [58]Continue discussion in [59]issue #68
     2. [60]close [61]#99 with no change
     3. [62]modulo discussion on SHOULD guidance, we adopt the
        displaySurface constraint proposal to manage Surface Hints

      [59] https://github.com/w3c/webrtc-svc/issues/68
      [61] https://github.com/w3c/webrtc-extensions/issues/99
