[minutes] September 20, 2021 meeting from Dominique Hazael-Massieux on 2021-09-20 (public-webrtc@w3.org from September 2021)

From: Dominique Hazael-Massieux <dom@w3.org>
Date: Mon, 20 Sep 2021 19:06:44 +0200
To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <e4f25196-0b25-884c-93b9-19e614f8ae88@w3.org>
Hi,

The minutes of our meeting held today (September 20, 2021) are available at:
  https://www.w3.org/2021/09/20-webrtc-minutes.html

and copied as text below.

Dom

                 WebRTC September 2021 virtual interim

20 September 2021

   [2]Agenda. [3]IRC log.

      [2] https://www.w3.org/2011/04/webrtc/wiki/September_20_2021#WebRTC_WG_Virtual_Interim
      [3] https://www.w3.org/2021/09/20-webrtc-irc

Attendees

   Present
          ArneSchramm, BenWagner, BernardA, BrianBaldino, Carine,
          Dom, EladAlon, GuidoUrdaneta, Harald, Jan-Ivar,
          SergioMurillo, SongXu, ThomasGuilbert, TimPanton,
          TonyHerre, YouennFablet

   Regrets
          -

   Chair
          Bernard, Harald, Jan-Ivar

   Scribe
          dom

Contents

    1. [4]Next meetings
    2. [5]Status of recent CfCs
    3. [6]WHATWG Streams
    4. [7]Agenda review
    5. [8]Conditional Focus
    6. [9]getViewportMedia
    7. [10]Display surface constraint
    8. [11]Echo Cancellation
    9. [12]Wrapping up
   10. [13]October meeting
   11. [14]Summary of resolutions

Meeting minutes

   [15]Slides

     [15] https://www.w3.org/2011/04/webrtc/wiki/images/8/86/WEBRTCWG-2021-09-20.pdf

  Next meetings

   Bernard: October VI to be scheduled 1st week of October -
   Doodle poll open till next week
   … then TPAC meetings (joint & solos)

  Status of recent CfCs

   Bernard: Republishing media capture and streams as CR -
   completed positively on Sep 17
   … Jan-Ivar will summarize the chairs' decision on it
   … Another CfC on Transferable MediaStreamTracks running until
   Sep 27
   … our next meeting in October will build on this

  WHATWG Streams

   Bernard: we have potential dependencies to WHATWG streams
   … a number of discussions in their repo relate to issues we've
   discussed in terms of our media processing pipelines

  Agenda review

   Bernard: main topics: Conditional focus, getViewportMedia,
   Display surface constraints, echo cancellation

  Conditional Focus

   Elad: depending on use cases, switching the focus from the
   browser to the captured window makes more or less sense
   … focus control is an important part of the user experience,
   given that making a presentation can be stressful
   … e.g. if you're capturing a window where you're writing text,
   focus needs to be there
   … but there are situations where the browser can be used
   directly to control the captured window
   … the challenge is that the browser cannot determine one
   situation from another
   … when the capturing application has a lot more situational
   awareness
   … not necessarily complete knowledge, but at least some
   … I'm proposing an API that associates stream capture with the
   ability to give a specific limited focus switch opportunity
   … to the capturing application
   … because this is done right after the capture is starting
   (although before a frame is being captured), the capturing
   application has all the context it can get to make its decision
   … the idea is to give that focus-switching opportunity in a
   microtask in a promise resolution of the capture request
   … the proposal includes a number of mitigations (e.g. a 1s
   timeout) to avoid risks of focus-switching attacks
   … the particular API I'm proposing is exposed via a method on a
   subclass of MediaStreamTrack - that way it's only available when
   obtained through a captured tab or window
   … we could look at a more finegrained inheritance tree if there
   is interest

   Jan-Ivar: this is a reasonable problem to solve; I have some
   concerns with the API surface
   … since focus switching is global to the user, it doesn't need
   to be on a mediastreamtrack subclass
   … it could live e.g. on navigator.mediaDevices
   … I think a microtask is too narrow - we should queue a task
   instead, this would give the same presentation
   … Without having received a frame, how can the app determine
   whether to switch or not?

   Elad: getSettings() on the captured stream can tell you the
   kind of display surface
   … checking the content of a frame is likely challenging to get
   right in any case
   … looking just at the metadata is easier
   … re global vs mediastreamtrack, it was partly to protect
   against attacks based on cloning - but happy to look more into
   alternatives
   … task vs microtask - can you say more about your concerns
   about shim-ability?
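
   As an illustration of the metadata-only decision Elad describes,
   here is a minimal sketch of an app choosing whether to move focus
   based on the display surface kind reported by getSettings(). The
   function name, the decision values and the policy itself are
   assumptions made for this example; the minutes do not specify the
   shape of the proposed API.

```typescript
// Display surface kinds as reported by getSettings().displaySurface.
type DisplaySurface = "monitor" | "window" | "browser";

// Hypothetical decision values; the minutes do not name the API's arguments.
type FocusDecision = "focus-captured" | "stay-on-capturer";

// Minimal stand-in for the part of MediaTrackSettings used here.
interface CaptureSettings {
  displaySurface?: DisplaySurface;
}

// Assumed app policy (not from the minutes): focus a captured window,
// focus a captured tab unless the app can drive it remotely, and
// otherwise keep focus on the capturing application.
function decideFocus(
  settings: CaptureSettings,
  appControlsCapturedTab: boolean
): FocusDecision {
  switch (settings.displaySurface) {
    case "window":
      return "focus-captured";
    case "browser":
      return appControlsCapturedTab ? "stay-on-capturer" : "focus-captured";
    default:
      // "monitor" or unknown: there is no single surface to focus.
      return "stay-on-capturer";
  }
}
```

   The decision only reads metadata, so it can be made before any
   frame has been delivered.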

   Jan-Ivar: it's a general principle, and I'm not sure of the
   advantages of a microtask in the first place

   Elad: part of it was a concern of backwards compatibility and
   performance

   Jan-Ivar: I think task & microtask can both address these
   aspects
   … in any case, my main concern is where the API lives at the
   moment

   Youenn: cloning of tracks is known; when you subtype tracks, it
   starts to be messy
   … what type would be assigned to a cloned track?
   … we should avoid subtypes if possible
   … mitigations of 1s and against busy-looping sound good
   … I need to think more about the 1s delay

   Harald: re cloning and MST subclasses - we have one case like
   that, and I think we should change it
   … we have 2 options: subclassing or making the method return
   an error
   … I don't think JS devs care one way or another
   … subclassing feels a bit tidier

   Elad: the goal was to reflect our design in the class hierarchy
   indeed

   Youenn: to get there, I think we should first list the use
   cases where subtypes actually help - just one method feels not
   enough to consider changing clone()

   Elad: 3 methods would fit: captureHandler, @@@ only apply to
   captured media

   Jan-Ivar: I'm opposed to subclassing - I think that API should
   live in a global space e.g. navigator.mediaDevices.focus

   Harald: where will that be written up? I would like to see the
   argument in more detail

   Elad: I'm hearing interest in the API

   Jan-Ivar: interested in solving the problem with a slightly
   different shape

   Youenn: +1 on a different shape, and discussion on the 1s
   delay; but sounds like a good space to work on

   [clarification on the 1s requirement makes Youenn happy]

  getViewportMedia

   [16]getViewportMedia(): Let pages opt-in to capture #155

     [16] https://github.com/w3c/mediacapture-screen-share/issues/155

   Elad: getViewportMedia is an API that allows capturing the
   current viewport (what is visible in the tab launching the API
   call)
   … equivalent of calling getDisplayMedia and selecting the
   current tab
   … there is danger associated with self-capture
   … to protect against this, we're requiring
   crossOriginIsolation, opt-in via a header (most likely document
   policy, but to-be-confirmed)
   … and only available to top-level docs or privileged iframes
   … Jan-Ivar and I have been discussing a lot and have converged
   on a number of proposals as summarized in the slide
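
   The gating conditions listed above can be sketched as a simple
   precondition check. This is only an illustration: the CallerContext
   shape and the function name are hypothetical, and the header
   mechanism is still to be confirmed per the discussion.

```typescript
// Hypothetical snapshot of the calling document's state.
interface CallerContext {
  crossOriginIsolated: boolean; // mirrors the `crossOriginIsolated` global
  optedInViaHeader: boolean;    // opt-in header (mechanism to be confirmed)
  isTopLevel: boolean;          // true for a top-level document
  isPrivilegedIframe: boolean;  // true for an iframe that was delegated access
}

// Returns the requirements from the discussion that the caller does not meet.
// An empty array means none of the listed protections would block the call.
function unmetRequirements(ctx: CallerContext): string[] {
  const unmet: string[] = [];
  if (!ctx.crossOriginIsolated) {
    unmet.push("cross-origin isolation");
  }
  if (!ctx.optedInViaHeader) {
    unmet.push("header opt-in");
  }
  if (!ctx.isTopLevel && !ctx.isPrivilegedIframe) {
    unmet.push("top-level document or privileged iframe");
  }
  return unmet;
}
```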

   Jan-Ivar: we're proposing that getViewportMedia would capture
   the entire viewport when called from an iframe
   … and we're proposing using Document Policy with names built on
   "viewport-capture"
   … the first proposal is basically deferring the approach to
   cropping to later

   Resolution: getViewportMedia capture the full viewport when
   called from an iframe

   Harald: re "viewport-capture", is it aligned with the naming
   convention of Document Policy?

   Tim: just noting the two decisions (iframe capturing the full
   viewport, and naming) are linked

   Resolution: use viewport-capture as naming basis for Document
   Policy of getViewportMedia

   Harald: these will be confirmed on the mailing list

   Elad: I also intend to suggest a cropping API that might
   complement getViewportMedia in the upcoming months

   Jan-Ivar: getViewportMedia should require user activation

   Dom: +1

   Elad: I can imagine certain cases where user activation makes
   sense, but others where less so
   … e.g. if you open a new tab

   Youenn: this feels like a general problem for user activation
   that is worth discussing in general
   … but given that this is a privileged API, user activation feels
   like a must

   Dom: +1 on solving it generically for user activation unless we
   can demonstrate something specific to capturing

   Youenn: note that changing user activation rules is really
   hard, so we need to get our answer right before shipping

   jan-ivar: removing user activation shouldn't be as hard as adding
   it afterwards

   Elad: I would want more time to make a decision on that
   particular bit

  Display surface constraint

   [17]Revisit: Let getDisplayMedia() influence the default type
   choice in the picker #184

     [17] https://github.com/w3c/mediacapture-screen-share/issues/184

   Elad: getDisplayMedia doesn't let apps influence the user's choice
   … user's choice is already being influenced though, by virtue
   of having a 1st item in the list of choices
   … Chrome has Screen-first
   … Safari has only one choice (so a major influence)
   … FF is evolving
   … Influence could be wielded positively - towards the safer
   choice, or the more relevant one
   … a lot of Web developers have expressed interest in being able
   to influence or limit the user's choice:
   … - save clicks (if the app knows they only want tab, or only
   want windows)
   … - apps want to capture audio - only available on a subset of
   capture sources
   … - tabs provide higher FPS
   … - the app knows from context - e.g. allowing to favor slides
   over other content when doing a presentation
   … - avoid risk with over sharing
   … The proposal I'm making is to add a hint as part of the
   constraints, e.g. "ideal: browser"
   … the user agent may choose how to apply that hint - from using
   it to prioritize, to ignoring it or adding warnings in case the
   UA determines it's not safe to apply the hint
   … [showing the specific text proposal in #184]
   … all other constraints are still processed after the user made
   their choice, only that one gets processed before
   … it's only a hint, it cannot limit user's choice
   … e.g. Chrome would show the list of tabs in preference when
   "browser" is hinted
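
   A minimal sketch of how such a hint might look from the app side,
   and how a user agent might use it to reorder its picker without
   removing any option. The helper names are hypothetical and the
   reordering is just one of the behaviors a UA could choose.

```typescript
// Display surface kinds a picker may offer.
type DisplaySurface = "monitor" | "window" | "browser";

// App side: express the hint with "ideal", as in the proposal.
function withSurfaceHint(hint: DisplaySurface) {
  return { video: { displaySurface: { ideal: hint } } };
}

// UA side (hypothetical): move the hinted category first.
// Every category stays in the list, so the user's choice is not limited.
function orderPickerCategories(
  defaults: DisplaySurface[],
  hint?: DisplaySurface
): DisplaySurface[] {
  if (hint === undefined || defaults.indexOf(hint) === -1) {
    return defaults.slice(); // hint absent or unknown: keep the UA default order
  }
  return [hint, ...defaults.filter((surface) => surface !== hint)];
}
```

   For example, with a screen-first default order, a "browser" hint
   would list tabs first while still offering screens and windows.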

   Jan-Ivar: in the github discussion, we mentioned additional
   mitigations - e.g. not listing the requesting tab/window in the
   list of tabs
   … would like to see some of these ideas reflected in the text
   … min & exact constraints are disallowed in gDM, so it would
   have to be "ideal"
   … I think it makes sense to use a hint to steer these selector
   UIs
   … for clarification, "influence/limiting" requirements
   discussed earlier were about the app, not the user agent
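
   To illustrate the point about min and exact, here is a small sketch
   that scans a video constraint set for members that getDisplayMedia
   would not accept. The function name is hypothetical and the check
   covers only the two members mentioned in the discussion.

```typescript
// Lists constraint members that use "min" or "exact".
// A hint such as { displaySurface: { ideal: "browser" } } yields no entries.
function findDisallowedMembers(video: Record<string, unknown>): string[] {
  const disallowed: string[] = [];
  for (const name of Object.keys(video)) {
    const value = video[name];
    if (typeof value === "object" && value !== null) {
      if ("min" in value) {
        disallowed.push(name + ".min");
      }
      if ("exact" in value) {
        disallowed.push(name + ".exact");
      }
    }
  }
  return disallowed;
}
```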

   Harald: re removing the calling tab, would it be only for this
   usage of the hint, or any use of gDM?

   Jan-Ivar: I think they need to be considered before we add this

   Elad: my recollection was we would encourage the UA to warn of
   risks of self-capture rather than removing the option
   altogether
   … there are other ways of adding friction that don't require
   removing the option completely
   … removing it completely might create risks of oversharing via
   sharing of the entire screen

   Jan-Ivar: I think we can probably converge on mitigations for
   self-capture
   … ideally, I would like normative language

   Youenn: should we allow a hint for capturing the entire screen?
   that's the riskiest
   … let's focus on hinting towards capturing less
   … In general, I dislike constraints - can we add a dedicated
   parameter instead of reusing the constraints syntax?
   … this may open further extensibility down the line (e.g.
   highlight tabs from a given origin?)
   … can you share more about Chrome's plans in terms of
   mitigations against self-capture and its dangers?

   Elad: we haven't prototyped the warning mechanism yet
   … re constraints, I have no objection to using a parameter
   instead of constraints
   … re removing "screen" - it's interesting, but if that is the
   default when no hint is given, this isn't really helping

   Youenn: that default behavior is specific to Chrome
   … Safari only allows screen, but we will have a picker at some
   point where screen won't be the default
   … and I don't think apps should have a way to default to screen

   Jan-Ivar: FF already doesn't default to screen, and +1 to
   youenn of not allowing (or just ignoring) screen as a
   constraint

   Elad: the user agent would already be free to ignore the hint
   … for Chromium, getting visibility on dev's intent would be
   useful in migrating away from that default

   Bernard: in terms of the requests from developers, is audio
   capture only available on screen?

   Elad: no, it's available on tab, and screen on Windows

   Bernard: re high-FPS capture - is that typically tab?

   Elad: in Chromium, yes
   … but it's in general, a way for developers to steer toward
   what they know will work for their use cases

   Bernard: is "screen"-level capturing key to any of these
   requests?

   Elad: right; but note that "screen" could be used to capture
   from a different monitor

   Jan-Ivar: but all monitors are dangerous

   Elad: so I'm hearing support except for the screen-hint

   TimP: I dislike heuristics-based picker - it makes it a
   nightmare to test and makes everything unpredictable

   Elad: the mention for heuristics was for apps to use, not the
   UA

   Jan-Ivar: supporting, but with stronger language on warnings
   for self-capture

  Echo Cancellation

   [18]Echo cancellation: Need to specify the source of the echo
   cancellation reference signal #31

     [18] https://github.com/w3c/mediacapture-extensions/issues/31

   [19]Specify constraint echoCancellationReferenceSinkId #32

     [19] https://github.com/w3c/mediacapture-extensions/pull/32

   Harald: this is a request coming from our audio team
   … echo cancellation is about removing the audio picked up by
   the microphone in the room to keep only the audio generated
   *in* the room
   … it's in general complicated - a complicated part is knowing
   what to remove
   … current implementation in Chrome just looks at what's coming
   in via the peerconnection
   … this has proven insufficient and we want to revise this
   … if we want to remove audio output, you can hit issues with
   specific headphones or setups
   … from the application perspective, you want to identify what
   output has been used that is most relevant to echo cancellation
   and feed that to the algorithm
   … to keep it simple, we have an enumeration of output devices
   via sinkIds
   … the proposal is to re-use this sinkId in the constraint for
   echo cancellation
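
   A minimal sketch of what using the proposed constraint might look
   like. The constraint name comes from the title of PR #32; the value
   shape (a plain sinkId string), the helper name and the device
   stand-in type are assumptions for this example, and the proposal
   had no consensus at this meeting.

```typescript
// Minimal stand-in for the MediaDeviceInfo fields used here.
interface DeviceEntry {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
}

// Builds audio constraints that name an output device as the echo
// cancellation reference. Falls back to plain echo cancellation when
// the preferred sinkId is not among the enumerated output devices.
function echoReferenceConstraints(
  devices: DeviceEntry[],
  preferredSinkId: string
) {
  const sink = devices.find(
    (d) => d.kind === "audiooutput" && d.deviceId === preferredSinkId
  );
  if (sink === undefined) {
    return { echoCancellation: true };
  }
  return {
    echoCancellation: true,
    // Proposed constraint (PR #32); not part of any shipped API.
    echoCancellationReferenceSinkId: sink.deviceId,
  };
}
```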

   TimP: +1 to do something in this space
   … will it help if you mix WebAudio in?
   … i.e. when the audio output comes from WebAudio processing

   Harald: yes, it should cover this (as long as the output makes
   it to the speaker)

   Jan-Ivar: Mozilla doesn't believe this API is needed to do
   correct echo cancellation
   … why does the UA need JS input on this? The UA already knows
   which headset is being used
   … it's not clear why getting input from the app is useful here

   Harald: which audio output is currently used by the echo
   cancellation?

   Jan-Ivar: I believe we have access to the rendered output (incl
   out of WebAudio)
   … Paul Adenot is our key person on this

   Harald: would like his opinion on the headset case

   Youenn: +1 to Jan-Ivar - the UA should already have access to
   all the info it needs
   … and it has more info than apps would have on this

   bernard: Harald, you said chrome currently uses sum of all
   audio outputs from peerconnection
   … is the intent here to improve the chromium implementation or
   to let them do better echo cancellation?

   harald: this is not for app-based echo cancellation

   bernard: I've heard requests from apps to have an adjustable
   echo cancellation - e.g. an echo cancellation transform stream

   Harald: that is orthogonal to this proposal
   … echo cancellation can't be modeled as a transform stream:
   it has 2 inputs
   … it can be modeled as a process that takes 2 audio inputs

   youenn: you could still do 1 input / 1 output with an
   additional parameter
   … in the transform stream creation with the reference stream

   Harald: interesting thing to do, but not this proposal

   TimP: there are situations where you don't want to cancel part
   of the stream being output - e.g. background music
   … with the room acoustics
   … maybe a rare use case, but one we've stumbled upon for
   immersiveness

   harald: you could turn echo cancellation off?

   timP: but that generates other issues

   Sergio: I don't think this proposal would help solve the Chrome
   issue
   … there are 3 different issues being discussed: echo
   cancellation in Chrome, new echo cancellation tuning use cases
   (that would need clarification/refinement), and exposing echo
   cancellation separately from WebRTC (maybe in Web Audio)

   Harald: I'm hearing opposition to making an API of the specific
   proposal because the UA should be able to figure it out
   … I find it interesting that only browser output should be
   cancelled - if you have another app than the browser producing
   audio, shouldn't it be removed too?

   Jan-Ivar: RNNoise has been exploring some of this; but
   echoCancellation: true is likely focused on the meeting use
   case

   Youenn: the OS can also provide user-configurable echo
   cancellation styles

   Guido: the motivation for Chrome is to help figure out which of the
   output devices should be used as the reference signal for echo
   cancellation
   … if there are several audio output devices with one being
   preferred by the app

   Harald: I'd like to invite comments on the issue on whether
   this API is needed or not
   … I haven't seen many comments on the shape of the API
   … if we were to conclude there was such a need, this API may be
   OK
   … but no consensus on the need for such an API

  Wrapping up

   Bernard: any CfC needed based on our discussions?

   Jan-Ivar: re getViewportMedia, should we put this in a new doc
   or an existing one?

   Dom: having a single document couples their progress through
   the process

   elad: also keeping them separate helps make clear how
   distinct they are

   youenn: it also helps in terms of separating the test cases in
   different folders

   harald: sounds like convergence towards a separate spec

   jan-ivar: would still prefer a single doc

  October meeting

   Bernard: next meeting will be devoted to mediacapture-transform
   - proposed content and agenda was shared on the list

   [20]Preview of October Virtual Interim slide deck

     [20] https://lists.w3.org/Archives/Public/public-webrtc/2021Sep/0030.html

   Bernard: there is overlap between mediacapture-transform and
   WHATWG streams issues

   Youenn: I will try to mark more explicitly issues in MC-T that
   are linked to WHATWG streams

   Bernard: part of what I thought might be useful to hear is
   where these upstream WHATWG stream issues are on the roadmap
   (if at all)

   Jan-Ivar: the new proposal we want to present is streams-based,
   but with improvements over the existing one
   … still needs some fixes in WHATWG streams
   … I have linked demos in the slides for some of the issues
   we're trying to address

   TimP: it would be good to start these presentations with use
   cases to scope our discussions

   Jan-Ivar: the slides Youenn and I developed include goals of
   the proposals

   Harald: Media Capture Transform starts with use cases

   Bernard: Streams have been adopted to use streams to manage
   pipelines

   Youenn: please send early feedback on the proposals

Summary of resolutions

    1. [21]getViewportMedia capture the full viewport when called
       from an iframe
    2. [22]use viewport-capture as naming basis for Document
       Policy of getViewportMedia


    Minutes manually created (not a transcript), formatted by
    [23]scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).

     [23] https://w3c.github.io/scribe2/scribedoc.html
Received on Monday, 20 September 2021 17:06:49 UTC