- From: Dominique Hazael-Massieux <dom@w3.org>
- Date: Mon, 20 Sep 2021 19:06:44 +0200
- To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Hi,
The minutes of our meeting held today (September 20, 2021) are available at:
  https://www.w3.org/2021/09/20-webrtc-minutes.html
and copied as text below.
Dom
                 WebRTC September 2021 virtual interim
20 September 2021
   [2]Agenda. [3]IRC log.
      [2]
https://www.w3.org/2011/04/webrtc/wiki/September_20_2021#WebRTC_WG_Virtual_Interim
      [3] https://www.w3.org/2021/09/20-webrtc-irc
Attendees
   Present
          ArneSchramm, BenWagner, BernardA, BrianBaldino, Carine,
          Dom, EladAlon, GuidoUrdaneta, Harald, Jan-Ivar,
          SergioMurillo, SongXu, ThomasGuilbert, TimPanton,
          TonyHerre, YouennFablet
   Regrets
          -
   Chair
          Bernard, Harald, Jan-Ivar
   Scribe
          dom
Contents
    1. [4]Next meetings
    2. [5]Status of recent CfCs
    3. [6]WHATWG Streams
    4. [7]Agenda review
    5. [8]Conditional Focus
    6. [9]getViewportMedia
    7. [10]Display surface constraint
    8. [11]Echo Cancellation
    9. [12]Wrapping up
   10. [13]October meeting
   11. [14]Summary of resolutions
Meeting minutes
   [15]Slides
     [15]
https://www.w3.org/2011/04/webrtc/wiki/images/8/86/WEBRTCWG-2021-09-20.pdf
  Next meetings
   Bernard: October VI to be scheduled 1st week of October -
   Doodle poll open till next week
   … then TPAC meetings (joint & solos)
  Status of recent CfCs
   Bernard: Republishing media capture and streams as CR -
   completed positively on Sep 17
   … Jan-Ivar will summarize the chairs' decision on it
   … Another CfC on Transferable MediaStreamTracks running until
   Sep 27
   … our next meeting in October will build on this
  WHATWG Streams
   Bernard: we have potential dependencies to WHATWG streams
   … a number of discussions in their repo relate to issues we've
   discussed in terms of our media processing pipelines
  Agenda review
   Bernard: main topics: Conditional focus, getViewportMedia,
   Display surface constraints, echo cancellation
  Conditional Focus
   Elad: depending on use cases, switching the focus from the
   browser to the captured window makes more or less sense
   … focus control is an important part of the user experience,
   given that making a presentation can be stressful
   … e.g. if you're capturing a window where you're writing text,
   focus needs to be there
   … but there are situations where the browser can be used
   directly to control the captured window
   … the challenge is that the browser cannot determine one
   situation from another
   … when the capturing application has a lot more situational
   awareness
   … not necessarily complete knowledge, but at least some
   … I'm proposing an API that associates stream capture with the
   ability to give a specific limited focus switch opportunity
   … to the capturing application
   … because this is done right after the capture is starting
   (although before a frame is being captured), the capturing
   application has all the context it can get to make its decision
   … the idea is to give that focus-switching opportunity in a
   microtask in a promise resolution of the capture request
   … the proposal includes a number of mitigations (e.g. a 1s
   timeout) to avoid risks of focus-switching attacks
   … the particular API I'm proposing is exposed via a method on a
   subclass of MediaStreamTrack - that way it's only available when
   obtained through a captured tab or window
   … we could look at a more finegrained inheritance tree if there
   is interest
   Jan-Ivar: this is a reasonable problem to solve; I have some
   concerns with the API surface
   … since focus switching is global to the user, it doesn't need
   to be on a mediastreamtrack subclass
   … it could live e.g. on navigator.mediaDevices
   … I think a microtask is too narrow - we should queue a task
   instead, this would give the same presentation
   … Without having received a frame, how can app determine
   whether to switch or not?
   Elad: getSettings() on the captured stream can tell you the
   kind of display surface
   … checking the content of a frame is likely challenging to get
   right in any case
   … looking just at the metadata is easier
   … re global vs mediastreamtrack, it was partly to protect
   against attacks based on cloning - but happy to look more into
   alternatives
   … task vs microtask - can you say more about your concerns
   about shim-ability?
   Jan-Ivar: it's a general principle, and I'm not sure of the
   advantages of a microtask in the first place
   Elad: part of it was a concern of backwards compatibility and
   performance
   Jan-Ivar: I think task & microtask can both address these
   aspects
   … in any case, my main concern is where the API lives at the
   moment
   Youenn: cloning of tracks is known; when you subtype tracks, it
   starts to be messy
   … what type would be assigned to a cloned track?
   … we should avoid subtypes if possible
   … mitigations of 1s and against busy-looping sound good
   … I need to think more about the 1s delay
   Harald: re cloning and MST subtracks - we have one case like
   that, and I think we should change it
   … we have 2 options: subclassing or making the method return
   an error
   … I don't think JS devs care one way or another
   … subclassing feels a bit tidier
   Elad: the goal was to reflect our design in the class hierarchy
   indeed
   Youenn: to get there, I think we should first list the use
   cases where subtypes actually help - just one method feels not
   enough to consider changing clone()
   Elad: 3 methods would fit: captureHandler, @@@ only apply to
   captured media
   Jan-Ivar: I'm opposed to subclassing - I think that API should
   live in a global space e.g. navigator.mediaDevices.focus
   Harald: where will that be written up? I would like to see the
   argument in more detail
   Elad: I'm hearing interest in the API
   Jan-Ivar: interested in solving the problem with a slightly
   different shape
   Youenn: +1 on a different shape, and discussion on the 1s
   delay; but sounds like a good space to work on
   [clarification on the 1s requirement makes Youenn happy]
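   The focus decision discussed in this section can be sketched as a
   pure function. This is a minimal sketch under stated assumptions, not
   the proposed API: the names decideFocus, SurfaceKind and FocusDecision
   are hypothetical, the surface kinds stand in for what getSettings()
   reports as the kind of display surface, and the 1000 ms window mirrors
   the 1s timeout mitigation Elad mentioned.

```typescript
// Hypothetical names throughout: the minutes do not settle an API shape.
type SurfaceKind = "browser" | "window" | "monitor";
type FocusDecision =
  | "focus-captured-surface"
  | "keep-capturing-app-focused"
  | "window-expired";

// Assumption: mirrors the 1s timeout mitigation mentioned in the proposal.
const FOCUS_DECISION_WINDOW_MS = 1000;

function decideFocus(
  surface: SurfaceKind,
  elapsedMs: number,
  appControlsCapturedSurface: boolean
): FocusDecision {
  // Once the limited opportunity has passed, the app can no longer decide.
  if (elapsedMs > FOCUS_DECISION_WINDOW_MS) return "window-expired";
  // The proposal only covers captured tabs and windows, not whole screens.
  if (surface === "monitor") return "keep-capturing-app-focused";
  // If the app drives the captured surface itself, focus can stay with it.
  return appControlsCapturedSurface
    ? "keep-capturing-app-focused"
    : "focus-captured-surface";
}
```

   The decision only uses metadata about the capture, in line with the
   point that inspecting frame content is hard to get right.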
  getViewportMedia
   [16]getViewportMedia(): Let pages opt-in to capture #155
     [16] https://github.com/w3c/mediacapture-screen-share/issues/155
   Elad: getViewportMedia is an API allowing to capture the
   current viewport (what is visible in the tab launching the API
   call)
   … equivalent of calling getDisplayMedia and selecting the
   current tab
   … there is danger associated with self-capture
   … to protect against this, we're requiring
   crossOriginIsolation, opt-in via a header (most likely document
   policy, but to-be-confirmed)
   … and only available to top-level docs or privileged iframes
   … Jan-Ivar and I have been discussing a lot and have converged
   on a number of proposals as summarized in the slide
   Jan-Ivar: we're proposing that getViewportMedia would capture
   the entire viewport when called from an iframe
   … and we're proposing using Document Policy with names built on
   "viewport-capture"
   … the first proposal is basically deferring the approach to
   cropping to later
   Resolution: getViewportMedia capture the full viewport when
   called from an iframe
   Harald: re "viewport-capture", is it aligned with the naming
   convention of Document Policy?
   Tim: just noting the two decisions (iframe capturing the full
   viewport, and naming) are linked
   Resolution: use viewport-capture as naming basis for Document
   Policy of getViewportMedia
   Harald: these will be confirmed on the mailing list
   Elad: I also intend to suggest a cropping API that might
   complement getViewportMedia in the upcoming months
   Jan-Ivar: getViewportMedia should require user activation
   Dom: +1
   Elad: I can imagine certain cases where user activation makes
   sense, but others where less so
   … e.g. if you open a new tab
   Youenn: this feels like a general problem for user activation
   that is worth discussing in general
   … but given that this is a privileged API, user activation feels
   like a must
   Dom: +1 on solving it generically for user activation unless we
   can demonstrate something specific to capturing
   Youenn: note that changing user activation rules is really
   hard, so we need to get our answer right before shipping
   jan-ivar: removing user activation shouldn't be as hard as adding
   it afterwards
   Elad: I would want more time to make a decision on that
   particular bit
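   The gating conditions listed for getViewportMedia can be sketched as
   a single check. This is an illustrative model, not spec text:
   CaptureContext, GateResult and checkViewportCapture are hypothetical
   names, and the user activation requirement is passed in as a flag
   because the group did not settle it in this meeting.

```typescript
// Hypothetical model of the calling context; field names are illustrative.
interface CaptureContext {
  crossOriginIsolated: boolean;
  documentPolicyOptIn: boolean; // opt-in header, naming based on "viewport-capture"
  isTopLevel: boolean;
  isPrivilegedIframe: boolean;
  hasUserActivation: boolean;
}

type GateResult =
  | "allowed"
  | "needs-cross-origin-isolation"
  | "needs-document-policy-opt-in"
  | "needs-top-level-or-privileged-iframe"
  | "needs-user-activation";

function checkViewportCapture(
  ctx: CaptureContext,
  requireUserActivation: boolean // assumption: left open in the discussion
): GateResult {
  if (!ctx.crossOriginIsolated) return "needs-cross-origin-isolation";
  if (!ctx.documentPolicyOptIn) return "needs-document-policy-opt-in";
  if (!ctx.isTopLevel && !ctx.isPrivilegedIframe) {
    return "needs-top-level-or-privileged-iframe";
  }
  if (requireUserActivation && !ctx.hasUserActivation) {
    return "needs-user-activation";
  }
  return "allowed";
}
```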
  Display surface constraint
   [17]Revisit: Let getDisplayMedia() influence the default type
   choice in the picker #184
     [17] https://github.com/w3c/mediacapture-screen-share/issues/184
   Elad: getDisplayMedia doesn't let apps influence the user's choice
   … user's choice is already being influenced though, by virtue
   of having a 1st item in the list of choices
   … Chrome has Screen-first
   … Safari has only one choice (so a major influence)
   … FF is evolving
   … Influence could be wielded positively - towards the safer
   choice, or the more relevant one
   … a lot of Web developers have expressed interest in being allowed
   to influence or limit the user's choice:
   … - save clicks (if the app knows they only want tab, or only
   want windows)
   … - apps want to capture audio - only available on a subset of
   capture sources
   … - tabs provide higher FPS
   … - the app knows from context - e.g. allowing to favor slides
   over other content when doing a presentation
   … - avoid the risk of oversharing
   … The proposal I'm making is to add a hint as part of the
   constraints, e.g. "ideal: browser"
   … the user agent may choose how to apply that hint - from using
   it to prioritize, to ignoring it or adding warnings in case the
   UA determines it's not safe to apply the hint
   … [showing the specific text proposal in #184]
   … all other constraints are still processed after the user made
   their choice, only that one gets processed before
   … it's only a hint, it cannot limit user's choice
   … e.g. Chrome would show the list of tabs in preference when
   "browser" is hinted
   Jan-Ivar: in the github discussion, we mentioned additional
   mitigations - e.g. not listing the requesting tab/window in the
   list of tabs
   … would like to see some of these ideas reflected in the text
   … min & exact constraints are disallowed in gDM, so it would
   have to be "ideal"
   … I think it makes sense to use a hint to steer these selector
   UIs
   … for clarification, "influence/limiting" requirements
   discussed earlier were about the app, not the user agent
   Harald: re removing the calling tab, would it be only for this
   usage of the hint, or any use of gDM?
   Jan-Ivar: I think they need to be considered before we add this
   Elad: my recollection was we would encourage the UA to warn of
   risks of self-capture rather than removing the option
   altogether
   … there are other ways of adding friction that don't require
   removing the option completely
   … removing it completely might create risks of oversharing via
   sharing of the entire screen
   Jan-Ivar: I think we can probably converge on mitigations for
   self-capture
   … ideally, I would like normative language
   Youenn: should we allow a hint for capturing the entire screen?
   that's the riskiest
   … let's focus on hinting towards capturing less
   … In general, I dislike constraints - can we add a dedicated
   parameter instead of reusing the constraints syntax?
   … this may open further extensibility down the line (e.g.
   highlight tabs from a given origin?)
   … can you share more about Chrome's plans in terms of
   mitigations against self-capture and its dangers?
   Elad: we haven't prototyped the warning mechanism yet
   … re constraints, I have no objection to using a parameter
   instead of constraints
   … re removing "screen" - it's interesting, but if that is the
   default when no hint is given, this isn't really helping
   Youenn: that default behavior is specific to Chrome
   … Safari only allows screen, but we will have a picker at some
   point where screen won't be the default
   … and I don't think apps should have a way to default to screen
   Jan-Ivar: FF already doesn't default to screen, and +1 to
   youenn of not allowing (or just ignoring) screen as a
   constraint
   Elad: the user agent would already be free to ignore the hint
   … for Chromium, getting visibility on dev's intent would be
   useful in migrating away from that default
   Bernard: in terms of the requests from developers, is audio
   capture only available on screen?
   Elad: no, it's available on tab, and screen on windows
   Bernard: re high-FPS capture - is that typically tab?
   Elad: in Chromium, yes
   … but it's in general, a way for developers to steer toward
   what they know will work for their use cases
   Bernard: is "screen"-level capturing key to any of these
   requests?
   Elad: right; but note that "screen" could be used to capture
   from a different monitor
   Jan-Ivar: but all monitors are dangerous
   Elad: so I'm hearing support except for the screen-hint
   TimP: I dislike heuristics-based picker - it makes it a
   nightmare to test and makes everything unpredictable
   Elad: the mention for heuristics was for apps to use, not the
   UA
   Jan-Ivar: supporting, but with stronger language on warnings
   for self-capture
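   The hint behaviour discussed in this section can be sketched as
   follows. This is one possible reading, not agreed behaviour:
   PickerSurface, buildHintedRequest and orderPickerSurfaces are
   hypothetical names, the screen-first default order is modelled on the
   Chrome behaviour Elad described, and ignoring a "monitor" hint
   reflects the reservations voiced by Youenn and Jan-Ivar.

```typescript
type PickerSurface = "browser" | "window" | "monitor";

// What an app might pass: only "ideal" is possible, since min and exact
// constraints are disallowed in getDisplayMedia.
function buildHintedRequest(hint: PickerSurface) {
  return { video: { displaySurface: { ideal: hint } } };
}

// Assumed screen-first default, as described for Chrome in the minutes.
const UA_DEFAULT_ORDER: readonly PickerSurface[] = ["monitor", "window", "browser"];

// The hint only reorders the picker: every surface type stays available,
// so the user's choice is never limited.
function orderPickerSurfaces(hint?: PickerSurface): PickerSurface[] {
  // Assumption: a missing hint or a hint toward the whole screen is ignored.
  if (hint === undefined || hint === "monitor") return UA_DEFAULT_ORDER.slice();
  const ordered: PickerSurface[] = [hint];
  return ordered.concat(UA_DEFAULT_ORDER.filter((s) => s !== hint));
}
```

   A user agent remains free to ignore the hint entirely or to add
   warnings, so the ordering above is only one way to apply it.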
  Echo Cancellation
   [18]Echo cancellation: Need to specify the source of the echo
   cancellation reference signal #31
     [18] https://github.com/w3c/mediacapture-extensions/issues/31
   [19]Specify constraint echoCancellationReferenceSinkId #32
     [19] https://github.com/w3c/mediacapture-extensions/pull/32
   Harald: this is a request coming from our audio team
   … echo cancellation is about removing the audio picked up by
   the microphone in the room to keep only the audio generated
   *in* the room
   … it's in general complicated - a complicated part is knowing
   what to remove
   … current implementation in Chrome just looks at what's coming
   in via the peerconnection
   … this has proven insufficient and we want to revise this
   … if we want to remove audio output, you can hit issues with
   specific headphones or setups
   … from the application perspective, you want to identify what
   output has been used that is most relevant to echo cancellation
   and feed that to the algorithm
   … to keep it simple, we have an enumeration of output devices
   via sinkIds
   … the proposal is to re-use this sinkid in the constraint for
   echo cancellation
   TimP: +1 to do something in this space
   … will it help if you mix WebAudio in?
   … i.e. when the audio output comes from WebAudio processing
   Harald: yes, it should cover this (as long as the output makes
   it to the speaker)
   Jan-Ivar: Mozilla doesn't believe this API is needed to do
   correct echo cancellation
   … why does the UA need JS input on this? The UA already knows
   which headset is being used
   … it's not clear why getting input from the app is useful here
   Harald: which audio output is currently used by the echo
   cancellation?
   Jan-Ivar: I believe we have access to the rendered output (incl
   out of WebAudio)
   … Paul Adenot is our key person on this
   Harald: would like his opinion on the headset case
   Youenn: +1 to Jan-ivar - the UA should already have access to
   all the info it needs
   … and it has more info than apps would have on this
   bernard: Harald, you said chrome currently uses sum of all
   audio outputs from peerconnection
   … is the intent here to improve the chromium implementation or
   to let them do better echo cancellation?
   harald: this is not for app-based echo cancellation
   bernard: I've heard requests from apps to have an adjustable
   echo cancellation - e.g. an echo cancellation transform stream
   Harald: that is orthogonal to this proposal
   … echo cancellation can't be modeled as a transform stream:
   it takes 2 input objects
   … it can be modeled as a process that takes 2 audio inputs
   youenn: you could still do 1 input / 1 output with an
   additional parameter
   … in the transform stream creation with the reference stream
   Harald: interesting thing to do, but not this proposal
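   The two ways of modelling echo cancellation raised here (a process
   with 2 audio inputs, or a 1 input / 1 output transform created with
   the reference as a parameter) can be sketched with a toy example.
   This is deliberately not a real echo canceller: it assumes the echo is
   the reference signal scaled by a known gain with no delay, whereas
   real implementations have to estimate the echo path. The name
   makeEchoCanceller and its parameters are hypothetical.

```typescript
// Toy model only: echo = echoGain * reference, sample-aligned, no delay.
// The reference is bound at creation time, so the returned function has
// one input (mic samples) and one output (cleaned samples).
function makeEchoCanceller(
  reference: number[],
  echoGain: number
): (mic: number[]) => number[] {
  return (mic) =>
    // Samples beyond the end of the reference are treated as silence.
    mic.map((sample, i) => sample - echoGain * (reference[i] ?? 0));
}

// Usage: the microphone picks up local speech plus the scaled reference.
const referenceSignal = [0.5, -0.25];
const localSpeech = [0.125, 0.25];
const micSignal = localSpeech.map((s, i) => s + 0.5 * referenceSignal[i]);
const cleaned = makeEchoCanceller(referenceSignal, 0.5)(micSignal);
```

   With these values cleaned equals localSpeech, since the toy echo is
   removed exactly.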
   TimP: there are situations where you don't want to cancel part
   of the stream being output - e.g. background music
   … with the room acoustics
   … maybe a rare use case, but one we've stumbled upon for
   immersiveness
   harald: you could turn echo cancellation off?
   timP: but that generates other issues
   Sergio: I don't think this proposal would help solve the Chrome
   issue
   … there are 3 different issues being discussed: echo
   cancellation in Chrome, new echo cancellation tuning use cases
   (that would need clarification/refinement), and exposing echo
   cancellation separately from WebRTC (maybe in Web Audio)
   Harald: I'm hearing opposition to making an API of the specific
   proposal because the UA should be able to figure it out
   … I find it interesting that only browser output should be
   cancelled - if you have another app than the browser producing
   audio, shouldn't it be removed too?
   Jan-Ivar: RNNoise has been exploring some of this; but
   echoCancellation: true is likely focused on the meeting use
   case
   Youenn: the OS can also provide user-configurable echo
   cancellation styles
   Guido: the motivation for Chrome is to help figure which of the
   output devices should be used as the reference signal for echo
   cancellation
   … if there are several audio output devices with one being
   preferred by the app
   Harald: I'd like to invite comments on the issue on whether
   this API is needed or not
   … I haven't seen many comments on the shape of the API
   … if we were to conclude there was such a need, this API may be
   OK
   … but no consensus on the need for such an API
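   The motivation Guido gave (choosing which of several output devices
   serves as the reference signal, with one preferred by the app) can be
   sketched as below. This is a hedged illustration, not the text of
   pull request #32: AudioOutputInfo, pickReferenceSink and
   referenceConstraint are hypothetical, the fallback to a system default
   is an assumption, and only the constraint name
   echoCancellationReferenceSinkId comes from the pull request title.

```typescript
// Hypothetical description of an audio output; fields are illustrative.
interface AudioOutputInfo {
  sinkId: string;
  isSystemDefault: boolean;
}

// Prefer the output named by the app; otherwise fall back to the system
// default (an assumption about what a user agent might do on its own).
function pickReferenceSink(
  outputs: AudioOutputInfo[],
  appPreferredSinkId?: string
): string | undefined {
  for (const o of outputs) {
    if (o.sinkId === appPreferredSinkId) return o.sinkId;
  }
  for (const o of outputs) {
    if (o.isSystemDefault) return o.sinkId;
  }
  return undefined;
}

// Assumed shape of the audio constraints an app would pass.
function referenceConstraint(sinkId: string) {
  return { echoCancellation: true, echoCancellationReferenceSinkId: sinkId };
}
```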
  Wrapping up
   Bernard: any CfC needed based on our discussions?
   Jan-Ivar: re getViewportMedia, should we put this in a new doc
   or an existing one?
   Dom: having a single document couples their process progress
   elad: also keeping them separate helps making clear how
   distinct they are
   youenn: it also helps in terms of separating the test cases in
   different folders
   harald: sounds like convergence towards a separate spec
   jan-ivar: would still prefer a single doc
  October meeting
   Bernard: next meeting will be devoted to mediacapture-transform
   - proposed content and agenda was shared on the list
   [20]Preview of October Virtual Interim slide deck
     [20]
https://lists.w3.org/Archives/Public/public-webrtc/2021Sep/0030.html
   Bernard: there is overlap between mediacapture-transform and
   WHATWG streams issues
   Youenn: I will try to mark more explicitly issues in MC-T that
   are linked to WHATWG streams
   Bernard: part of what I thought might be useful to hear is
   where these upstream WHATWG stream issues are on the roadmap
   (if at all)
   Jan-Ivar: the new proposal we want to present is streams-based,
   but improvements over the existing one
   … still needs some fixes in WHATWG streams
   … I have linked demos in the slides for some of the issues
   we're trying to address
   TimP: it would be good to start these presentations with use
   cases to scope our discussions
   Jan-Ivar: the slides Youenn and I developed include goals of
   the proposals
   Harald: Media Capture Transform starts with use cases
   Bernard: Streams have been adopted to use streams to manage
   pipelines
   Youenn: please send early feedback on the proposals
Summary of resolutions
    1. [21]getViewportMedia capture the full viewport when called
       from an iframe
    2. [22]use viewport-capture as naming basis for Document
       Policy of getViewportMedia
    Minutes manually created (not a transcript), formatted by
    [23]scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).
     [23] https://w3c.github.io/scribe2/scribedoc.html
Received on Monday, 20 September 2021 17:06:49 UTC