- From: Dominique Hazael-Massieux <dom@w3.org>
- Date: Mon, 20 Sep 2021 19:06:44 +0200
- To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Hi,

The minutes of our meeting held today (September 20, 2021) are available at:
https://www.w3.org/2021/09/20-webrtc-minutes.html
and copied as text below.

Dom

WebRTC September 2021 virtual interim
20 September 2021

[2]Agenda. [3]IRC log.

[2] https://www.w3.org/2011/04/webrtc/wiki/September_20_2021#WebRTC_WG_Virtual_Interim
[3] https://www.w3.org/2021/09/20-webrtc-irc

Attendees

Present
ArneSchramm, BenWagner, BernardA, BrianBaldino, Carine, Dom, EladAlon, GuidoUrdaneta, Harald, Jan-Ivar, SergioMurillo, SongXu, ThomasGuilbert, TimPanton, TonyHerre, YouennFablet

Regrets
-

Chair
Bernard, Harald, Jan-Ivar

Scribe
dom

Contents

1. Next meetings
2. Status of recent CfCs
3. WHATWG Streams
4. Agenda review
5. Conditional Focus
6. getViewportMedia
7. Display surface constraint
8. Echo Cancellation
9. Wrapping up
10. October meeting
11. Summary of resolutions

Meeting minutes

[15]Slides

[15] https://www.w3.org/2011/04/webrtc/wiki/images/8/86/WEBRTCWG-2021-09-20.pdf

Next meetings

Bernard: October VI to be scheduled 1st week of October - Doodle poll open till next week
… then TPAC meetings (joint & solos)

Status of recent CfCs

Bernard: Republishing media capture and streams as CR - completed positively on Sep 17
… Jan-Ivar will summarize the chairs' decision on it
… Another CfC on Transferable MediaStreamTracks running until Sep 27
… our next meeting in October will build on this

WHATWG Streams

Bernard: we have potential dependencies to WHATWG streams
… a number of discussions in their repo relate to issues we've discussed in terms of our media processing pipelines

Agenda review

Bernard: main topics: Conditional focus, getViewportMedia, Display surface constraints, echo cancellation

Conditional Focus

Elad: depending on use cases, switching the focus from the browser to the captured window makes more or less sense
… focus control is an important part of the user experience, given that making a presentation can be stressful
… e.g.
if you're capturing a window where you're writing text, focus needs to be there
… but there are situations where the browser can be used directly to control the captured window
… the challenge is that the browser cannot determine one situation from another
… when the capturing application has a lot more situational awareness
… not necessarily complete knowledge, but at least some
… I'm proposing an API that associates stream capture with the ability to give a specific limited focus switch opportunity
… to the capturing application
… because this is done right after the capture is starting (although before a frame is being captured), the capturing application has all the context it can get to make its decision
… the idea is to give that focus-switching opportunity in a microtask in a promise resolution of the capture request
… the proposal includes a number of mitigations (e.g. a 1s timeout) to avoid risks of focus-switching attacks
… the particular API I'm proposing is exposed via a method on a subclass of MediaStreamTrack - that way it's only available when obtained through a captured tab or window
… we could look at a more fine-grained inheritance tree if there is interest
Jan-Ivar: this is a reasonable problem to solve; I have some concerns with the API surface
… since focus switching is global to the user, it doesn't need to be on a MediaStreamTrack subclass
… it could live e.g. on navigator.mediaDevices
… I think a microtask is too narrow - we should queue a task instead, this would give the same presentation
… Without having received a frame, how can the app determine whether to switch or not?
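A minimal sketch of the kind of decision the proposal leaves to the capturing application, assuming it is made from track metadata alone, right after capture starts and before any frame arrives. The `displaySurface` values are those of the Screen Capture spec; `decideFocus`, `FocusDecision` and the `appControlsCaptureFromOwnUi` flag are hypothetical names, and the focus-switching call itself is omitted because its shape was still under discussion.

```typescript
// Display surface kinds as reported by the displaySurface track setting.
type CapturedSurface = "monitor" | "window" | "browser";

// Stand-in for the one field of MediaTrackSettings this sketch needs.
interface CapturedTrackSettings {
  displaySurface?: CapturedSurface;
}

// Hypothetical outcome names; not taken from any specification.
type FocusDecision = "focus-captured-surface" | "keep-focus-on-app";

// Hypothetical app-level policy, evaluated once when capture starts.
function decideFocus(
  settings: CapturedTrackSettings,
  appControlsCaptureFromOwnUi: boolean
): FocusDecision {
  // Unknown surface or a whole monitor: there is no single tab or window
  // to hand focus to, so stay on the capturing application.
  if (settings.displaySurface === undefined || settings.displaySurface === "monitor") {
    return "keep-focus-on-app";
  }
  // Tab or window: switch focus unless the app drives the captured
  // content from its own UI (e.g. remote slide controls).
  return appControlsCaptureFromOwnUi ? "keep-focus-on-app" : "focus-captured-surface";
}
```

In a page, the settings object would come from `getSettings()` on the video track of the stream returned by `getDisplayMedia()`.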
Elad: getSettings() on the captured stream can tell you the kind of display surface
… checking the content of a frame is likely challenging to get right in any case
… looking just at the metadata is easier
… re global vs MediaStreamTrack, it was partly to protect against attacks based on cloning - but happy to look more into alternatives
… task vs microtask - can you say more about your concerns about shim-ability?
Jan-Ivar: it's a general principle, and I'm not sure of the advantages of a microtask in the first place
Elad: part of it was a concern of backwards compatibility and performance
Jan-Ivar: I think track & microtask can both address these aspects
… in any case, my main concern is where the API lives at the moment
Youenn: cloning of tracks is known; when you subtype tracks, it starts to be messy
… what type would be assigned to a cloned track?
… we should avoid subtypes if possible
… mitigations of 1s and against busy-looping sound good
… I need to think more about the 1s delay
Harald: re cloning and MST subtracks - we have one case like that, and I think we should change it
… we have 2 options: subclassing or making the method return an error
… I don't think JS devs care one way or another
… subclassing feels a bit tidier
Elad: the goal was to reflect our design in the class hierarchy indeed
Youenn: to get there, I think we should first list the use cases where subtypes actually help - just one method feels not enough to consider changing clone()
Elad: 3 methods would fit: captureHandler, @@@ only apply to captured media
Jan-Ivar: I'm opposed to subclassing - I think that API should live in a global space e.g. navigator.mediaDevices.focus
Harald: where will that be written up?
I would like to see the argument in more detail
Elad: I'm hearing interest in the API
Jan-Ivar: interested in solving the problem with a slightly different shape
Youenn: +1 on a different shape, and discussion on the 1s delay; but sounds like a good space to work on
[clarification on the 1s requirement makes Youenn happy]

getViewportMedia

[16]getViewportMedia(): Let pages opt-in to capture #155

[16] https://github.com/w3c/mediacapture-screen-share/issues/155

Elad: getViewportMedia is an API allowing to capture the current viewport (what is visible in the tab launching the API call)
… equivalent of calling getDisplayMedia and selecting the current tab
… there is danger associated with self-capture
… to protect against this, we're requiring crossOriginIsolation, opt-in via a header (most likely document policy, but to-be-confirmed)
… and only available to top-level docs or privileged iframes
… Jan-Ivar and I have been discussing a lot and have converged on a number of proposals as summarized in the slide
Jan-Ivar: we're proposing that getViewportMedia would capture the entire viewport when called from an iframe
… and we're proposing using Document Policy with names built on "viewport-capture"
… the first proposal is basically deferring the approach to cropping to later

Resolution: getViewportMedia capture the full viewport when called from an iframe

Harald: re "viewport-capture", is it aligned with the naming convention of Document Policy?
Tim: just noting the two decisions (iframe capturing the full viewport, and naming) are linked

Resolution: use viewport-capture as naming basis for Document Policy of getViewportMedia

Harald: these will be confirmed on the mailing list
Elad: I also intend to suggest a cropping API that might complement getViewportMedia in the upcoming months
Jan-Ivar: getViewportMedia should require user activation
Dom: +1
Elad: I can imagine certain cases where user activation makes sense, but others where less so
… e.g.
if you open a new tab
Youenn: this feels like a general problem for user activation that is worth discussing in general
… but given that this is a privileged API, user activation feels like a must
Dom: +1 on solving it generically for user activation unless we can demonstrate something specific to capturing
Youenn: note that changing user activation rules is really hard, so we need to get our answer right before shipping
Jan-Ivar: removing user activation shouldn't be as hard as adding it afterwards
Elad: I would want more time to make a decision on that particular bit

Display surface constraint

[17]Revisit: Let getDisplayMedia() influence the default type choice in the picker #184

[17] https://github.com/w3c/mediacapture-screen-share/issues/184

Elad: getDisplayMedia doesn't let the app influence the user's choice
… user's choice is already being influenced though, by virtue of having a 1st item in the list of choices
… Chrome has Screen-first
… Safari has only one choice (so a major influence)
… FF is evolving
… Influence could be wielded positively - towards the safer choice, or the more relevant one
… a lot of Web developers have expressed interest in being allowed to influence or limit the user's choice:
… - save clicks (if the app knows they only want tab, or only want windows)
… - apps want to capture audio - only available on a subset of capture sources
… - tabs provide higher FPS
… - the app knows from context - e.g. allowing to favor slides over other content when doing a presentation
… - avoid risk with over sharing
… The proposal I'm making is to add a hint as part of the constraints, e.g. "ideal: browser"
… the user agent may choose how to apply that hint - from using it to prioritize, to ignoring it or adding warnings in case the UA determines it's not safe to apply the hint
… [showing the specific text proposal in #184]
… all other constraints are still processed after the user made their choice, only that one gets processed before
… it's only a hint, it cannot limit user's choice
… e.g. Chrome would show the list of tabs in preference when "browser" is hinted
Jan-Ivar: in the github discussion, we mentioned additional mitigations - e.g. not listing the requesting tab/window in the list of tabs
… would like to see some of these ideas reflected in the text
… min & exact constraints are disallowed in gDM, so it would have to be "ideal"
… I think it makes sense to use a hint to steer these selector UIs
… for clarification, "influence/limiting" requirements discussed earlier were about the app, not the user agent
Harald: re removing the calling tab, would it be only for this usage of the hint, or any use of gDM?
Jan-Ivar: I think they need to be considered before we add this
Elad: my recollection was we would encourage the UA to warn of risks of self-capture rather than removing the option altogether
… there are other ways of adding friction that don't require removing the option completely
… removing it completely might create risks of oversharing via sharing of the entire screen
Jan-Ivar: I think we can probably converge on mitigations for self-capture
… ideally, I would like normative language
Youenn: should we allow a hint for capturing the entire screen? that's the riskiest
… let's focus on hinting towards capturing less
… In general, I dislike constraints - can we add a dedicated parameter instead of reusing the constraints syntax?
… this may open further extensibility down the line (e.g. highlight tabs from a given origin?)
… can you share more about Chrome's plans in terms of mitigations against self-capture and its dangers?
Elad: we haven't prototyped the warning mechanism yet
… re constraints, I have no objection to using a parameter instead of constraints
… re removing "screen" - it's interesting, but if that is the default when no hint is given, this isn't really helping
Youenn: that default behavior is specific to Chrome
… Safari only allows screen, but we will have a picker at some point where screen won't be the default
… and I don't think apps should have a way to default to screen
Jan-Ivar: FF already doesn't default to screen, and +1 to Youenn on not allowing (or just ignoring) screen as a constraint
Elad: the user agent would already be free to ignore the hint
… for Chromium, getting visibility on dev's intent would be useful in migrating away from that default
Bernard: in terms of the requests from developers, is audio capture only available on screen?
Elad: no, it's available on tab, and screen on windows
Bernard: re high-FPS capture - is that typically tab?
Elad: in Chromium, yes
… but it's in general, a way for developers to steer toward what they know will work for their use cases
Bernard: is "screen"-level capturing key to any of these requests?
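A minimal sketch of the hint as discussed above, assuming it ends up expressed as an `ideal` value on a `displaySurface` constraint passed to `getDisplayMedia()`. Youenn suggested a dedicated parameter instead, so the final shape may differ. `buildDisplayMediaOptions` and the type names are hypothetical; the helper drops a "monitor" hint, following the suggestion that apps should not be able to steer the picker towards whole-screen capture.

```typescript
// Display surface kinds as named by the displaySurface setting.
type SurfaceKind = "monitor" | "window" | "browser";

// Shape of the options this sketch produces: either no hint at all,
// or a single "ideal" hint (min/exact are disallowed in getDisplayMedia).
interface HintedDisplayMediaOptions {
  video: true | { displaySurface: { ideal: SurfaceKind } };
}

// Hypothetical helper: turns an app preference into capture options.
function buildDisplayMediaOptions(preferred?: SurfaceKind): HintedDisplayMediaOptions {
  // No preference, or a whole-screen preference: send no hint and let
  // the user agent pick its own default ordering.
  if (preferred === undefined || preferred === "monitor") {
    return { video: true };
  }
  // Tab or window preference: pass it as a hint only. The user agent
  // may prioritize it, warn, or ignore it; the user can still choose
  // any surface.
  return { video: { displaySurface: { ideal: preferred } } };
}
```

An application that wants tabs listed first would pass the result of `buildDisplayMediaOptions("browser")` to `navigator.mediaDevices.getDisplayMedia()` and still check the chosen surface afterwards, since the hint cannot limit the user's choice.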
Elad: right; but note that "screen" could be used to capture from a different monitor
Jan-Ivar: but all monitors are dangerous
Elad: so I'm hearing support except for the screen-hint
TimP: I dislike heuristics-based pickers - it makes it a nightmare to test and makes everything unpredictable
Elad: the mention for heuristics was for apps to use, not the UA
Jan-Ivar: supporting, but with stronger language on warnings for self-capture

Echo Cancellation

[18]Echo cancellation: Need to specify the source of the echo cancellation reference signal #31

[18] https://github.com/w3c/mediacapture-extensions/issues/31

[19]Specify constraint echoCancellationReferenceSinkId #32

[19] https://github.com/w3c/mediacapture-extensions/pull/32

Harald: this is a request coming from our audio team
… echo cancellation is about removing the audio picked up by the microphone in the room to keep only the audio generated *in* the room
… it's in general complicated - a complicated part is knowing what to remove
… current implementation in Chrome just looks at what's coming in via the peerconnection
… this has proven insufficient and we want to revise this
… if we want to remove audio output, you can hit issues with specific headphones or setups
… from the application perspective, you want to identify what output has been used that is most relevant to echo cancellation and feed that to the algorithm
… to keep it simple, we have an enumeration of output devices via sinkIds
… the proposal is to re-use this sinkId in the constraint for echo cancellation
TimP: +1 to do something in this space
… will it help if you mix WebAudio in?
… i.e. when the audio output comes from WebAudio processing
Harald: yes, it should cover this (as long as the output makes it to the speaker)
Jan-Ivar: Mozilla doesn't believe this API is needed to do correct echo cancellation
… why does the UA need JS input on this?
The UA already knows which headset is being used
… it's not clear why getting input from the app is useful here
Harald: which audio output is currently used by the echo cancellation?
Jan-Ivar: I believe we have access to the rendered output (incl out of WebAudio)
… Paul Adenot is our key person on this
Harald: would like his opinion on the headset case
Youenn: +1 to Jan-Ivar - the UA should already have access to all the info it needs
… and it has more info than apps would have on this
Bernard: Harald, you said Chrome currently uses the sum of all audio outputs from peerconnection
… is the intent here to improve the Chromium implementation or to let them do better echo cancellation?
Harald: this is not for app-based echo cancellation
Bernard: I've heard requests from apps to have an adjustable echo cancellation - e.g. an echo cancellation transform stream
Harald: that is orthogonal to this proposal
… echo cancellation can't be modeled as a transform stream: it's a 2-input object
… it can be modeled as a process that takes 2 audio inputs
Youenn: you could still do 1 input / 1 output with an additional parameter
… in the transform stream creation with the reference stream
Harald: interesting thing to do, but not this proposal
TimP: there are situations where you don't want to cancel part of the stream being output - e.g. background music
… with the room acoustics
… maybe a rare use case, but one we've stumbled upon for immersiveness
Harald: you could turn echo cancellation off?
TimP: but that generates other issues
Sergio: I don't think this proposal would help solve the Chrome issue
… there are 3 different issues being discussed: echo cancellation in Chrome, new echo cancellation tuning use cases (that would need clarification/refinement), and exposing echo cancellation separately from WebRTC (maybe in Web Audio)
Harald: I'm hearing opposition to making an API of the specific proposal because the UA should be able to figure it out
… I find it interesting that only browser output should be cancelled - if you have another app than the browser producing audio, shouldn't it be removed too?
Jan-Ivar: RNNoise has been exploring some of this; but echoCancellation: true is likely focused on the meeting use case
Youenn: the OS can also provide user-configurable echo cancellation styles
Guido: the motivation for Chrome is to help figure out which of the output devices should be used as the reference signal for echo cancellation
… if there are several audio output devices with one being preferred by the app
Harald: I'd like to invite comments on the issue on whether this API is needed or not
… I haven't seen many comments on the shape of the API
… if we were to conclude there was such a need, this API may be OK
… but no consensus on the need for such an API

Wrapping up

Bernard: any CfC needed based on our discussions?
Jan-Ivar: re getViewportMedia, should we put this in a new doc or an existing one?
Dom: having a single document couples their process progress
Elad: also keeping them separate helps make clear how distinct they are
Youenn: it also helps in terms of separating the test cases in different folders
Harald: sounds like convergence towards a separate spec
Jan-Ivar: would still prefer a single doc

October meeting

Bernard: next meeting will be devoted to mediacapture-transform - proposed content and agenda was shared on the list

[20]Preview of October Virtual Interim slide deck

[20] https://lists.w3.org/Archives/Public/public-webrtc/2021Sep/0030.html

Bernard: there is overlap between mediacapture-transform and WHATWG streams issues
Youenn: I will try to mark more explicitly issues in MC-T that are linked to WHATWG streams
Bernard: part of what I thought might be useful to hear is where these upstream WHATWG stream issues are on the roadmap (if at all)
Jan-Ivar: the new proposal we want to present is streams-based, but with improvements over the existing one
… still needs some fixes in WHATWG streams
… I have linked demos in the slides for some of the issues we're trying to address
TimP: it would be good to start these presentations with use cases to scope our discussions
Jan-Ivar: the slides Youenn and I developed include goals of the proposals
Harald: Media Capture Transform starts with use cases
Bernard: Streams have been adopted to use streams to manage pipelines
Youenn: please send early feedback on the proposals

Summary of resolutions

1. getViewportMedia capture the full viewport when called from an iframe
2. use viewport-capture as naming basis for Document Policy of getViewportMedia

Minutes manually created (not a transcript), formatted by [23]scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).

[23] https://w3c.github.io/scribe2/scribedoc.html
Received on Monday, 20 September 2021 17:06:49 UTC