- From: Dominique Hazael-Massieux <dom@w3.org>
- Date: Mon, 20 Sep 2021 19:06:44 +0200
- To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Hi,
The minutes of our meeting held today (September 20, 2021) are available at:
https://www.w3.org/2021/09/20-webrtc-minutes.html
and copied as text below.
Dom
WebRTC September 2021 virtual interim
20 September 2021
[2]Agenda. [3]IRC log.
[2] https://www.w3.org/2011/04/webrtc/wiki/September_20_2021#WebRTC_WG_Virtual_Interim
[3] https://www.w3.org/2021/09/20-webrtc-irc
Attendees
Present
ArneSchramm, BenWagner, BernardA, BrianBaldino, Carine,
Dom, EladAlon, GuidoUrdaneta, Harald, Jan-Ivar,
SergioMurillo, SongXu, ThomasGuilbert, TimPanton,
TonyHerre, YouennFablet
Regrets
-
Chair
Bernard, Harald, Jan-Ivar
Scribe
dom
Contents
1. [4]Next meetings
2. [5]Status of recent CfCs
3. [6]WHATWG Streams
4. [7]Agenda review
5. [8]Conditional Focus
6. [9]getViewportMedia
7. [10]Display surface constraint
8. [11]Echo Cancellation
9. [12]Wrapping up
10. [13]October meeting
11. [14]Summary of resolutions
Meeting minutes
[15]Slides
[15] https://www.w3.org/2011/04/webrtc/wiki/images/8/86/WEBRTCWG-2021-09-20.pdf
Next meetings
Bernard: October VI to be scheduled 1st week of October -
Doodle poll open till next week
… then TPAC meetings (joint & solos)
Status of recent CfCs
Bernard: Republishing media capture and streams as CR -
completed positively on Sep 17
… Jan-Ivar will summarize the chairs' decision on it
… Another CfC on Transferable MediaStreamTracks running until
Sep 27
… our next meeting in October will build on this
WHATWG Streams
Bernard: we have potential dependencies to WHATWG streams
… a number of discussions in their repo relate to issues we've
discussed in terms of our media processing pipelines
Agenda review
Bernard: main topics: Conditional focus, getViewportMedia,
Display surface constraints, echo cancellation
Conditional Focus
Elad: depending on use cases, switching the focus from the
browser to the captured window makes more or less sense
… focus control is an important part of the user experience,
given that making a presentation can be stressful
… e.g. if you're capturing a window where you're writing text,
focus needs to be there
… but there are situations where the browser can be used
directly to control the captured window
… the challenge is that the browser cannot determine one
situation from another
… when the capturing application has a lot more situational
awareness
… not necessarily complete knowledge, but at least some
… I'm proposing an API that associates stream capture with the
ability to give a specific limited focus switch opportunity
… to the capturing application
… because this is done right after the capture is starting
(although before a frame is being captured), the capturing
application has all the context it can get to make its decision
… the idea is to give that focus-switching opportunity in a
microtask in a promise resolution of the capture request
… the proposal includes a number of mitigations (e.g. a 1s
timeout) to avoid risks of focus-switching attacks
… the particular API I'm proposing is exposed via a method on a
subclass of MediaStreamTrack - that way it's only available when
obtained through a captured tab or window
… we could look at a more finegrained inheritance tree if there
is interest
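The limited decision window Elad describes can be sketched roughly as below. This is a minimal illustration under stated assumptions: `FocusWindow`, `decide` and `FocusChoice` are hypothetical names that do not come from the proposal, the clock is passed in so the 1s timeout is simulated, and the microtask timing itself is not modeled.

```typescript
// Hypothetical model of a one-shot, time-limited focus decision.
// FocusWindow and decide() are illustrative names, not part of the proposal.
type FocusChoice = "focus-captured-surface" | "keep-focus-on-capturer";

class FocusWindow {
  lastChoice: FocusChoice | null = null;
  private decided = false;
  private openedAtMs: number;
  private timeoutMs: number;

  // timeoutMs defaults to the 1s mitigation mentioned in the minutes.
  constructor(openedAtMs: number, timeoutMs: number = 1000) {
    this.openedAtMs = openedAtMs;
    this.timeoutMs = timeoutMs;
  }

  // Returns true if the decision is accepted, false if it is ignored.
  decide(choice: FocusChoice, nowMs: number): boolean {
    const expired = nowMs - this.openedAtMs > this.timeoutMs;
    if (this.decided || expired) {
      return false; // at most one decision, and only inside the window
    }
    this.decided = true;
    this.lastChoice = choice;
    return true;
  }
}
```

A second call, or a call made after the window has closed, is ignored, which is the property the mitigations are meant to guarantee.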
Jan-Ivar: this is a reasonable problem to solve; I have some
concerns with the API surface
… since focus switching is global to the user, it doesn't need
to be on a mediastreamtrack subclass
… it could live e.g. on navigator.mediaDevices
… I think a microtask is too narrow - we should queue a task
instead, this would give the same presentation
… Without having received a frame, how can app determine
whether to switch or not?
Elad: getSettings() on the captured stream can tell you the
kind of display surface
… checking the content of a frame is likely challenging to get
right in any case
… looking just at the metadata is easier
… re global vs mediastreamtrack, it was partly to protect
against attacks based on cloning - but happy to look more into
alternatives
… task vs microtask - can you say more about your concerns
about shim-ability?
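Elad's point about deciding from metadata rather than frame content might look like the sketch below. The policy, the function name `shouldFocusCapturedSurface` and the `appControlsCapturedTab` flag are assumptions made for illustration; only the use of the `displaySurface` setting comes from the discussion.

```typescript
// Subset of MediaTrackSettings relevant here. The displaySurface values
// are the ones defined for screen capture tracks.
interface CaptureSettings {
  displaySurface?: "monitor" | "window" | "browser";
}

// Hypothetical app policy, for illustration only:
// - captured tab that the app can drive remotely: keep focus on the app
// - captured tab or window the user must interact with: focus it
// - whole monitor or unknown surface: leave focus alone
function shouldFocusCapturedSurface(
  settings: CaptureSettings,
  appControlsCapturedTab: boolean
): boolean {
  switch (settings.displaySurface) {
    case "browser":
      return !appControlsCapturedTab;
    case "window":
      return true;
    default:
      return false;
  }
}
```

In a page, the settings object would presumably come from calling getSettings() on the video track returned by getDisplayMedia().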
Jan-Ivar: it's a general principle, and I'm not sure of the
advantages of a microtask in the first place
Elad: part of it was a concern of backwards compatibility and
performance
Jan-Ivar: I think task & microtask can both address these
aspects
… in any case, my main concern is where the API lives at the
moment
Youenn: cloning of tracks is known; when you subtype tracks, it
starts to be messy
… what type would be assigned to a cloned track?
… we should avoid subtypes if possible
… mitigations of 1s and against busy-looping sound good
… I need to think more about the 1s delay
Harald: re cloning and MST subtracks - we have one case like
that, and I think we should change it
… we have 2 options: subclassing or making the method return
an error
… I don't think JS devs care one way or another
… subclassing feels a bit tidier
Elad: the goal was to reflect our design in the class hierarchy
indeed
Youenn: to get there, I think we should first list the use
cases where subtypes actually help - just one method feels not
enough to consider changing clone()
Elad: 3 methods would fit: captureHandler, @@@ only apply to
captured media
Jan-Ivar: I'm opposed to subclassing - I think that API should
live in a global space e.g. navigator.mediaDevices.focus
Harald: where will that be written up? I would like to see the
argument in more detail
Elad: I'm hearing interest in the API
Jan-Ivar: interested in solving the problem with a slightly
different shape
Youenn: +1 on a different shape, and discussion on the 1s
delay; but sounds like a good space to work on
[clarification on the 1s requirement makes Youenn happy]
getViewportMedia
[16]getViewportMedia(): Let pages opt-in to capture #155
[16] https://github.com/w3c/mediacapture-screen-share/issues/155
Elad: getViewportMedia is an API allowing to capture the
current viewport (what is visible in the tab launching the API
call)
… equivalent of calling getDisplayMedia and selecting the
current tab
… there is danger associated with self-capture
… to protect against this, we're requiring
crossOriginIsolation, opt-in via a header (most likely document
policy, but to-be-confirmed)
… and only available to top-level docs or privileged iframes
… Jan-Ivar and I have been discussing a lot and have converged
on a number of proposals as summarized in the slide
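The gating conditions listed above can be summarized as a simple precondition check. This is only a sketch: `CallerContext`, its field names and `viewportCaptureBlockers` are hypothetical, the opt-in mechanism was still to be confirmed at the time of the meeting, and user activation (discussed later, without a decision) is not modeled.

```typescript
// Hypothetical description of the calling document. Field names are
// illustrative; only the requirements themselves come from the minutes.
interface CallerContext {
  crossOriginIsolated: boolean; // e.g. the value of the `crossOriginIsolated` global
  optedInViaHeader: boolean;    // opt-in header present (mechanism still to be confirmed)
  isTopLevel: boolean;
  isPrivilegedIframe: boolean;
}

// Returns the unmet requirements; an empty array means the call may proceed.
function viewportCaptureBlockers(ctx: CallerContext): string[] {
  const blockers: string[] = [];
  if (!ctx.crossOriginIsolated) blockers.push("not cross-origin isolated");
  if (!ctx.optedInViaHeader) blockers.push("no opt-in header");
  if (!ctx.isTopLevel && !ctx.isPrivilegedIframe) blockers.push("unprivileged iframe");
  return blockers;
}
```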
Jan-Ivar: we're proposing that getViewportMedia would capture
the entire viewport when called from an iframe
… and we're proposing using Document Policy with names built on
"viewport-capture"
… the first proposal is basically deferring the approach to
cropping to later
Resolution: getViewportMedia capture the full viewport when
called from an iframe
Harald: re "viewport-capture", is it aligned with the naming
convention of Document Policy?
Tim: just noting the two decisions (iframe capturing the full
viewport, and naming) are linked
Resolution: use viewport-capture as naming basis for Document
Policy of getViewportMedia
Harald: these will be confirmed on the mailing list
Elad: I also intend to suggest a cropping API that might
complement getViewportMedia in the upcoming months
Jan-Ivar: getViewportMedia should require user activation
Dom: +1
Elad: I can imagine certain cases where user activation makes
sense, but others where less so
… e.g. if you open a new tab
Youenn: this feels like a general problem for user activation
that is worth discussing in general
… but given that this is a privileged API, user activation feels
like a must
Dom: +1 on solving it generically for user activation unless we
can demonstrate something specific to capturing
Youenn: note that changing user activation rules is really
hard, so we need to get our answer right before shipping
jan-ivar: removing user activation shouldn't be as hard as adding
it afterwards
Elad: I would want more time to make a decision on that
particular bit
Display surface constraint
[17]Revisit: Let getDisplayMedia() influence the default type
choice in the picker #184
[17] https://github.com/w3c/mediacapture-screen-share/issues/184
Elad: getDisplayMedia doesn't let the app influence the user's choice
… user's choice is already being influenced though, by virtue
of having a 1st item in the list of choices
… Chrome has Screen-first
… Safari has only one choice (so a major influence)
… FF is evolving
… Influence could be wielded positively - towards the safer
choice, or the more relevant one
… a lot of Web developers have expressed interest in
influencing or limiting the user's choice:
… - save clicks (if the app knows they only want tab, or only
want windows)
… - apps want to capture audio - only available on a subset of
capture sources
… - tabs provide higher FPS
… - the app knows from context - e.g. allowing to favor slides
over other content when doing a presentation
… - avoid the risk of oversharing
… The proposal I'm making is to add a hint as part of the
constraints, e.g. "ideal: browser"
… the user agent may choose how to apply that hint - from using
it to prioritize, to ignoring it or adding warnings in case the
UA determines it's not safe to apply the hint
… [showing the specific text proposal in #184]
… all other constraints are still processed after the user made
their choice, only that one gets processed before
… it's only a hint, it cannot limit user's choice
… e.g. Chrome would show the list of tabs in preference when
"browser" is hinted
Jan-Ivar: in the github discussion, we mentioned additional
mitigations - e.g. not listing the requesting tab/window in the
list of tabs
… would like to see some of these ideas reflected in the text
… min & exact constraints are disallowed in gDM, so it would
have to be "ideal"
… I think it makes sense to use a hint to steer these selectors
UI
… for clarification, "influence/limiting" requirements
discussed earlier were about the app, not the user agent
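As a rough illustration of the hint under discussion, the sketch below shows an "ideal" value on a display surface constraint and one way a user agent could use it to reorder its picker without removing any choice. The property name `displaySurface`, the `Surface` type and the `orderPicker` function are assumptions made for this example; a user agent remains free to ignore the hint.

```typescript
type Surface = "browser" | "window" | "monitor";

// Shape of the hint as discussed: an "ideal" value on a display surface
// constraint. The exact property name here is an assumption.
const constraints = { video: { displaySurface: { ideal: "browser" as Surface } } };

// Hypothetical UA-side ordering: move the hinted surface first and keep all
// other choices available. A UA may also ignore the hint entirely.
function orderPicker(defaultOrder: Surface[], hint?: Surface): Surface[] {
  if (hint === undefined || defaultOrder.indexOf(hint) === -1) {
    return defaultOrder.slice();
  }
  const rest = defaultOrder.filter((s) => s !== hint);
  return [hint].concat(rest);
}
```

With a screen-first default order, a "browser" hint would move tabs to the front while windows and screens stay selectable.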
Harald: re removing the calling tab, would it be only for this
usage of the hint, or any use of gDM?
Jan-Ivar: I think they need to be considered before we add this
Elad: my recollection was we would encourage the UA to warn of
risks of self-capture rather than removing the option
altogether
… there are other ways of adding friction that don't require
removing the option completely
… removing it completely might create risks of oversharing via
sharing of the entire screen
Jan-Ivar: I think we can probably converge on mitigations for
self-capture
… ideally, I would like normative language
Youenn: should we allow a hint for capturing the entire screen?
that's the riskiest
… let's focus on hinting towards capturing less
… In general, I dislike constraints - can we add a dedicated
parameter instead of reusing the constraints syntax?
… this may open further extensibility down the line (e.g.
highlight tabs from a given origin?)
… can you share more about Chrome's plans in terms of
mitigations against self-capture and its dangers?
Elad: we haven't prototyped the warning mechanism yet
… re constraints, I have no objection to using a parameter
instead of constraints
… re removing "screen" - it's interesting, but if that is the
default when no hint is given, this isn't really helping
Youenn: that default behavior is specific to Chrome
… Safari only allows screen, but we will have a picker at some
point where screen won't be the default
… and I don't think apps should have a way to default to screen
Jan-Ivar: FF already doesn't default to screen, and +1 to
youenn of not allowing (or just ignoring) screen as a
constraint
Elad: the user agent would already be free to ignore the hint
… for Chromium, getting visibility on dev's intent would be
useful in migrating away from that default
Bernard: in terms of the requests from developers, is audio
capture only available on screen?
Elad: no, it's available on tab, and screen on Windows
Bernard: re high-FPS capture - is that typically tab?
Elad: in Chromium, yes
… but it's in general, a way for developers to steer toward
what they know will work for their use cases
Bernard: is "screen"-level capturing key to any of these
requests?
Elad: right; but note that "screen" could be used to capture
from a different monitor
Jan-Ivar: but all monitors are dangerous
Elad: so I'm hearing support except for the screen-hint
TimP: I dislike heuristics-based picker - it makes it a
nightmare to test and makes everything unpredictable
Elad: the mention for heuristics was for apps to use, not the
UA
Jan-Ivar: supporting, but with stronger language on warnings
for self-capture
Echo Cancellation
[18]Echo cancellation: Need to specify the source of the echo
cancellation reference signal #31
[18] https://github.com/w3c/mediacapture-extensions/issues/31
[19]Specify constraint echoCancellationReferenceSinkId #32
[19] https://github.com/w3c/mediacapture-extensions/pull/32
Harald: this is a request coming from our audio team
… echo cancellation is about removing the audio picked up by
the microphone in the room to keep only the audio generated
*in* the room
… it's in general complicated - a complicated part is knowing
what to remove
… current implementation in Chrome just looks at what's coming
in via the peerconnection
… this has proven insufficient and we want to revise this
… if we want to remove audio output, you can hit issues with
specific headphones or setups
… from the application perspective, you want to identify what
output has been used that is most relevant to echo cancellation
and feed that to the algorithm
… to keep it simple, we have an enumeration of output devices
via sinkIds
… the proposal is to re-use this sinkid in the constraint for
echo cancellation
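A sketch of how an application might build such a constraint is given below. The constraint name `echoCancellationReferenceSinkId` is taken from the title of PR #32 and is not a shipped API; `DeviceEntry`, `EchoAudioConstraints`, `buildEchoConstraints` and the label-based selection are hypothetical and only stand in for whatever preference the app has.

```typescript
// Minimal stand-in for the entries returned by enumerateDevices().
interface DeviceEntry {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
}

interface EchoAudioConstraints {
  echoCancellation: boolean;
  // Constraint name taken from the title of PR #32; not a shipped API.
  echoCancellationReferenceSinkId?: string;
}

// Picks the audio output with the given label (a hypothetical app preference)
// and names it as the echo cancellation reference.
function buildEchoConstraints(
  devices: DeviceEntry[],
  preferredLabel: string
): EchoAudioConstraints {
  const result: EchoAudioConstraints = { echoCancellation: true };
  for (const d of devices) {
    if (d.kind === "audiooutput" && d.label === preferredLabel) {
      result.echoCancellationReferenceSinkId = d.deviceId;
      break;
    }
  }
  return result;
}
```

If no matching output is found, the constraint is simply omitted and the user agent keeps choosing the reference signal on its own.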
TimP: +1 to do something in this space
… will it help if you mix WebAudio in?
… i.e. when the audio output comes from WebAudio processing
Harald: yes, it should cover this (as long as the output makes
it to the speaker)
Jan-Ivar: Mozilla doesn't believe this API is needed to do
correct echo cancellation
… why does the UA need JS input on this? The UA already knows
which headset is being used
… it's not clear why getting input from the app is useful here
Harald: which audio output is currently used by the echo
cancellation?
Jan-Ivar: I believe we have access to the rendered output (incl
out of WebAudio)
… Paul Adenot is our key person on this
Harald: would like his opinion on the headset case
Youenn: +1 to Jan-ivar - the UA should already have access to
all the info it needs
… and it has more info than apps would have on this
bernard: Harald, you said chrome currently uses sum of all
audio outputs from peerconnection
… is the intent here to improve the chromium implementation or
to let them do better echo cancellation?
harald: this is not for app-based echo cancellation
bernard: I've heard requests from apps to have an adjustable
echo cancellation - e.g. an echo cancellation transform stream
Harald: that is orthogonal to this proposal
… echo cancellation can't be modeled as a transform stream:
it has 2 input objects
… it can be modeled as a process that takes 2 audio inputs
youenn: you could still do 1 input / 1 output with an
additional parameter
… in the transform stream creation with the reference stream
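The two shapes being contrasted here, a process with two inputs versus a one-input transform configured with the reference, can be sketched as below. This is a deliberately naive illustration: subtracting a scaled reference is not how real echo cancellers work (they rely on adaptive filtering and delay estimation), and `cancelEcho`, `makeEchoCanceller` and the `gain` parameter are hypothetical names.

```typescript
// Two-input shape: microphone samples plus the reference (played-out) samples.
// Naive illustration only: subtract a scaled copy of the reference signal.
function cancelEcho(mic: number[], reference: number[], gain: number): number[] {
  return mic.map((sample, i) => {
    const ref = reference[i] ?? 0; // missing reference samples count as silence
    return sample - gain * ref;
  });
}

// One-input / one-output shape: the reference is supplied once as a
// parameter at creation time, and the result only takes microphone samples.
function makeEchoCanceller(
  reference: number[],
  gain: number
): (mic: number[]) => number[] {
  return (mic) => cancelEcho(mic, reference, gain);
}
```

Both forms compute the same thing; the difference is only whether the reference appears as a second input or as configuration of a single-input transform.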
Harald: interesting thing to do, but not this proposal
TimP: there are situations where you don't want to cancel part
of the stream being output - e.g. background music
… with the room acoustics
… maybe a rare use case, but one we've stumbled upon for
immersiveness
harald: you could turn echo cancellation off?
timP: but that generates other issues
Sergio: I don't think this proposal would help solve the Chrome
issue
… there are 3 different issues being discussed: echo
cancellation in Chrome, new echo cancellation tuning use cases
(that would need clarification/refinement), and exposing echo
cancellation separately from WebRTC (maybe in Web Audio)
Harald: I'm hearing opposition to making an API of the specific
proposal because the UA should be able to figure it out
… I find it interesting that only browser output should be
cancelled - if you have another app than the browser producing
audio, shouldn't it be removed too?
Jan-Ivar: RNNoise has been exploring some of this; but
echoCancellation: true is likely focused on the meeting use
case
Youenn: the OS can also provide user-configurable echo
cancellation styles
Guido: the motivation for Chrome is to help figure out which of the
output devices should be used as the reference signal for echo
cancellation
… if there are several audio output devices with one being
preferred by the app
Harald: I'd like to invite comments on the issue on whether
this API is needed or not
… I haven't seen many comments on the shape of the API
… if we were to conclude there was such a need, this API may be
OK
… but no consensus on the need for such an API
Wrapping up
Bernard: any CfC needed based on our discussions?
Jan-Ivar: re getViewportMedia, should we put this in a new doc
or an existing one?
Dom: having a single document couples their process progress
elad: also keeping them separate helps make clear how
distinct they are
youenn: it also helps in terms of separating the test cases in
different folders
harald: sounds like convergence towards a separate spec
jan-ivar: would still prefer a single doc
October meeting
Bernard: next meeting will be devoted to mediacapture-transform
- proposed content and agenda was shared on the list
[20]Preview of October Virtual Interim slide deck
[20] https://lists.w3.org/Archives/Public/public-webrtc/2021Sep/0030.html
Bernard: there is overlap between mediacapture-transform and
WHATWG streams issues
Youenn: I will try to mark more explicitly issues in MC-T that
are linked to WHATWG streams
Bernard: part of what I thought might be useful to hear is
where these upstream WHATWG stream issues are on the roadmap
(if at all)
Jan-Ivar: the new proposal we want to present is streams-based,
but with improvements over the existing one
… still needs some fixes in WHATWG streams
… I have linked demos in the slides for some of the issues
we're trying to address
TimP: it would be good to start these presentations with use
cases to scope our discussions
Jan-Ivar: the slides Youenn and I developed include goals of
the proposals
Harald: Media Capture Transform starts with use cases
Bernard: Streams have been adopted to use streams to manage
pipelines
Youenn: please send early feedback on the proposals
Summary of resolutions
1. [21]getViewportMedia capture the full viewport when called
from an iframe
2. [22]use viewport-capture as naming basis for Document
Policy of getViewportMedia
Minutes manually created (not a transcript), formatted by
[23]scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).
[23] https://w3c.github.io/scribe2/scribedoc.html
Received on Monday, 20 September 2021 17:06:49 UTC