- From: Dominique Hazael-Massieux <dom@w3.org>
- Date: Fri, 26 Nov 2021 15:37:50 +0100
- To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Hi,
The minutes of our meeting on Nov 24 are available at:
https://www.w3.org/2021/11/24-webrtc-minutes.html
and copied as text below.
Thank you Florent for scribing!
Dom
WebRTC November 2021 Virtual interim
24 November 2021
[2]Agenda. [3]IRC log.
[2]
https://www.w3.org/2011/04/webrtc/wiki/November_24_2021#WebRTC_WG_Virtual_Interim
[3] https://www.w3.org/2021/11/24-webrtc-irc
Attendees
Present
Anssi, BenW, Bernard, Carine, Dom, Eero, Elad, Florent,
Guido, Harald, Jan-Ivar, PatrickRockhill, Riju,
TimPanton, TonyHerre, Tuukka, Youenn
Regrets
-
Chair
Bernard, Harald, Jan-Ivar
Scribe
Dom, Florent
Contents
1. [4]Media Capture Transform
2. [5]Region Capture
3. [6]WebRTC NV Use Cases
4. [7]Face Detection API
Meeting minutes
Slideset: [8]https://lists.w3.org/Archives/Public/www-archive/2021Nov/att-0005/WEBRTCWG-2021-11-24.pdf
[8]
https://lists.w3.org/Archives/Public/www-archive/2021Nov/att-0005/WEBRTCWG-2021-11-24.pdf
Media Capture Transform
[Jan-Ivar gives updates on discussion with WHATWG re Streams]
Harald: we have issues that need solving, and we're fairly
confident we'll be able to solve them.
[Harald presents how to evolve Jib's and his proposal]
Youenn: Is audio part of the presentation?
Harald: Audio is out, we have no consensus
Youenn: want stronger guarantees that we'll be able to solve
the issues
Dom: We need to be convinced we have a solution for those
problems.
TimPanton: What happens if you don't close it (a stream)?
Harald: They will hang around until garbage collection.
TimPanton: We need to be clear about the expectations to avoid
surprise browser slowdowns.
Jib: We could add notes in the document about life cycle
issues.
Youenn: As long as we do not have a good API, we cannot make
progress. I feel more positive now.
Jib: I feel confident, we have 2 solutions for this problem.
Bernard: Looking at samples, we have not done all the required
cleanup, especially related to error conditions.
Bernard: AI: First step to make the changes for the draft,
announce it on the list and make a call for adoption when it’s
ready.
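The lifecycle concern raised above (unclosed frames lingering until garbage collection, and cleanup on error paths) can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: SketchFrame and processAll are hypothetical names, not part of the Media Capture Transform proposal, and the frame class merely models an object that has to be closed explicitly.

```typescript
// Hypothetical stand-in for a media frame that holds a scarce resource.
// It is NOT the real VideoFrame interface; it only models close().
class SketchFrame {
  id: number;
  closed = false;
  constructor(id: number) {
    this.id = id;
  }
  close(): void {
    this.closed = true;
  }
}

// Applies `transform` to frames in order and stops transforming after the
// first failure, but still closes every frame instead of leaving the
// cleanup to garbage collection.
function processAll(
  frames: SketchFrame[],
  transform: (frame: SketchFrame) => void
): { processed: number; failed: boolean } {
  let processed = 0;
  let failed = false;
  for (const frame of frames) {
    try {
      if (!failed) {
        transform(frame);
        processed++;
      }
    } catch (e) {
      failed = true;
    } finally {
      // Runs on both the success path and the error path.
      frame.close();
    }
  }
  return { processed, failed };
}
```

The try/finally placement is the point of the sketch: the close() call does not depend on the transform succeeding.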
Region Capture
Elad presents the rationale for the Region Capture API and
shows code examples.
Youenn: You can use message channels to transfer, it shouldn't
be a problem. I think you can use decorations instead of crop
ids
Elad: we want it to work with getDisplayMedia (GDM),
getViewportMedia (GVM) and extensions. It should work with
multiple tabs from different domains. It seems this is a faster
way to ship.
Dom: What happens if the targeted element is no longer visible?
Elad: There's a spec draft with more details on my Github. It
suggests that the track is muted when the element is no longer
visible and unmuted when it’s again visible.
Youenn: Adding features like cropping to GDM tab capture can be
risky, we want to move away from it. If we add to GDM people
might stick to it while GVM is safer. The more we provide
benefits in GVM and the less there is in GDM, the safer the web
is.
Jib: We should avoid the word "id" or exposing an ID in an API
as it might invite a lot of scrutiny. Suggests adding an
interface instead. It helps with garbage collection.
Jib: Using applyConstraints() instead might be better; it could
help with the timing and with presenting an uncropped frame.
Elad: Constraints might be more difficult implementation wise
and have weaker guarantees on when they apply.
Jib: Constraints may be underspecified and we could improve
applyConstraints() as well.
TimPanton: I think we should do this quickly. I like the idea
of an opaque token better than transferring a stream. Not fond
of constraints and in favor of the API.
Elad: What is a problem with string IDs?
TimPanton: It's about avoiding a conversation regarding
potential risks. Advises making an interface and adding strings
later.
Elad: Open to that idea.
Dom: UUIDs can indicate returning users. It's better to avoid
the discussion with the Privacy Interest Group.
Youenn: Supports token but not constraints.
Harald: Interfaces can be serialisable, and if so, they can be
strings in disguise. UUIDs are fine if they are short lived,
but there is a privacy risk.
[Discussion about if we want a call for adoption for the
document or if a call for review is possible.]
Jib: Suggests we can have the CFA after the document is updated
to use an opaque token.
Dom: Will Elad offer to be an editor?
Elad: Yes
Dom: AI: Need to confirm this with the chairs. Updating the
document with the token and then CFA.
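The two behaviours discussed above (an opaque token in place of a string ID, and muting the track while the target element is not visible) can be sketched as a toy model. Everything here is an assumption made for illustration: CropToken, SketchTrack, cropTo and onTargetVisibility are hypothetical names, not the API of the Region Capture draft.

```typescript
// Hypothetical opaque handle for a crop target. The API surface exposes no
// string ID. Note that TypeScript's `private` only hides the field at
// compile time; a real implementation would keep it out of script's reach.
class CropToken {
  private target: string;
  constructor(target: string) {
    this.target = target;
  }
  // Models the browser-internal lookup of what the token refers to.
  static resolve(token: CropToken): string {
    return token.target;
  }
}

// Toy model of a captured track that can be cropped to a token's target.
class SketchTrack {
  muted = false;
  private token: CropToken | null = null;

  cropTo(token: CropToken): void {
    this.token = token;
  }

  // Called by the (modelled) browser when the target's visibility changes:
  // a cropped track is muted while the target is hidden and unmuted when
  // it becomes visible again. An uncropped track is left unchanged.
  onTargetVisibility(visible: boolean): void {
    if (this.token !== null) {
      this.muted = !visible;
    }
  }
}
```

In this model the page only ever passes the token object around, which is the property the group preferred over exposing a string.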
WebRTC NV Use Cases
[ [9]Slide 30 ]
[9]
https://lists.w3.org/Archives/Public/www-archive/2021Nov/att-0005/WEBRTCWG-2021-11-24.pdf#page=30
Bernard: lots of systemic changes since our FPWD of NV use
cases, in particular due to the pandemic and technological
advances
[ [10]Slide 31 ]
[10]
https://lists.w3.org/Archives/Public/www-archive/2021Nov/att-0005/WEBRTCWG-2021-11-24.pdf#page=31
Bernard: NV use cases include 3 use cases from the original use
cases that could be improved
… not much API support for these - essentially webrtc-ice and
webrtc-svc
… lots of references to ICE
… the first 2 related use cases aren't necessarily the most
requested
… not clear that ICE is such a key enabler either
… the video conferencing use case might be approached
differently, with webcodecs rather than by extending WebRTC
[ [11]Slide 32 ]
[11]
https://lists.w3.org/Archives/Public/www-archive/2021Nov/att-0005/WEBRTCWG-2021-11-24.pdf#page=32
Bernard: in terms of new use cases
… only API that applies is rtcdatachannel in workers
… no progress on other, despite the use cases themselves having
found broad adoption
[ [12]Slide 33 ]
[12]
https://lists.w3.org/Archives/Public/www-archive/2021Nov/att-0005/WEBRTCWG-2021-11-24.pdf#page=33
Bernard: for use cases 3.6-8, quite a bit of activity in the WG
- mediacapture transform, machine learning
… not so much for 3.9
… still discussions around intersection with ML
… no specifics on face/body tracking API
… nothing on data exchange in service workers
[ [13]Slide 34 ]
[13]
https://lists.w3.org/Archives/Public/www-archive/2021Nov/att-0005/WEBRTCWG-2021-11-24.pdf#page=34
Bernard: does the doc reflect current industry priorities?
current state of tech (incl webcodecs)?
… none of the use cases have all their requirements met by API
proposals
… only 4 have at least one proposal
… substantial gaps in requirements around data transport
… The document doesn't talk about the long term architecture
… current doc seems to build on the view of "extending" webrtc,
but this may need to evolve based e.g. on the webcodecs view of
the world
TimP: in terms of environment change, a lot more is happening
on mobile than used to be the case and than we would have
expected
Bernard: true; incl for game streaming
TimP: also for e.g. small social / family gathering
… also a question worth addressing is the P2P architecture
… in reality, most of the WebRTC usage isn't P2P
… WebTransport is not designed for P2P
… maybe WebRTC is P2P and WT is the centralized architecture?
Youenn: some of the new technologies like WebTransport are
providing more flexibility
… where WebRTC is more of an integrated system
… the need for metadata synchronization (e.g. in metaverse)
seems very relevant, needs more detailed anchoring in our APIs
… RTP headers extension might be exposed for non-browser
handling
… also +1 to TimP's point about mobile browsers - getting more
interop across iOS / android would be good
… e.g. handling of muting in case of priority audio capture in
mobile (e.g. in mobile phone)
… in general, providing more consistency across os/browsers
would be good
harald: there is a lot of stuff that uses webrtc outside the
browser - e.g. recently in a ring doorbell
… being able to contact these (pseudo-)webrtc endpoints from
browsers is important
… our use case driven approach hasn't worked too well to track
what's going on
… in terms of long term architecture, it's hard to manage -
trade-off between consistency and fitness
anssi: ML WG chair here - we're very interested in making sure
our WebNN API helps with the webrtc use cases
<anssik> [14]https://github.com/webmachinelearning/webnn/issues/226
[14] https://github.com/webmachinelearning/webnn/issues/226
[15]Integration with real-time video processing #226
[15] https://github.com/webmachinelearning/webnn/issues/226
anssi: we've started developing a prototype based on background
blurring
Jan-Ivar: apart from funny hats, most of the use cases focus on
PeerConnection
… we've seen lots of use cases around media capture (e.g.
screen sharing)
… Mozilla takes use cases pretty seriously - some use cases are
marked as not having consensus, would prefer we call for
consensus or remove them
Face Detection API
[ [16]Slide 36 ]
[16]
https://lists.w3.org/Archives/Public/www-archive/2021Nov/att-0005/WEBRTCWG-2021-11-24.pdf#page=36
Riju: I hope to present proposals that help address 4 of the
use cases that were presented
… today focusing on face detection, following the related
breakout at TPAC last month
[17]WebRTC Intelligent Collaboration TPAC 2021 breakout
[17] https://www.w3.org/2021/10/20-webrtc-ic-minutes.html
Riju: we have an updated proposal for what an API might look
like
[18]Face detection proposal
[18]
https://eehakkin.github.io/intel-w3c-mediacapture-extensions/#face-detection
[ [19]Slide 37 ]
[19]
https://lists.w3.org/Archives/Public/www-archive/2021Nov/att-0005/WEBRTCWG-2021-11-24.pdf#page=37
Riju: developers could request a specific number of points for
the contour
… a face mesh is unlikely to be available in the short term,
but documented it in the API for sake of completeness
… the proposal includes a set of expressions that can be
obtained from drivers without DNN
… again, we can decide to remove items
Harald: contours is an improvement to square or rectangles -
particularly needed for e.g. background blur
… I worry about relying on what's available from drivers
… instead, I would like us to approach this based on what's
available in the frames of an MST, no matter how it was added
… likewise, I would like to be able to add that data to frames
when I'm a producer
… the API should allow consumption, production and even
refinement of annotations attached to a track
… e.g. a transform could improve the rough contour identified
by the driver to bring an improved annotation downstream
… the shape of the API has the right amount of metadata, but it
shouldn't be described as a one-way consumption API
Riju: I'll try to show a proposal for background blur early
January
… what we're trying to provide here is processing
free-of-computation, because it already happens in the driver
TimP: +1 to Harald in terms of enabling successive refinements,
on top of the compute-free results
… I'm also nervous about the expression enums
… the others are reasonably factual, while expressions are more
subjective
riju: is the concern about restricting the list? blink and
smile are available across platforms
TimP: my concern is that the detection could be wrong, in
particular for some subgroups
… given the level of subjectiveness
Riju: note taken
Jan-Ivar: what was the previous agreement on this topic? what
is the question you're bringing to the WG?
Riju: last time I presented a proposal on top of image capture,
it was suggested to bring to mediacapture-extensions
… also there was a request to make it more generic
… the goal would be to bring it to the mediacapture extension
specs
jan-ivar: there would need to be a process on whether or not to
adopt this API
… mediacapture-extensions sounds like a good place for a future
proposal
… I'm not entirely sure how to deal with this for now
Bernard: this is interesting; I have concerns with emotion
analysis in terms of accuracy
… in terms of how this would be used - it's a method on
MediaStreamTrack, but it would have to be designed to work with
Media Capture Transform
… I would see it as a TransformStream to be used e.g. for
background blur
… the information would be used to execute the blur faster
… the provided information (e.g. the contour) is meant to help
processing something in the GPU buffer
… when the information itself might be CPU-side
riju: the performance on ChromeOS/Windows based on CPU doesn't
depend on GPU memory
… I think it's giving good results
bernard: in terms of API shape, having this on mediastreamtrack
feels wrong - I want it to work on a videoframe
Youenn: similar feedback to Bernard - exposing driver data is a
decision at the mediastreamtrack level, but the data should not
be on MST - it should be synchronized with videoframes
… either by getting it from the frame, or getting it at the
same time as a video frame
… that's the kind of model that would make sense to me
… Given that the idea is to expose driver info, it's good to be
as specific as possible
… we should separate driver-specific metadata from more
general approaches
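The model suggested in the discussion above (annotations that travel with each video frame and that a transform stage can consume, produce or refine) can be sketched as follows. This is a hedged illustration only: AnnotatedFrame, faceContour and refineContour are hypothetical names, not part of the face detection proposal or of WebCodecs, and midpoint subdivision is just a placeholder for whatever refinement a real transform would perform.

```typescript
interface Point {
  x: number;
  y: number;
}

// Hypothetical frame shape: the annotation is carried with the frame it
// describes, so it stays synchronized with that frame's timestamp.
interface AnnotatedFrame {
  timestamp: number;
  faceContour: Point[];
}

// Placeholder refinement stage: inserts the midpoint of every edge of the
// closed contour, returning a new frame with a denser contour and the same
// timestamp. The input frame is not modified.
function refineContour(frame: AnnotatedFrame): AnnotatedFrame {
  const pts = frame.faceContour;
  const refined: Point[] = [];
  for (let i = 0; i < pts.length; i++) {
    const a = pts[i];
    const b = pts[(i + 1) % pts.length];
    refined.push(a, { x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 });
  }
  return { timestamp: frame.timestamp, faceContour: refined };
}
```

A downstream consumer sees the same frame shape whether the contour came from a driver or from an earlier transform, which is the property the feedback asked for.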
Bernard: next steps?
Riju: could show some demos with performance numbers
dom: heard consistent feedback to anchor this in VideoFrame
… defined in WebCodecs
Youenn: let's have the discussion in mediacapture-extensions
and identify an architecture there
[20]Face Detection. #289
[20] https://github.com/w3c/mediacapture-image/issues/289
[now at [21]https://github.com/w3c/mediacapture-extensions/issues/44 ]
[21] https://github.com/w3c/mediacapture-extensions/issues/44
Received on Friday, 26 November 2021 14:37:54 UTC