[minutes] March 30 teleconf from Dominique Hazael-Massieux on 2020-03-30 (public-webrtc@w3.org from March 2020)

From: Dominique Hazael-Massieux <dom@w3.org>
Date: Mon, 30 Mar 2020 18:50:55 +0200
To: "public-webrtc@w3.org" <public-webrtc@w3.org>
Message-ID: <a70e0000-7ef1-8b12-cb64-2fee9863cd70@w3.org>
Hi,

The minutes of our call earlier today are available at:
  https://www.w3.org/2020/03/30-webrtc-minutes.html

and copied as text below.

Dom


                         WebRTC Virtual Interim

30 March 2020

   [2]IRC log.

      [2] https://www.w3.org/2020/03/30-webrtc-irc

Attendees

   Present
          Bernard, DomHM, Florent, Harald, Henrik, Jan-Ivar,
          Jianjun, SamDallstream, TimPanton, Youenn

   Regrets
          -

   Chair
          Bernard, Harald, Jan-Ivar

   Scribe
          dom

Contents

    1. [3]WebRTC
         1. [4]WebRTC Features at risk
    2. [5]ISSUE-2495 When is negotiation complete?
    3. [6]ISSUE 2502 When are effects of in-parallel stuff
       surfaced?
    4. [7]Media Capture and Streams
         1. [8]Issue 671 new audio acquisition
         2. [9]ISSUE 639 Enforcing user gesture for gUM
    5. [10]Next meeting
    6. [11]Summary of resolutions

Meeting minutes

  WebRTC

    WebRTC Features at risk

   Bernard: a few unimplemented features not yet marked at risk
   … 3 issues filed related to that
   … first one is Issue 2496 - the voiceActivityFlag exposed in
   SSRC, not implemented anywhere
   … any disagreement to marking it at risk?

   Henrik: SGTM

   Bernard: we have one unimplemented MTI per issue 2497,
   partialFramesLost
   … should we remove it from the MTI list?

   Jan-Ivar: no objection to unmark that one; will we get
   implementations for the other ones?

   Henrik: they need to be moved from one dictionary to the other
   - they've been implemented, they just need to be moved into a
   different object

   JIB: it's not clear to us yet how easy it will be to implement
   in Firefox; pointers to upstream webrtc.org hooks would help

   Resolution: remove MTI marker on partialFramesLost

   Bernard: last one is multiple DTLS certificates, not
   implemented anywhere

   HTA: the goal was to help support signed certificates, which is
   completely unspecified

   Dom: so if we remove support for it, the idea would be to say
   the spec only uses the first certificate in the list

   TimP: wasn't the background of this support for multiple kinds
   of certificates?

   Bernard: with full support for DTLS 1.2, that's no longer
   relevant

   Bernard: I'm hearing consensus on all of these

  ISSUE-2495 When is negotiation complete?

   JIB: this emerged while writing tests for WPT, but is
   applicable beyond testing
   … "Perfect negotiation" is the pattern we recommend in the spec
   that helps abstract away the negotiation from the rest of the
   application logic
   … having a negotiationended event would help avoid glare,
   simplify the logic
   … the obvious approach to detect the end of negotiation is racy
   … there are workarounds, action-specific spin-tests (while
   loops)
   … but that's bad, leading to timeouts
   … I've also tried another workaround by dispatching my own
   negotiated event at the exact right time
   … this is slightly better, but we can still miss cases
   … can we do better? I have 3 proposals
   … fire a negotiationcomplete from SRD(answer) if renegotiation
   isn't needed
   … one downside is that subsequent actions may delay the event
   if further negotiation is needed in some edge cases
   … Proposal B is a boolean attribute for negotiationneeded -
   needs careful implementation in relation to the
   negotiationneeded event
   … it's also delayed by subsequent actions
   … Proposal C: an attribute exposing a promise for
   negotiationcomplete
   … it's better because it's not delayed by subsequent actions
   (by replacing promises as new negotiations get started)
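
   The promise replacement described in Proposal C can be modelled
   in user land roughly as follows. This is only a sketch: the
   NegotiationTracker class and its begin()/end() hooks are
   hypothetical names, not part of RTCPeerConnection, and the
   application is assumed to call them itself when an offer/answer
   round starts and when the answer has been applied.

```typescript
// Hypothetical user-land model of Proposal C; not a real WebRTC API.
class NegotiationTracker {
  private current: Promise<void> = Promise.resolve();
  private resolveCurrent: (() => void) | null = null;

  /** Promise for the most recent negotiation round. */
  get negotiationComplete(): Promise<void> {
    return this.current;
  }

  /**
   * Call when an offer/answer round starts. A round already in flight
   * keeps its promise; otherwise the exposed promise is replaced.
   */
  begin(): Promise<void> {
    if (this.resolveCurrent === null) {
      this.current = new Promise<void>((resolve) => {
        this.resolveCurrent = () => resolve();
      });
    }
    return this.current;
  }

  /** Call once the answer has been applied and no renegotiation is pending. */
  end(): void {
    this.resolveCurrent?.();
    this.resolveCurrent = null;
  }
}
```

   Because each round gets its own promise object, two callers can
   check whether they are waiting on the same round by comparing
   the promises they hold.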

   Henrik: compared to proposal A?

   JIB: imagine you call addTransceiver-1 & addTransceiver-2, you
   have to wait until addTransceiver-2 before the event fires
   (which you don't in proposal C)

   Henrik: you can build your own logic if you care about partial
   negotiations - what you want to know in general is "am I done
   or not"?

   HTA: I question the question - why should I care if negotiation
   is complete?
   … what you have here is indeed a problem, but what the app
   cares about is whether the transceiver is connected to a live
   channel or not
   … you don't have this problem with datachannels since you have
   an onopen event
   … if we want to solve this at all (I would prefer not adding
   any API at this point), I think we should look at a signal on
   the transceiver availability

   Bernard: don't you get that from our existing states, e.g. via
   the transports?

   Harald: we have it with currentDirection, but without an event,
   it has to be polled
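
   The polling mentioned here can be sketched as below. The
   watchDirection helper and the TransceiverLike interface are
   illustrative names only; the one property assumed from the real
   API is RTCRtpTransceiver.currentDirection.

```typescript
// Minimal shape of what is read from an RTCRtpTransceiver.
interface TransceiverLike {
  readonly currentDirection: string | null;
}

/**
 * Returns a check() function that compares currentDirection with the
 * last value seen and calls onChange when it differs. In the absence
 * of an event, the caller has to invoke check() repeatedly.
 */
function watchDirection(
  transceiver: TransceiverLike,
  onChange: (direction: string | null) => void
): () => void {
  let last = transceiver.currentDirection;
  return () => {
    const now = transceiver.currentDirection;
    if (now !== last) {
      last = now;
      onChange(now);
    }
  };
}

// In a page this would typically be driven by a timer, e.g.:
//   const check = watchDirection(transceiver, (d) => console.log(d));
//   const timer = setInterval(check, 100);
```

   The timer interval is a trade-off between latency and wasted
   work, which is the cost a directionchange event would avoid.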

   JIB: I think apps do need to know whether the transceiver is
   ready or not, and having that done with a timeout is not great

   HTA: what I'm saying is what matters is the readiness of the
   transceiver, not the state of the negotiation
   … if we want to add anything here, it should be a
   directionchange event to the transceiver

   TimP: it could be done with proposal C which indicates "what"
   is complete (i.e. which transceiver is ready)
   … otherwise, I agree you want to know what it is you got

   JIB: you would get that via JS closure

   Henrik: I think this is a "nice-to-have" - useful for testing &
   debugging; but I think it's a problem that can be solved with
   the existing API

   JIB: I don't think this can be polyfilled, given that
   negotiationneeded is now queued
   … negotiationneeded can be queued behind other operations in
   the PC

   Henrik: you can detect this for each of your negotiated states
   by observing which changes are actually reflected (with
   different logic for each type of negotiation)
   … this would be nicer, but I don't think it's needed

   JIB: you mentioned setStreams - it cannot be observed locally
   … another advantage of the promise is that it lets you
   determine if you're still on the same "negotiation train" by
   comparing promises

   Youenn: it would be interesting to see if libraries built on
   top of PC are implementing that pattern
   … this might be a good way to determine its appeal

   Henrik: it would be great for debugging for sure, esp in the
   age of perfect negotiation

   Youenn: so let's wait to see what apps adopting perfect
   negotiation do before committing to this

   Conclusion: keep for post 1.0 (in webrtc-extension?)

  ISSUE 2502 When are effects of in-parallel stuff surfaced?

   Henrik: the signaling/transceiver states defined in JSEP and
   the API can't be kept identical except at the cost of racy
   behavior
   … which means the requirements imposed by JSEP on these states
   create ill-defined / inconsistent behaviors
   … Proposals to address this: Proposal A: we make addTrack
   dependent only on WebRTC states, not JSEP states
   … this is probably what the spec says, not what implementations
   do
   … Proposal B: we make addTrack depend on a "JSEP transceiver",
   but would be racy and create implementation specific behaviors

   JIB: I agree there is a race in JSEP
   … JSEP was written without thinking about threads at all
   … the problem is not really about whether we're in a JS thread
   or not
   … we have to make copies of things

   Henrik: my mental model is that WebRTC JS shallow objects refer
   to JSEP objects
   … the only problem is with addTrack because of recycling of
   transceivers

   JIB: the hygienic thing would be to copy state off from JSEP
   when looking at transceivers. Is that proposal A?

   Henrik: it's implicit in proposal A
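
   One way to picture "copying state off from JSEP" is the
   snapshot pattern sketched below. The names JsepTransceiverState
   and TransceiverView are hypothetical, and the choice of when to
   call surface() (for instance when a setLocalDescription or
   setRemoteDescription promise settles) is an assumption, not
   something the minutes specify.

```typescript
type Direction = "sendrecv" | "sendonly" | "recvonly" | "inactive";

// Stand-in for state that is mutated in parallel by the JSEP layer.
interface JsepTransceiverState {
  mid: string | null;
  currentDirection: Direction | null;
  stopped: boolean;
}

// What JavaScript observes: a copy refreshed only at defined points.
class TransceiverView {
  private readonly internal: JsepTransceiverState;
  private snapshot: JsepTransceiverState;

  constructor(internal: JsepTransceiverState) {
    this.internal = internal;
    this.snapshot = { ...internal };
  }

  get mid(): string | null {
    return this.snapshot.mid;
  }

  get currentDirection(): Direction | null {
    return this.snapshot.currentDirection;
  }

  get stopped(): boolean {
    return this.snapshot.stopped;
  }

  /** Copy the internal state into the JS-visible snapshot. */
  surface(): void {
    this.snapshot = { ...this.internal };
  }
}
```

   With this shape, script reading the view twice within one task
   sees the same values even if the internal state changed in
   between.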

   JIB: the only problem with that is with your example on slide
   17 - this would leave a hole e.g. in the context of perfect
   negotiation

   Henrik: I think that's a better alternative than starting to
   meddle with internal JSEP objects
   … the hole here is that if you're unlucky, you need another
   round of negotiation
   … and in that situation, you would be in a racy scenario in the
   first place

   HTA: the code of slide 17 is not compatible with perfect
   negotiation

   Henrik: I think proposal A is the only sane approach

   HTA: this sounds ready for a pull request

   JIB: I think the spec is currently racy given "JSEP in
   parallel" so it's more than an informative change

   Resolution: getTransceivers() SHALL NOT be racy

  Media Capture and Streams

    Issue 671 new audio acquisition

   Sam: Sam Dallstream, engineer at Microsoft
   … this is a feature request / issue on the spec
   … as the spec stands today, it is hard to differentiate streams
   meant for speech recognition vs communication
   … the current implementations are geared towards communication,
   which sometimes is at odds with the needs for speech recognition
   … e.g. in comms, adding noise can be good, but it hurts with
   speech recognition
   … slides 22 and 23 show the differences of needs between the
   two usages, extracted from ETSI specs
   … the first proposal to address this would be a new constraint
   (e.g. "category") that allows specifying "default", "raw",
   "communication", "speechRecognition"
   … it translates well to existing platforms: Windows, iOS,
   Android have similar categories
   … the problem is that it competes with content-hint in a
   confusing way - content-hint is centered around default
   constraints AND provides hints to consumers of streams
   … whereas this one is setting optimization on the stream itself
   (e.g. levels of echo canceling)
   … A second proposal is to modify the constraints to make them a
   bit more specific, and add a new hint to content-hint
   … the advantage is that it fits the current content-hint draft,
   with more developer freedom
   … but it may be hard to implement
   … Would like to hear if there is consensus on the need, and get
   a sense of the direction to fulfill it

   Henrik: for clarification, for echoCancellation, it's not
   turning it off, it's tweaking it for speech recognition

   Sam: right - right now echoCancellation is a boolean (on or
   off)

   HTA: but then how does it fit well with the existing
   model?

   Sam: I meant it's easier for API consumers, but you're right it
   conflicts with other constraints

   Bernard: this is not about device selection here

   JIB: indeed, most of this is done in software land in any case

   Henrik: right, here it's more about disabling/enabling features

   JIB: what's the use case that can't be done by gUM-ing & turning
   off echoCancellation, autoGainControl, noiseSuppression?

   Bernard: it's not on & off

   TimP: e.g. in speech interactions, you don't want the voice AI
   to hear itself

   Sam: Alexa right now turns off everything and then adds their
   own optimization for speech recognition
   … so this can already be done, but the idea is to allow
   built-in optimizations so that not everyone has to do their own
   thing
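
   The "turn everything off" approach discussed above can be
   sketched with the existing audio constraints. The helper name
   unprocessedAudioConstraints is illustrative; echoCancellation,
   autoGainControl and noiseSuppression are the constraint names
   from Media Capture and Streams, and how strictly a browser
   honours them is implementation-dependent.

```typescript
// Local type so the sketch does not depend on DOM typings.
interface AudioProcessingConstraints {
  echoCancellation: boolean;
  autoGainControl: boolean;
  noiseSuppression: boolean;
}

/** Constraints asking for audio with built-in processing disabled. */
function unprocessedAudioConstraints(): {
  audio: AudioProcessingConstraints;
  video: boolean;
} {
  return {
    audio: {
      echoCancellation: false,
      autoGainControl: false,
      noiseSuppression: false,
    },
    video: false,
  };
}

// In a page:
//   const stream = await navigator.mediaDevices.getUserMedia(
//     unprocessedAudioConstraints());
// The application would then apply its own processing, e.g. in Web Audio.
```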

   Youenn: do systems provide multiple echo cancellers?
   … I don't think you can do that in iOS

   Sam: that's why the second proposal isn't as straightforward

   Henrik: the advantage of these categories is that they are
   vague enough that implementations can adjust depending on what
   the underlying platforms provide
   … but then it's not clear exactly what the hint does

   HTA: I would expect a lot of variability across platforms in
   any case

   Henrik: as is the case for echoCancellation: true

   HTA: indeed (as the multiple changes of the impl in Chrome
   show)

   Henrik: it sounds like it is hard enough to describe and
   implementation-specific enough that it should be a hint

   JIB: I think it's fair to say that the audio constraints have
   been targeted at the communications use case
   … not sure how much commitment there is for the purpose of
   speech recognition

   Sam: right

   Henrik: with interop in mind, echoCancellation: true worked
   because everyone did their best job at solving it, not by doing
   the same thing
   … to get that done with this new category, we would need the
   same level of commitment and interest from browser vendors
   … the alternative is turning everything off and doing
   post-processing in WebAudio/WASM

   TimP: another category beyond comm, speech-rec here is
   broadcast
   … it shouldn't be a two-states switch

   JIB: anything here that couldn't be solved with WebAudio /
   AudioWorklets?

   Sam: I would need to take another look at that one

   HTA: you would still need a "raw" mode

   Youenn: maybe also look at existing open source implementations
   of ambient noise and whether they share some rough parameters

   Sam: it sounds like leaning towards 2nd proposal

   Dom: maybe first also determine what can be done in user land
   already with Web Audio / Web Assembly
   … if this is already doable there, then maybe we should gain
   experience with libraries first

   HTA: given we already have a collection of hints in
   content-hint that have been found useful, it's kind of easy to
   add it there

   Bernard: would this apply up to gUM?

   HTA: yes, that's already how it works

   JIB: if we're thinking of adding a new hint, we may need new
   constraints specific to speech-recognition

   [discussion around feature detection for content-hints]

    ISSUE 639 Enforcing user gesture for gUM

   Youenn: powerful APIs are nowadays bound to user gesture
   … if we were designing gUM today, it would be as well
   … but that's not Web compatible to change now
   … can we create the conditions to push Web apps to migrate to
   that model?
   … PR 666 proposes to require user gesture to grant access
   without a prompt
   … I've looked at a few Web sites; whereby.com works with the
   restrictions on
   … it wouldn't work in Hangout or Meet
   … Interested in feedback on the approach and availability to
   help with webrtc app developers outreach

   Youenn: the end goal would be that calling gUM without user
   gesture should be rejected
   … user gesture is currently an implementation-dependent
   heuristic - this is being worked on
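
   One reading of the behaviour described above can be written as
   a small decision function. Everything here is an interpretation
   for illustration: the type and function names are hypothetical,
   and the endGoal flag simply switches between the prompting
   fallback of PR 666 and the stated end goal of rejecting calls
   made without a user gesture.

```typescript
type StoredPermission = "granted" | "prompt" | "denied";
type GumOutcome = "grant" | "prompt" | "reject";

/**
 * Decide what a getUserMedia call leads to, given the stored
 * permission and whether the call was made during a user gesture.
 */
function decideGumOutcome(
  permission: StoredPermission,
  hasUserGesture: boolean,
  endGoal: boolean = false
): GumOutcome {
  if (permission === "denied") {
    return "reject";
  }
  if (!hasUserGesture) {
    // PR 666 reading: fall back to a prompt; end goal: reject outright.
    return endGoal ? "reject" : "prompt";
  }
  // With a gesture, a stored grant is honoured without prompting.
  return permission === "granted" ? "grant" : "prompt";
}
```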

   Henrik: I think we would need it to be better defined
   … it is also linked to 'user-chooses'

   Youenn: the situation is very similar to getDisplayMedia where
   Safari applies the user gesture restriction
   … it could be the same with gUM

   JIB: I like the direction of this; we could describe it as
   privacy & security issue
   … with feature-policy, there is a privacy escalation problem
   through navigation
   … jsfiddle allowed all feature policies, so from my site I
   could have navigated to my jsfiddle, got privileged there
   before navigating back with an iframe
   … so that sounds like an important fix
   … the prompting fallback sounds interesting
   … denying on page load might be harder to reach
   … it's not clear that same-origin navigation should be blocked

   Youenn: user gesture definition is still a heuristic, these
   could fit into that implementation freedom

   HTA: how much legitimate usage would we break?
   … before progressing this, we should have a deployed browser
   with a counter to detect with/without user gesture
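
   The counter HTA asks for could look roughly like the sketch
   below. GumUsageCounter is a hypothetical name, and how a
   browser decides whether a call had a user gesture is left to
   its own heuristic, as noted earlier in the discussion.

```typescript
// Tallies getUserMedia calls by whether a user gesture was present.
class GumUsageCounter {
  private withGesture = 0;
  private withoutGesture = 0;

  /** Record one getUserMedia call. */
  record(hasUserGesture: boolean): void {
    if (hasUserGesture) {
      this.withGesture += 1;
    } else {
      this.withoutGesture += 1;
    }
  }

  get total(): number {
    return this.withGesture + this.withoutGesture;
  }

  /** Share of calls that would break if a gesture were required. */
  fractionWithoutGesture(): number {
    return this.total === 0 ? 0 : this.withoutGesture / this.total;
  }
}
```

   A high fraction would indicate that requiring a gesture breaks
   a lot of existing usage.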

   Youenn: Webex and Hangout call it on pageload, so that would
   make the counter very high

   HTA: so will someone get data?

   Youenn: I don't think Safari can do this; would be happy if
   someone can do this
   … I can reach out to top Web site developers

   HTA: would anyone at Mozilla be interested in collecting this
   data?

   JIB: based on our user gesture algorithm? I'll look, but can't
   quite commit resources to this at the moment

   Conclusion: more info needed

  Next meeting

   HTA: probably in April / May

Summary of resolutions

    1. [12]remove MTI marker on partialFramesLost
    2. [13]getTransceivers() SHALL NOT be racy
Received on Monday, 30 March 2020 16:51:01 UTC