Minutes of Nov 15 interim

On 16/11/2022 at 12:37, Harald Alvestrand wrote:
> Is now available: https://youtu.be/YHLpqvcRAlY

And the minutes are available at 
https://www.w3.org/2022/11/15-webrtc-minutes.html (and copied as text below)


                       WebRTC November 2022 meeting

15 November 2022

    [2]Agenda. [3]IRC log.

       [2] https://www.w3.org/2011/04/webrtc/wiki/November_15_2022
       [3] https://www.w3.org/2022/11/15-webrtc-irc


    Present
           Bernard, BrianBaldino, Carine, Cullen, Dom, Eero, Elad,
           Florent, Harald, Hugo, Jan-Ivar, PatrickRockhill,
           PeterThatcher, PhilippHancke, Tove, Tuukka, Youenn

    Chair
           Bernard, HTA, Jan-Ivar



     1. [4]Encoded Transform
     2. [5]WebRTC PC
          1. [6]Issue #2795: Missing URL in RTCIceCandidateInit
          2. [7]Issue #2796: A simulcast transceiver saved from
             rollback by addTrack doesn’t re-associate, but unicast
          3. [8]Issue #2724: The language around setting a
             description appears to prohibit renegotiation of RIDs
     3. [9]Timing Model & WebCodecs
     4. [10]Face Detection
     5. [11]MessagePort on Capture Handle
     6. [12]enumerateDevices & Focus

Meeting minutes

    Recording: [13]https://youtu.be/YHLpqvcRAlY

      [13] https://youtu.be/YHLpqvcRAlY



    Slideset: [15]https://lists.w3.org/Archives/Public/www-archive/


   [16]Encoded Transform [17]🎞︎

      [16] https://github.com/w3c/webrtc-encoded-transform
      [17] https://www.youtube.com/watch?v=YHLpqvcRAlY#t=150

    [18][Slide 10]


    Harald: I offered to use the IETF Hackathon to experiment with
    encoded transform (on my own, for lack of participants)

    [19][Slide 11]


    [20][Slide 12]


    Harald: developed 2 demos to evaluate the API (but not for

    [21][Slide 13]


    Harald: I had initially thought I needed both producers and
    consumers, but writing the demos, only the producers seemed

    [22][Slide 14]


    Harald: the processing is done via a user-defined JS class that
    you insert in the processing pipeline, but without requiring a
    single PC used at both ends of the pipe
    … this led to the conclusion that the API could be used
    … Peter worked separately on how that one-way API approach
    could be done with the existing two-ways APIs

    [23][Slide 15]


    Peter: I got it working with transport, codec

    [24][Slide 16]


    Peter: a constructor would help
    … also missing signals for congestion control

    [25][Slide 17]


    Peter: pretty straightforward on the receiver side

    [26][Slide 18]


    Peter: again, missing a way to control e.g. the encoder bitrate
    based on congestion control

    [27][Slide 19]


    Peter: for a Decoder, we would again want a constructor for the
    encoded video frame, and signals to detect the need for a key

    [28][Slide 20]


    Peter: Harald's approach would satisfy these needs

    Youenn: with regard to these 5 gaps, there is already a
    solution for the keyframe problem
    … for constructors, I'm not sure why we need something on top
    of what WebCodecs provide from raw data; what's the point of
    using PC for incoming data?

    Peter: WebCodecs doesn't have a built-in jitter buffer, whereas
    this would

    Youenn: but we've been discussing letting the app define the
    jitter buffer
    … so it's not clear that there is a benefit

    Peter: it would still allow getting the same behavior that you
    get from WebRTC without having to write your own jitter buffer

    Youenn: I think this would benefit from clearer use cases

    Harald: one of the use cases that needs this is taking an
    incoming video frame and passing it on to a different peer
    … or passing it to 2 peer connections

    Youenn: to re-forward it?

    harald: possibly, yes

    Youenn: this may be mostly about serialization, rather than a

    harald: metadata may need rewriting
    … let's see about use cases

    Jan-Ivar: what's the high level problem we're solving? would
    this be instead of encoded transform? re-imagining it?
    identifying issues with it?
    … we have readable and writable streams on mediastreamtracks
    … so I can already receive a track and forward it
    … what's the difference?

    Harald: this relates to the use cases discussed at TPAC
    … there were compelling arguments that this could not be
    addressed without substantive changes of the webrtc encoded
    transform API
    … not clear if this should replace or extend it - depending on
    where the shape lands

    Peter: you cannot forward well without bandwidth estimation
    … you could re-use the encoded(audio|video)frame to forward
    them as is, but you probably need to re-packetize which you
    can't do without a constructor

    Jan-ivar: OK; still unclear how this would affect the API shape

    Peter: I was focused on identifying the gaps at this stage

    Harald: I explicitly shied away from presenting an API shape,
    to focus on use cases and requirements at this stage
    … this is to stimulate the discussions

    Peter: my impression is that this could be added with fairly
    minimal changes (constructors, signals)
    … not a big delta from what we have

    Harald: so next step is to enumerate use cases a bit more
    before making a change proposal
    … Peter and I will continue to iterate on this

   [29]WebRTC PC [30]🎞︎

      [29] https://github.com/w3c/webrtc-pc
      [30] https://www.youtube.com/watch?v=YHLpqvcRAlY#t=1434

     Issue [31]#2795: Missing URL in RTCIceCandidateInit [32]🎞︎

      [31] https://github.com/w3c/webrtc-pc/issues/2795
      [32] https://www.youtube.com/watch?v=YHLpqvcRAlY#t=1450

    [33][Slide 24]


    Youenn: this follows from discussion at the previous meeting
    … the server URL used to be exposed in the event, and it has
    been proposed to move it to the candidate object itself
    … but we didn't discuss whether it would survive JSON
    serialization / deserialization
    … so far serialization/deserialization has been without
    information loss
    … should that apply to the URL attribute?

    [34][Slide 25]


    Youenn: this impacts whether it gets submitted to remote
    parties by default (although this is only about defaults, not
    about protecting the info in general since it remains available
    to JS)
    … in general, do we want to keep the invariant of non-lossiness
    on this object?
    … Personally, I don't think there are good use cases to pass
    the url to remote parties, and we should keep the model
    consistent with regard to lossiness
    … so we should keep the url attribute on the event rather than
    on the object
    … it can be shimmed easily from one to the other

    [35][Slide 26]


    fippo: toJSON conveys information that is needed for ICE
    … additional properties were added to avoid having developers
    parse data out of the candidate string
    … e.g. to determine the network topology

    [36][Slide 27]


    youenn: the question is about convenience / POLA

    fippo: exposing the data on candidate is best to avoid having
    to go through stats
    … you can't correlate your event with stats except through IP
    address matching

    Jan-Ivar: I'm hearing that candidates already have
    information that isn't exposed in toJSON

    Fippo: right, e.g. relayProtocol

    Jan-Ivar: so that already breaks the supposed invariant on

    Youenn: if so, that goes against the spirit of the spec
    … if that's not the case, this may require clarifying the spec
    or aligning it with the invariant

    fippo: the problem is that we're trying to treat local and
    remote candidates the same
    … but local candidates can have more info

    youenn: that's why I thought the event was a good way to expose
    local information

    Fippo: in stats we distinguish a lot between local & remote

    jan-ivar: my preference is to not send it to remote parties,
    and so not include it in toJSON
    … the design pattern for events is also not to expose
    properties on the event when it can be exposed on the
    underlying object
    … so I lean towards exposing it in the object

    youenn: we then at least need to change the constructor

    harald: the candidate is behaving like a data object, without
    inherent behavior
    … people expect to copy data objects, and they would expect
    toJSON() to allow this - breaking such a pattern is a bad idea
    … we have a backwards compatibility problem since toJSON is
    used to send data to remote parties
    … I think it was a mistake to use toJSON for transmission
    … I think putting the url data on the candidate is right
    … I think the right direction would be to add a method that
    only exposes the right info for the remote party

    Youenn: another approach would be to distinguish local vs
    remote candidates

    Harald: that's an interesting idea

    Jan-Ivar: I agree we have a wart here, but I don't think we
    should chase technical purity
    … subclassing would not solve the backwards compat issue

    Youenn: let's iterate on this discussion on github
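
    The trade-off discussed under this issue can be sketched in
    TypeScript. This is only an illustrative model, not the
    RTCIceCandidate interface: CandidateModel, CandidateInitModel and
    toRemoteJSON() are hypothetical names. It assumes the url lives on
    the candidate, keeps toJSON() lossless for copying, and uses a
    separate method to drop local-only data before signalling.

```typescript
// Illustrative model only; not the RTCIceCandidate interface.
interface CandidateInitModel {
  candidate: string;
  sdpMid: string | null;
  sdpMLineIndex: number | null;
  usernameFragment: string | null;
  url?: string | null; // local-only: server that produced the candidate
}

class CandidateModel {
  private readonly init: CandidateInitModel;

  constructor(init: CandidateInitModel) {
    this.init = { ...init };
  }

  get url(): string | null {
    return this.init.url ?? null;
  }

  // Lossless: new CandidateModel(c.toJSON()) keeps every field.
  toJSON(): CandidateInitModel {
    return { ...this.init };
  }

  // Hypothetical method: only the fields a remote party needs for ICE.
  toRemoteJSON(): Omit<CandidateInitModel, "url"> {
    return {
      candidate: this.init.candidate,
      sdpMid: this.init.sdpMid,
      sdpMLineIndex: this.init.sdpMLineIndex,
      usernameFragment: this.init.usernameFragment,
    };
  }
}
```

    In this sketch, existing code that sends JSON.stringify(candidate)
    would still transmit the url, which is the backwards compatibility
    concern raised in the discussion.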

     Issue [37]#2796: A simulcast transceiver saved from rollback by
     addTrack doesn’t re-associate, but unicast does [38]🎞︎

      [37] https://github.com/w3c/webrtc-pc/issues/2796
      [38] https://www.youtube.com/watch?v=YHLpqvcRAlY#t=2455

    [39][Slide 28]


    Jan-Ivar: more corner cases, esp. to do with rollbacks

    Harald: proposal seems reasonable to me

    Bernard: +1

    Harald: Jan-Ivar will propose a PR

     Issue [40]#2724: The language around setting a description appears
     to prohibit renegotiation of RIDs [41]🎞︎

      [40] https://github.com/w3c/webrtc-pc/issues/2724
      [41] https://www.youtube.com/watch?v=YHLpqvcRAlY#t=2726

    [42][Slide 29]


    Jan-Ivar: see also [43]PR #2794

      [43] https://github.com/w3c/webrtc-pc/pull/2794

    [44][Slide 30]


    Jan-Ivar: this would match Chrome & Safari, although there is a
    remaining inconsistency identified in Chrome

    Harald: this is the one where you discovered Chrome disabled
    layers rather than removing them
    … this sounds reasonable given our previous agreement on this

    Jan-Ivar: it's a small change that doesn't introduce new
    behaviors, but extends existing ones

    Harald: I think this works
    … Will you add tests too?

    Jan-Ivar: yes, along with FF implementation of setParameters

   [45]Timing Model & WebCodecs [46]🎞︎

      [45] https://github.com/w3c/webcodecs
      [46] https://www.youtube.com/watch?v=YHLpqvcRAlY#t=3016

    Bernard: the group created a videoframemetadata registry with a
    … an example of that is the request to register human face
    metadata [47]#607
    … this also relates to the requestVideoFrameCallback spec
    (being merged in HTML)
    … which also exposes metadata, raising the question of whether
    that metadata should be exposed there as well
    … the rVFC spec exposes timing info at all aspects of the
    pipeline (captureTime, rtpTimestamp, receiveTime,
    processingDuration, expectedDisplayTime, presentationTime)
    … it mixes codec-related timing but also rtp-related info
    … this brings up a number of questions: where is the metadata
    exposed in our APIs (e.g. mediacapture transform)
    … should I expect .captureTime to be visible in a videoframe
    … likewise, there are assumptions on what things should happen
    in WebRTC (e.g. setting the rtpTimestamp)
    … is metadata passed through the pipeline: converting a video
    frame with mediacapture transform and passing it to webrtc - is
    this still visible at the end in rVFC? in encoded transform?
    … in WebCodecs encoded chunks?
    … do we need to file related issues?

      [47] https://github.com/w3c/webcodecs/issues/607

    Youenn: I filed some of these issues - captureTime etc are
    planned to move to videoframemetadata
    … that should bring consistency throughout the pipeline
    … mediacapture transform will not preserve it magically - if
    you clone the frame, metadata will be cloned along with it
    … likewise if it goes through WebRTC PC
    … encoded chunks don't expose that metadata - maybe we
    should; we haven't heard feedback or use cases for that yet
    … in terms of what the WG may need to discuss: how do we
    compute presentationTime? VideoTrackGenerator allows setting
    the timestamp, but we're not defining what happens on rendering
    (e.g. re jitter buffer)

    Harald: if a processing element has metadata defined both as
    part of input & output, should we have a general rule about
    metadata it doesn't understand?
    … for the metadata info it knows about (e.g. width and height
    for an encoder), it won't remain unchanged
    … but for metadata that isn't understood, should we have a rule
    to leave it unchanged?

    Bernard: the registry rule is that this is up to the
    registry-linked spec to define
    … not sure we can have a rule that is imposed on all WGs
    … a rule would have to be proposed to be enforced

    Youenn: individual metadata specs could describe how they're
    handled by processors

    Bernard: next step would be to file specific issues on specific

    Youenn: the main remaining issue might be on rendering time
    … in media capture main
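
    The pass-through rule Harald asked about could look like the
    following TypeScript sketch. It is one possible behavior, not
    something the group agreed on: FrameModel and downscaleByHalf are
    hypothetical names, and the frame is a plain object rather than a
    WebCodecs VideoFrame. The step rewrites what it understands (width
    and height) and copies every other metadata entry unchanged.

```typescript
// Hypothetical frame model; not the WebCodecs VideoFrame interface.
type FrameMetadata = Record<string, unknown>;

interface FrameModel {
  width: number;
  height: number;
  metadata: FrameMetadata;
}

// A downscaling step: it changes the values it understands and
// passes all metadata entries through without interpreting them.
function downscaleByHalf(frame: FrameModel): FrameModel {
  return {
    width: Math.floor(frame.width / 2),
    height: Math.floor(frame.height / 2),
    metadata: { ...frame.metadata }, // shallow copy, entries untouched
  };
}
```

    As Bernard and Youenn noted, whether a given entry may be copied
    blindly would be for the spec defining that entry to say.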

   [48]Face Detection [49]🎞︎

      [48] https://github.com/w3c/mediacapture-extensions/pull/78
      [49] https://www.youtube.com/watch?v=YHLpqvcRAlY#t=3704

    [50][Slide 33]


    Tuukka: the face detection proposal now uses videoframemetadata

    [51][Slide 34]


    [52][Slide 35]


    [53][Slide 36]


    [54][Slide 37]


    [55][Slide 38]


    Tuukka: looking for feedback on the general direction

    Youenn: thanks - looks like a great improvement, and exciting
    to see this moving forward
    … dictionary members probably don't need to be nullable, but
    some may need to be marked as required
    … re center points vs bounding box vs best possible contours:
    I'm not sure if a sequence is best vs different fields
    … not sure about faceDetectionMaxCountourPoints - do we really
    need this now? can we leave this for later? or have a hint?
    … if developers just want a bounding box, maybe we should let
    developers express it, and send back a detailed contour
    … the example may need an update wrt @@@
    … I guess this means the proposal will be split across
    webcodecs and mediacapture-extensions

    Tuukka: the metadata and the constraints are both specified in
    mediacapture extensions
    … are you suggesting the former should be done in webcodecs?

    Youenn: not sure - I guess this is testing the registry process
    … the registry entry could either define the metadata or link
    to the mediacapture extensions spec

    Tuukka: the constraints and metadata are co-dependent
    … they need to be maintained together

    youenn: that makes sense; webcodecs has been asking to be able
    to review metadata when they change, so it may be best to have
    something in webcodecs space
    … we can iterate with webcodecs folks on the details

    timp: I like this - looks useful & interesting
    … it would be good to document the lifespan and meaning of the
    id - in particular, that it doesn't allow correlating faces
    across streams
    … re contour & bounding box, I agree with Youenn that they're
    not the same and should be handled separately, not rely on 4
    items == bounding box
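
    The point about keeping contour and bounding box separate can be
    illustrated with a TypeScript sketch. The shape below is an
    assumption made for illustration, not the dictionary in the pull
    request: DetectedFaceModel, Box, Point and boundingBoxOf are
    hypothetical names. With separate fields, a contour that happens
    to have four points is never mistaken for a bounding box.

```typescript
interface Point { x: number; y: number; }
interface Box { x: number; y: number; width: number; height: number; }

// Hypothetical metadata shape with separate, explicitly named fields.
interface DetectedFaceModel {
  id?: number;
  boundingBox?: Box;
  contour?: Point[];
}

// Returns the reported box, or derives one from the contour points.
function boundingBoxOf(face: DetectedFaceModel): Box | null {
  if (face.boundingBox) return face.boundingBox;
  const pts = face.contour ?? [];
  if (pts.length === 0) return null;
  const xs = pts.map((p) => p.x);
  const ys = pts.map((p) => p.y);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return {
    x: minX,
    y: minY,
    width: Math.max(...xs) - minX,
    height: Math.max(...ys) - minY,
  };
}
```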

    tuukka: the goal here was to avoid cluttering metadata as new
    contour approaches emerge

    jan-ivar: looking at the broader question of merging this
    … from a privacy perspective, it looks like it doesn't add any
    concerns over having the detection done in JS
    … this looks good to me

    Youenn: let's see a PR that editors can iron out and then run a

   [56]MessagePort on Capture Handle [57]🎞︎

      [56] https://github.com/w3c/mediacapture-handle/issues/70
      [57] https://www.youtube.com/watch?v=YHLpqvcRAlY#t=5068

    [58][Slide 44]


    [59][Slide 45]


    [60][Slide 46]


    [61][Slide 47]


    [62][Slide 48]


    [63][Slide 49]


    [64][Slide 50]


    Youenn: having a message channel between capturer and capturee
    makes sense
    … a few things off in the API shape that we can iterate on
    (e.g. event handler in a dictionary - they're usually on
    … I'm not sure about the "supportsMessagePort" boolean
    … I would prefer we start from a minimal API surface
    … also for messageportinvalidated - we should discuss this with
    the HTML spec folks
    … this underlying behavior already exists with other
    … I would prefer a name different from "getMessagePort" given
    its side effects
    … I like the integration with capture handle

    elad: +1 to "openMessagePort" instead of get...
    … I'm happy to discuss reduction of API surface
    … s/handle/controller
    … my proposal deals both with capture handle and controller -
    how do you feel about integration with handle?

    Youenn: event handler in a dictionary feels wrong
    … don't have strong feelings on handle vs mediaDevices in

    elad: the link to capture handle happens both on capturer &
    … you commented on only one side?

    youenn: on the other side, I would move it to capture

    jan-ivar: I really like the 1st part of the presentation -
    agree on use cases & requirements
    … would like to iterate on github on the API shape
    … generally would agree with youenn to move it to controller
    rather than track
    … I think the direction you're presenting makes sense as a
    starting point

    Elad: so the next step is to surface similar events following
    that pattern on capture controller
    … we should revisit this at the next meeting

   [65]enumerateDevices & Focus [66]🎞︎

      [65] https://github.com/w3c/mediacapture-main/pull/912
      [66] https://www.youtube.com/watch?v=YHLpqvcRAlY#t=6119

    [67][Slide 53]


    jan-ivar: [68]PR #912 allows the behavior in Safari by relaxing
    the focus requirements a little bit

      [68] https://github.com/w3c/mediacapture-main/pull/912

    [69][Slide 54]


    [70][Slide 55]


    Youenn: I like this proposal; LGTM

    Harald: my reading is that it waits after the gUM prompt has
    been replied to?

    jan-ivar: after it has shown up, not responded to (since that
    requires focus in any case)

    harald: I'll re-read the PR carefully to make sure it doesn't
    introduce issues

    Elad: can you clarify the "anti-spying" behavior?

    Jan-Ivar: the PR doesn't change the focus requirement, only its

    Elad: ok, I'll bring the question on github then

    [71][Slide 56]


    Jan-Ivar: we also had developers complaining that
    enumerateDevices() blocks when there is no focus (which is
    marked as an optional behavior)
    … the PR proposes to make it tied to visibility, not focus
    … this helps backwards compat, and still satisfies the
    anti-fingerprinting requirement (anti-spying only applies to
    … this would make the check deterministic as requested by the

    [72][Slide 57]


    Youenn: so the goal is to reduce friction for developers and
    align user agents' behaviors - that's a good goal
    … do you foresee compat issues in implementing this?
    … will it fix existing firefox issues that developers were
    complaining about or does that require developer adoption
    before it does?

    jan-ivar: they would have to add the visibilityState check to
    avoid being "blocked"
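
    The visibilityState check Jan-Ivar mentions could look like the
    TypeScript sketch below. It assumes the behavior proposed in the
    PR (the call is tied to visibility rather than focus).
    runWhenVisible and VisibilitySource are hypothetical names, and
    the document is passed in as a small structural type so the
    sketch also runs outside a browser.

```typescript
// Minimal structural types so the sketch runs outside a browser.
type PageVisibility = "visible" | "hidden";

interface VisibilitySource {
  visibilityState: PageVisibility;
  addEventListener(type: "visibilitychange", cb: () => void): void;
  removeEventListener(type: "visibilitychange", cb: () => void): void;
}

// Hypothetical helper: run `action` now if the page is visible,
// otherwise run it once, the next time the page becomes visible.
function runWhenVisible(doc: VisibilitySource, action: () => void): void {
  const isVisible = (): boolean => doc.visibilityState === "visible";
  if (isVisible()) {
    action();
    return;
  }
  const onChange = (): void => {
    if (!isVisible()) return;
    doc.removeEventListener("visibilitychange", onChange);
    action();
  };
  doc.addEventListener("visibilitychange", onChange);
}

// In a browser this might be used as:
//   runWhenVisible(document, () => {
//     navigator.mediaDevices.enumerateDevices().then(console.log);
//   });
```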

    elad: I could use more time to review this

    youenn: I think it would be good to get feedback from other UAs
    and developers

    Bernard: do we need a CfC?

    Jan-Ivar: developers should be happy given that it relaxes the

    Dom: does this need an updated privacy review?

    jan-ivar: I don't think so since the behavior was already
    … and the fuzzing advice is already in the spec

    harald: I'll have to review this in detail

    Dom: so we can delegate this for final review by Harald, Elad &


     Minutes manually created (not a transcript), formatted by
     [73]scribe.perl version repo-links-187 (Sat Jan 8 20:22:22
     2022 UTC).

      [73] https://w3c.github.io/scribe2/scribedoc.html

Received on Wednesday, 16 November 2022 12:26:41 UTC