[minutes] TPAC meetings


The minutes of our WG meeting at TPAC are available at:
* Sep 12 http://www.w3.org/2022/09/12-webrtc-minutes.html
* Sep 13 http://www.w3.org/2022/09/13-webrtc-minutes.html

The minutes of our joint meeting with the Media WG and the Media and 
Entertainment IG are available at:
* Sep 15 https://www.w3.org/2022/09/15-mediawg-minutes.html

They're all copied as text below as well.



                         WebRTC TPAC 2022 - Day 1

12 September 2022

    [2]Agenda. [3]IRC log.

       [2] https://www.w3.org/2011/04/webrtc/wiki/September_2022
       [3] https://www.w3.org/2022/09/12-webrtc-irc


           BenWagner, bernard, Bradley_Needham, Cullen, DanSanders,
           dom, Elad, EricCarlson, Henrik, hta, jeff, JIB, Jinkyu,
           Kentsaku, Louay_Bassbouss, MarkFolz, MartinThomson,
           MichaelZou, Mike English, MikeEnglish, Orphis,
           PatrickRockhill, Peter_Thatcher, Philipp, Randell, riju,
           Tatsuya, Thomas, Tony, Tove, TuukaToivonen, Varun, Will,
           Xiameng, Youenn, YvesL


           bernard, hta, jan-ivar

           dom, youenn_


     1. [4]State of the Union
     2. [5]WebRTC-NV Use Cases
          1. [6]issue #62 / [7]PR #75
     3. [8]Developer Engagement
     4. [9]WebRTC-PC
          1. [10]WebRTC revision process
          2. [11]PR #2763: add relayProtocol to RTCIceCandidate
          3. [12]Simulcast negotiation
          4. [13]Issue [14]#2734: addTransceiver does not check for
             missing rid properties
          5. [15]#2733 addTransceiver does not check for uniqueness
             of rid
          6. [16]Issue [17]#2762: Simulcast: Implementations do not
          7. [18]Issue 2764: What is the intended behavior of
             rollback of remote simulcast offer?
          8. [19]Issue [20]#668 - When are RTP streams destroyed?
          9. [21]Issue [22]#643 - Do we agree removing "sender",
             "receiver" and "transceiver" stats is a good idea?
         10. [23]Issue [24]#666 -
         11. [25]Issue [26]#662 - Don't expose so many
     5. [27]WebRTC-Extensions
          1. [28]Issue [29]#98: Disabling hardware
          2. [30]Issue [31]#112: header extension API: do we need
             enabled in setParameters?
          3. [32]Issue [33]#111: Integration of congestion control
             across SCTP and media
     6. [34]Summary of resolutions

       [6] https://github.com/w3c/webrtc-nv-use-cases/issues/62
       [7] https://github.com/w3c/webrtc-nv-use-cases/pull/75
      [11] https://github.com/w3c/webrtc-pc/pull/2763
      [14] https://github.com/w3c/webrtc-pc/issues/2734
      [15] https://github.com/w3c/webrtc-pc/issues/2733
      [17] https://github.com/w3c/webrtc-pc/issues/2762
      [20] https://github.com/w3c/webrtc-extensions/issues/668
      [22] https://github.com/w3c/webrtc-extensions/issues/643
      [24] https://github.com/w3c/webrtc-extensions/issues/666
      [26] https://github.com/w3c/webrtc-extensions/issues/662
      [29] https://github.com/w3c/webrtc-extensions/issues/98
      [31] https://github.com/w3c/webrtc-extensions/issues/112
      [33] https://github.com/w3c/webrtc-extensions/issues/111

Meeting minutes

    Slideset: [35]https://lists.w3.org/Archives/Public/www-archive/


    <Bernard> Dom volunteers to take notes. Yeah!

    <Bernard> We will not be recording the session.

   State of the Union

    [36][Slide 13]


    HTA: our charter started in 2011 - we're about 11 years old
    … the last 2 revisions of our charter have had very little
    … we're set to define APIs to enable real-time communications
    in the browser
    … we've reached Rec for one of our documents - although that doc
    is still evolving as Dom will talk about later today
    … other of our docs are functionally stable, broadly used but
    not progressing much on the Rec track - we've not been very
    good at pushing to the last stages
    … mediacapture-main has been close to being ready for a couple of
    years; but still needs some tidying
    … not a lot of work on redesigning current concepts - are they
    just good enough, or people just working around their
    … ongoing work on new features; gaze correction, face
    detection, etc
    … We need to focus on getting mediacapture-main and
    webrtc-stats out of the door
    … Stream transforms and media codecs are challenging our basic
    model, requiring us to rethink it
    … In terms of what we're not doing: we've stopped work on
    … no active work on 3D & spatial awareness - talked about it,
    but not specific proposals
    … Some of the work is being done elsewhere: EME, streaming of
    stored media (see MOQ in IETF)
    … integration with other work (WebGPU, WebNN) could also be
    usefully done
    … Changes since last year: additional clarity on what we're
    doing and not doing
    … not many new resources dedicated to progress the work

    <fluffy> One small side note on slide, I see the MoQ work being
    used for live and real time media as well as stored.

    HTA: a lot of usage of WebRTC out there
    … including outside the browser

   WebRTC-NV Use Cases

    [37][Slide 20]


    [38][Slide 21]


    Bernard: reviewing history of WebRTC NV use cases - initial
    draft 27 months before the pandemic, webcodecs
    … pandemic brought technologies to the mass market
    … in TPAC 2021, we talked about the relevance of NV use cases
    … Tim Panton submitted a number of new use cases which arose
    during the pandemic
    … were added in Nov 21

    [39][Slide 22]


    Bernard: how do these use cases relate to our current work? Not
    … still plenty of requirements not covered by our existing work
    … 4/11 use cases have matching API proposals
    … unclear consensus on some of these use cases
    … dependency to changes in data transport that the WG hasn't
    been working on
    … how close are we to satisfying these use cases
    … extending webrtc vs using webcodecs over new transport

     [40]issue #62 / [41]PR #75

      [40] https://github.com/w3c/webrtc-nv-use-cases/issues/62
      [41] https://github.com/w3c/webrtc-nv-use-cases/pull/75

    Bernard: Game streaming needs ultra-low latency (<100ms) but
    not huge scale
    … low latency broadcast on the other hand can live with ~1s
    latency, but needs scalability
    … most game streaming services use WebRTC today
    … quite a few of the low latency broadcast solutions also using
    … the WISH WG in IETF is looking at ingestion via WebRTC

    [42][Slide 23]


    [43][Slide 24]


    Bernard: Game streaming has typically one way A/V direction,
    with data going in the other direction
    … sometimes with P2P support

    Bernard: this needs lower level control on the data transport
    (req N15)
    … high resolution video processing (N37)
    … and control for the jitter buffer / rendering delay (N38)

    Harald: re rendering delay, it's often outside of the control
    of the app or browser, it's part of the OS
    … that might turn into a requirement to know it rather than
    control it

    <Bernard> Harald: may need to know it, not control rendering

    fluffy: re game, there is also controller input (incl access to
    USB devices) where latency also matters there

    <Bernard> Fluffy: APIs from controller...

    youenn: would this relate to the gamepad API?

    fluffy: that API isn't fast enough - it may need either
    implementation improvements or API changes

    hta: may not be in scope for the group, but useful to report to
    the relevant groups

    <Bernard> Tim Panton: data channel in workers

    TimP: if communication can happen from a worker, this would

    Bernard: we've heard repeated support for worker support for
    data channels

    <martinthomson> my impression is that the player often isn't
    using a web browser when playing the game here, so it might not
    have a direct effect on this particular use case

    Bernard: particularly important in the context of webcodecs in
    a worker

    [44][Slide 25]


    Bernard: low latency broadcast is different both in terms of
    latency and scale
    … this can extend to distributed auctions and betting
    … limited interactivity, needs NAT
    … this also ties to encrypted media
    … that use case also comes with additional control of data
    transport (N15)
    … and DRM (N36)

    <steely_glint> I believe Stadia is in the browser.

    Bradley: working on the gamepad API extension for multitouch
    support (for Sony)
    … sending the gamepad data as quickly to the server directly
    from the hardware would be ideal

    <martinthomson> a new RTP media type?

    Bernard: would this fall under the gamepad API?

    Bradley: not sure where it should live - but it probably should
    not be part of the gamepad API which is meant for interpretation
    in the client

    thomas: in the ultra-low latency use case, there is also a
    distributed music playing use case
    … challenging to do without very low latency

    bernard: we had discussion about a 3rd use case around
    distributed music playing
    … different from game given that media has to flow in both

    hta: we have some prior experience from taking input from one
    place and moving it to another without touching the main thread
    … with stream forwarding
    … once we have them transferable
    … that might be something that we could cogitate more:
    connecting a stream of gamepad events to an RTP stream and have
    "magic happens"

    <Bernard> Harald: transferrable streams can connect a stream of
    gamepad events from main thread to a worker thread.

    TimP: there is a clear definition in the USB spec of what a
    gamepad sends

    youenn: there is a discussion about gamepad in a worker
    happening in their repo

    fluffy: sending gamepad input directly via RTP is valuable, but
    could apply to other modalities (e.g. gesture tracking)
    … would be nice to have something flexible to deal with that
    variety of input

    HTA: will these use cases be proposed for consensus?

    <fluffy> If it is interesting for anyone, some work I am going
    on game input over RTP [45]https://datatracker.ietf.org/doc/


    Bernard: once we finalize the requirements in the PR, we will
    bring it to CfC

   Developer Engagement

    [46][Slide 27]


    TimP: there is unmet demand out there, possibly for lack of
    … how could we get more resources into this group?

    TimP: I was interviewing a devrel expert on how to build a
    … part of it is figuring out what's valuable to people who show up,
    and the blockers to people that don't show up

    [47][Slide 28]


    TimP: what's valuable to the people who are showing up?
    … it's about the value of standards - they grow the market,
    avoid some legal complexities, provide certainty to developers
    … and it builds on a shared expertise, better results than a
    de-facto standard

    [48][Slide 29]


    TimP: What's valuable and unaddressed for people who aren't
    showing up?
    … not sure we know
    … who would we want to show up?

    [49][Slide 30]


    TimP: What are the blockers to people who don't show up? incl
    those who did try but didn't stick
    … some of the blockers are legal (not allowed to contribute)
    … some is because they're not seeing progress on the issues
    that matter to them
    … also issues with cumbersome process or hostile atmosphere
    … limitations due to the fact that we're only dealing with half
    of the problem given inactivity of RTCWeb in IETF

    [50][Slide 31]


    TimP: Part of what is needed is to make it so that people don't
    waste their time, feeling they're not being listened to

    [51][Slide 32]


    TimP: a possible solution would be to create a network of users
    - not people building browsers or WebRTC stack
    … could provide very useful feedback & input
    … could be done under Chatham House rules

    <hta> discuss-webrtc@googlegroups.com is one place where such a
    forum could be advertised.

    <fippo> hta: for that discuss-webrtc would have to be moderated
    for first-time posters more often than the current "every
    second saturday" :-(

    dom: similar experience with the WebAuthn Adoption Community
    … would be happy to work with you in exploring a similar path
    for WebRTC

    TimP: started sharing that idea with a few folks, getting some
    interesting feedback

    [52][Slide 33]


    TimP: we also need to broaden our input
    … incl from non-browser projects (e.g. pion)
    … they have a lot more flexibility in experimenting with APIs
    than in a browser context
    … using as a sandbox to changes, or input to our own design

    bradley: one of the big things for me working on the gamepad
    API and its chromium implementation
    … one of the frustrations in shipping that experimental feature
    requires an extension to the standard for iterating
    … hard to iterate quickly

    TimP: maintaining a fork of Chromium is a substantial effort

    fluffy: I feel what you're saying on all of this; an
    interesting premise is that this group is often seen as a group
    of implementors
    … instead of a negotiation between users and implementors
    … the reasons I don't come here are similar to what you
    … I'm not sure having a separate group will help if we don't
    fix the root cause

    TimP: I accept that risk - but feels better than doing nothing

    fluffy: right - but unless the input gets listened to, it will
    still feel like a waste of time

    TimP: right, this group would have to commit to listen to the
    … also helps decoupling the legal issues, and provides a
    different audience

    Bernard: with the next generation of APIs, we're also not
    talking to a single group - there is WebTransport, Web Media...
    … up to 7 groups with one aspect of that architecture
    … providing a single entry point would help

    <martinthomson> is steely_glint talking about forming a
    community group?

    TimP: that expands the scope of what I had in mind

    hta: the place I currently see for this kind of developer input
    is in the chrome bug tracker where they're requesting
    non-standard features
    … there is a community willing to provide input but not sure
    how to contribute it
    … Google should not be the org doing the outreach

    TimP: +1 - it should be somewhat separate from the browser
    makers community
    … not quite sure yet how to manage the input to the group
    … if there is feeling this is worth constructing, I'll take a
    stab at it

    hta: I support this

    youenn: at a previous TPAC, there was a session where we
    invited developers for feedback

    Henrik: what would be the output of this group? concrete API
    proposals? PoC implementations?

    TimP: really good question - would be dependent on what this
    group would accept
    … not standards proposals though
    … it could be a prototype in Pion
    … and then translated in W3C speak
    … it could also be useful when we have questions about
    developer ergonomics

    HTA: next step is on TimP's shoulders


    [53][Slide 36]


    [54][Slide 37]


     WebRTC revision process

    [55][Slide 38]


    [56][Slide 39]


    [57][Slide 42]


    dom: future cfc to move on with agreed changes in WebRTC pc
    … plus discussions to move away from WebRTC extensions.

    Bernard: there are a number of changes in webrtc-pc that we
    could move back from webrtc-extensions

    Bernard: number of WebRTC extensions have been removed from
    webrtc-pc, they are now implemented. So we should move them
    back to webrtc-pc?

    dom: yes

    Bernard: e.g. maxFramerate has been implemented

    <martinthomson> "at least one browser" isn't something that I
    consider sufficient for moving stuff

    youenn: what is the process to move directly to webrtc-pc

    dom: we would provide annotations (experimental for instance).

    youenn: would reduce editors burden to directly go to

    hta: hearing support to move in that direction

    <vr000m> Does this change confuse the reader on what has been
    implemented and what is not. This was one of the main reasons
    to split the work into extensions, i.e., good ideas to be
    implemented but not yet.

    hta: also shows value in pushing docs to Rec, e.g. for
    mediacapture-main and webrtc-stats

    dom: if possible a module is good, some changes are deep in the
    main spec, so good to update the webrtc main spec.

    orphis: what about webrtc 2.0?

    dom: we could go with webrtc 1.1... 2.0.
    … might be better to move away from versioning.

    renaming WebRTC 1.0 -> WebRTC instead for instance.

    The room seems to agree with the overall direction.

     [58]PR #2763: add relayProtocol to RTCIceCandidate

      [58] https://github.com/w3c/webrtc-pc/pull/2763

    [59][Slide 42]


    fippo: propose to fix inconsistency between stats and -pc on
    candidate objects
    … also remove unimplemented url field of the event interface

    youenn: it's exposed in Safari

    hta: it makes more sense to have it in the candidate than in
    the event

    fippo: I'll make a PR for that
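
    As an illustration of what having relayProtocol on the candidate
    (rather than only in stats or on the event) would allow, here is a
    minimal sketch. It is not from the meeting: CandidateLike and
    describeCandidate are hypothetical names, and the "udp" / "tcp" /
    "tls" values are assumed to match the ones already used for
    relayProtocol in webrtc-stats.

```typescript
// Hypothetical structural type standing in for RTCIceCandidate; only the two
// fields used below are modelled.
type RelayProtocol = "udp" | "tcp" | "tls";

interface CandidateLike {
  type: "host" | "srflx" | "prflx" | "relay";
  // Assumed to be set only for relay candidates (protocol used to reach TURN).
  relayProtocol?: RelayProtocol | null;
}

// Produce a short label for logging, using relayProtocol when it is present.
function describeCandidate(candidate: CandidateLike): string {
  if (candidate.type !== "relay") {
    return candidate.type;
  }
  return candidate.relayProtocol
    ? `relay over ${candidate.relayProtocol}`
    : "relay (protocol unknown)";
}
```

    With the field on the candidate itself, an app can label relay
    candidates without cross-referencing getStats().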

    [60][Slide 43]


     Simulcast negotiation

    [61][Slide 44]


    [62][Slide 45]


    Jan-Ivar: looking from easiest to hardest issue
    … starting with [63]#2732
    … Chrome and Safari throw on > 16 characters
    … propose we align with that

      [63] https://github.com/w3c/webrtc-pc/issues/2732

    fluffy: strongly object to that without having this done in
    … agree with fixing it, but in the right place

    <martinthomson> fluffy: RFC 8851-bis?

    youenn: this is only from the sender side

    jan-ivar: this would only affect for transceiver, not in offers
    to receive simulcast

    Byron: if JS tries to use something that the impl can't handle,
    would OperationError be appropriate?

    Henrik: could reject the offer?

    Byron: that's not an offer at that point of the discussion

    hta: background on the 16 choice - this used to create a Chrome
    … with inconsistent checks (one on 256 vs 16)
    … we did hit someone sending a 17-character rid

    Orphis: the limit of 16 characters was in the older version of

    HTA: can also commit to file an IETF issue

    Jan-ivar: but this change is only impacting what browser would

    <Orphis> jib: [64]https://www.w3.org/TR/2017/


    RESOLUTION: limit rid to 16 in addTransceiver and file issue in
    IETF on RID length in general
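
    A minimal sketch of the length check in this resolution, for
    illustration only. The helper name checkRidLengths and the
    EncodingLike type are hypothetical, and throwing a TypeError is an
    assumption; the minutes do not record which exception
    addTransceiver should raise.

```typescript
// Limit from the resolution above: a rid passed to addTransceiver may be at
// most 16 characters long.
const MAX_RID_LENGTH = 16;

// Hypothetical stand-in for the rid-carrying part of an encoding dictionary.
interface EncodingLike {
  rid?: string;
}

// Reject any encoding whose rid exceeds the limit; encodings without a rid
// are not affected by this particular check.
function checkRidLengths(sendEncodings: EncodingLike[]): void {
  for (const encoding of sendEncodings) {
    if (encoding.rid !== undefined && encoding.rid.length > MAX_RID_LENGTH) {
      // Error type is an assumption, see the note above.
      throw new TypeError(
        `rid "${encoding.rid}" is longer than ${MAX_RID_LENGTH} characters`
      );
    }
  }
}
```

    As discussed, this only constrains what the sender side passes to
    addTransceiver; it says nothing about rids received in remote
    offers.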

    [65][Slide 46]


    jan-ivar: other cases where Chromium and Safari throw on
    specific valid RID

    cullen: getting the inconsistency between rfc8851 and rfc8852
    fixed and aligning with it sounds better

    Byron: 8852 is more restrictive than 8851
    … aligning with the more restrictive is probably more prudent
    … we can take that to the IETF

    jan-ivar: so do that before making the change?

    hta: we should at least add a note that - and _ can be

    <vr000m> ABNF on 8851 is:

    <vr000m> rid-syntax = %s"a=rid:" rid-id SP rid-dir
    [ rid-pt-param-list / rid-param-list ]
    rid-id = 1*(alpha-numeric / "-" / "_")

    hta: not unreasonable to limit what we send within the
    constraints of the RFC

    youenn: we should document something in the spec right now and
    add a link to the IETF discussion

    hta: +1

    <vr000m> rid-syntax = %s"a=rid:" rid-id SP rid-dir
    [ rid-pt-param-list / rid-param-list ]
    rid-id = 1*(alpha-numeric / "-" / "_")

    cullen: 8851 is what defines the rid syntax
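
    To make the 8851 / 8852 mismatch concrete, here is a small sketch
    that classifies a rid against both character sets. It is an
    illustration, not something presented at the meeting: the 8851
    pattern follows the rid-id ABNF quoted above, while the 8852
    pattern assumes that RFC restricts the value to alphanumeric
    characters only (consistent with Byron's remark that it is the
    more restrictive of the two). The function name ridCompatibility
    is hypothetical.

```typescript
// rid-id per the RFC 8851 ABNF: 1*(alpha-numeric / "-" / "_").
const RFC8851_RID = /^[A-Za-z0-9_-]+$/;

// Assumed stricter form for RFC 8852: alphanumeric characters only.
const RFC8852_RID = /^[A-Za-z0-9]+$/;

// Report which of the two character sets a given rid satisfies.
function ridCompatibility(rid: string): "both" | "8851-only" | "neither" {
  if (RFC8852_RID.test(rid)) {
    return "both";
  }
  if (RFC8851_RID.test(rid)) {
    return "8851-only";
  }
  return "neither";
}
```

    A rid such as "low-res" lands in the "8851-only" bucket, which is
    the case a note in the spec about "-" and "_" would need to cover.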

    [66][Slide 47]


     Issue [67]#2734: addTransceiver does not check for missing rid

      [67] https://github.com/w3c/webrtc-pc/issues/2734

    hta: +1

    youenn: +1

     [68]#2733 addTransceiver does not check for uniqueness of rid

      [68] https://github.com/w3c/webrtc-pc/issues/2733

    [69][Slide 48]


    hta: +1

    youenn: +1 as long as there is no web compat issue
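
    The two checks supported here (#2734 and #2733) could be sketched
    as below. The exact proposal is on the slides and not reproduced
    in these minutes, so this assumes the rule is "with more than one
    encoding, every encoding needs a rid, and rids must not repeat";
    the names RidEncoding and checkRids, and the use of TypeError, are
    likewise assumptions.

```typescript
// Hypothetical stand-in for the rid-carrying part of an encoding dictionary.
interface RidEncoding {
  rid?: string;
}

function checkRids(sendEncodings: RidEncoding[]): void {
  // Missing rid (#2734): assumed to matter only when several encodings are
  // given, since a single encoding does not need a rid to be told apart.
  if (sendEncodings.length > 1) {
    for (const encoding of sendEncodings) {
      if (encoding.rid === undefined) {
        throw new TypeError("every encoding needs a rid when simulcast is requested");
      }
    }
  }

  // Uniqueness (#2733): no two encodings may share the same rid.
  const seen = new Set<string>();
  for (const encoding of sendEncodings) {
    if (encoding.rid === undefined) {
      continue;
    }
    if (seen.has(encoding.rid)) {
      throw new TypeError(`duplicate rid "${encoding.rid}"`);
    }
    seen.add(encoding.rid);
  }
}
```

    Whether adding these checks is web compatible is the open question
    youenn raises above; the sketch does not address that.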

     Issue [70]#2762: Simulcast: Implementations do not fail

      [70] https://github.com/w3c/webrtc-pc/issues/2762

    [71][Slide 49]


    [72][Slide 50]


    hta: one edge case I've encountered: if you get a remote offer
    with 3 layers
    … and then get a later remote offer with 2 (reduced # of
    layers), I've implemented it as removing top layer
    … if you then have a local offer, should you send 2 or 3? ie
    should we generate an offer that tries to re-expand it

    Byron: maybe a bad idea? I don't know

    hta: not sure what the right way is, but the spec should say

    jan-ivar: that will block this issue

    hta: I'll make a note on the issue
    … but what's on the slide is fine - it's just another edge case
    to consider

     Issue 2764: What is the intended behavior of rollback of remote
     simulcast offer?

    [73][Slide 51]


    Byron: the reverse situation is more complicated - a
    previously negotiated simulcast session with 3 and a re-offer
    with 2 that gets rolled back - what should happen?

    henrik: the rollback has to reset everything, otherwise it
    becomes difficult to re-purpose the transceiver

    youenn: I'm surprised by the divergence, may be a bug in safari

    [running out of time to finalize session on simulcast]


    With webrtc stats

    [slide 59] what to do with RTP stats lifetime

    Henrik: in favour of proposal A.

    stream stats do not have any useful information before
    receiving / sending packets

    <vr000m> I like Proposal A as well, as there is no data before
    the packet is sent or received

    jib: proposal A is good to me

    fippo: proposal A might be a breaking change as people might
    rely on it

    risky change

    jib: people relying on this could only work for some browsers.

    youenn: how will you ensure backwards compat?

    henrik: would ship with a flag to disable the change

    RESOLUTION: Proposal A but implementors need to check web

    Henrik will prototype the change
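
    For readers without the slide, the behaviour argued for above can
    be sketched as follows. This assumes Proposal A means that an RTP
    stream stats object is only reported once at least one packet has
    been sent or received, which is how the comments above describe
    it; the slide text itself is not in these minutes. The names
    RtpStreamCounters and reportableRtpStats are hypothetical.

```typescript
// Hypothetical, simplified view of an RTP stream inside an implementation.
interface RtpStreamCounters {
  id: string;
  kind: "inbound-rtp" | "outbound-rtp";
  // packetsReceived for inbound streams, packetsSent for outbound streams.
  packets: number;
}

// Only streams that have carried at least one packet are reported; before
// that there is no useful information to expose.
function reportableRtpStats(streams: RtpStreamCounters[]): RtpStreamCounters[] {
  return streams.filter((stream) => stream.packets > 0);
}
```

    Code that expects a stats entry to exist immediately after
    negotiation would see nothing until media flows, which is the web
    compatibility risk fippo raises.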

    [slide 60] when are stats destroyed

     Issue [74]#668 - When are RTP streams destroyed?

      [74] https://github.com/w3c/webrtc-pc/issues/668

    Harald: one reason for not deleting stats is to be able to get
    the total number of bytes.

    hta: nobody has complained that this hasn't been possible

    jib: +1
    … part of the reasons for eternal stats was to keep them
    available after pc.close() which would be lost here
    … probably fine since nobody has complained about this
    … SSRC change is hard to find in the WebRTC stack

    fluffy: SSRC change in the remote side is hard to detect;
    they're rare but can happen

    henrik: the point of PR is exposing the current set of streams
    as identified by the impl

    fluffy: transceiver going away I understand; not sure how to
    detect SSRC going away though (e.g. long mute)

    henrik: renegotiation might be the practical approach

    RESOLUTION: agreement to delete stats when sender/receiver is

    Issue 643

     Issue [75]#643 - Do we agree removing "sender", "receiver" and
     "transceiver" stats is a good idea?

      [75] https://github.com/w3c/webrtc-pc/issues/643

    [76][Slide 61]


    [77][Slide 62]


    RESOLUTION: close issue 643 as "yes, it was a good idea to
    remove them for WebRTC stats"

    youenn: it may be worth even removing it from -provisional

    henrik: maybe, but worth discussing separately

     Issue [78]#666 - powerEfficientEncoder/powerEfficientDecoder

      [78] https://github.com/w3c/webrtc-pc/issues/666

    [79][Slide 63]


    henrik: proposed [80]PR #670

      [80] https://github.com/w3c/webrtc-pc/pull/670

    youenn: there is a fingerprinting issue with
    … we need to solve it before adding a new surface
    … at least in safari, this would be exposing new information
    not exposed by encoderImplementation

    henrik: how about tying this to getUserMedia permission?

    jib: I think tying this to gUM would be a mistake
    … powerEfficient was moved to media capabilities to avoid this
    privacy issue

    <Tim_Panton> Not in favour of linkage to get user media.

    youenn: the issue with media capabilities is that it's about
    what can be done, not what is being done
    … e.g. if all hardware encoders are being used

    jib: isn't the fingerprinting part that the system has

    <Tim_Panton> GUM should be irrel

    <Bernard> Agree. Media Capabilities has no dependency on gUM

    <Tim_Panton> irrelevant to a recv only decoder

    youenn: exposing that there is hw-capability *and* that is
    being used is additional fingerprinting

    jib: re encoderImplementation exposing this, it's only in some
    implementations - and we should limit that too

    youenn: let's revive the fingerprinting issue

     Issue [81]#662 - Don't expose so many RTCCodecStats!

      [81] https://github.com/w3c/webrtc-pc/issues/662

    [82][Slide 64]


    henrik: proposing PR [83]#669

      [83] https://github.com/w3c/webrtc-pc/issues/669

    jib: could also be obtained from media capabilities

    orphis: or with getParameters

    RESOLUTION: Adopt [84]#669

      [84] https://github.com/w3c/webrtc-pc/issues/669


    [85][Slide 66]


     Issue [86]#98: Disabling hardware encoding/decoding

      [86] https://github.com/w3c/webrtc-extensions/issues/98

    [87][Slide 68]


    [88][Slide 69]


    youenn: why would this succeed more than setCodecPreferences?

    bernard: it's simpler

    fippo: I have an implementation for it, very short

    dom: how could this be used for fingerprinting?

    fippo: by turning it on and off, this may surface more
    details about codec profiles

    youenn: re gating this to reloading - would this be the top
    level page or iframe?

    jib: to prevent fingerprinting, tying this to top level context
    would make it a bigger deterrent

    hta: let's iterate on privacy issues in the bug

     Issue [89]#112: header extension API: do we need enabled in

      [89] https://github.com/w3c/webrtc-extensions/issues/112

    [90][Slide 70]


    [support for the removal]

     Issue [91]#111: Integration of congestion control across SCTP and

      [91] https://github.com/w3c/webrtc-extensions/issues/111

    [92][Slide 71]


    [93][Slide 72]


    <englishm> +present Mike English

    <vr000m> What would be the API impact? wouldn't the existing
    APIs be sufficient input for the implementors?

    hta: there are actual cases where congestion control conflicts
    prove problematic, e.g. large file upload in Meet

    youenn: is webrtc priority addressing this?

    fluffy: the RTP and the data can go to different destinations,
    which further complicates the congestion control
    … feels like implementation quality issue
    … the right solution is probably to have them both on top of

    Bernard: main use cases are in game streaming
    … a bit distinct from this, with current congestion control on
    the datachannel

    Peter: if we have a good API for encoded stream, then you could
    send your A/V/data on a single transport, or you could encode
    them together and send them over rtp
    … which would ensure they get treated as a whole for congestion

    <jesup_> +1 to cullen; people do want this. not here though

    hta: hearing consensus that [94]#111 can be closed as no action
    needed from this group

      [94] https://github.com/w3c/webrtc-extensions/issues/111

    RESOLUTION: close [95]#111

      [95] https://github.com/w3c/webrtc-extensions/issues/111

Summary of resolutions

     1. [96]limit rid to 16 in addTransceiver and file issue in
        IETF on RID length in general
     2. [97]Proposal A but implementors need to check web
     3. [98]agreement to delete stats when sender/receiver is
     4. [99]close issue 643 as "yes, it was a good idea to remove
        them for WebRTC stats"
     5. [100]Adopt [101]#669
     6. [102]close [103]#111

     [101] https://github.com/w3c/webrtc-extensions/issues/669
     [103] https://github.com/w3c/webrtc-extensions/issues/111


                         WebRTC TPAC F2F - Day 2

13 September 2022


       [2] https://www.w3.org/2011/04/webrtc/wiki/September_2022


           AlexanderFlenniken, BenWagner, Bernard, Byron,
           ChrisNeedham, Cullen, Dom, EeroHakkinen, Elad, EricC,
           Erik, Florent, Harald, Henrik, Jan-Ivar, JeffWaters,
           Mark_Foltz, MarkFoltz, MartinT, Masaki, MichaelSeydl,
           Mike, Mike English, Peter, Philipp, RichardBarnes, riju,
           Tony, Tove, TuukkaToivonen, Youenn


           Bernard, hta, jan-ivar



     1. [3]WebRTC Encoded Transform
          1. [4]Issue [5]#106: Add use cases that require one-ended
             encoded streams
          2. [6]Issue [7]#90: Pluggable codecs
          3. [8]Issue [9]#31 & Issue [10]#50: Congestion Control
          4. [11]Issue [12]#99 & Issue [13]#141: WebCodecs & WebRTC
          5. [14]Issue [15]#70: WebCodecs & MediaStream transform
          6. [16]Issue [17]#143: generateKeyFrame
     2. [18]Conditional Focus
     3. [19]Screen-sharing Next Steps
     4. [20]Encoded transform
          1. [21]Issue [22]#131 Packetization API
     5. [23]Action items & next steps
     6. [24]Summary of resolutions

       [5] https://github.com/w3c/webrtc-encoded-transform/issues/106
       [7] https://github.com/w3c/webrtc-encoded-transform/issues/90
       [9] https://github.com/w3c/webrtc-encoded-transform/issues/31
      [10] https://github.com/w3c/webrtc-encoded-transform/issues/50
      [12] https://github.com/w3c/webrtc-encoded-transform/issues/99
      [13] https://github.com/w3c/webrtc-encoded-transform/issues/141
      [15] https://github.com/w3c/webrtc-encoded-transform/issues/70
      [17] https://github.com/w3c/webrtc-encoded-transform/issues/143
      [22] https://github.com/w3c/webrtc-encoded-transform/issues/131

Meeting minutes

    Slideset: [25]https://lists.w3.org/Archives/Public/www-archive/


    [26][Slide 75]


   WebRTC Encoded Transform

    [27][Slide 78]


    Harald: encoded transform is fine for crypto, but not fine for
    other manipulation use cases

     Issue [28]#106: Add use cases that require one-ended encoded streams

      [28] https://github.com/w3c/webrtc-encoded-transform/issues/106

    [29][Slide 79]


    Harald: several use cases where you want to connect a stream to
    somewhere else after processing
    … not sure what a proper API would look like, so thought we
    should go back to requirements

    youenn: looking at the use cases - they probably deserve
    different solutions
    … e.g. webtransport probably shouldn't use peerconnection
    … alternative encoders/decoders - sounds like a different API
    … metadata may be done prior to PC

    Harald: encoded transform is a stream source connected to
    stream sink
    … a one-ended stream has only one of these
    … we have an ecosystem of media & encoders that people have
    gotten used to
    … if we can plug into this ecosystem, it seems a better
    solution than creating novel solutions for this
    … it might be that we decide that it's not the same ecosystem
    … in which case we might kick the ball over to media

    youenn: starting from use cases and then deriving requirements
    as done for WebRTC-NV would be useful to do here
    … it's easier to derive APIs from requirements

    harald: the SFU-in-browser is a technique to achieve the
    scalable video conferencing use case we discussed yesterday

    youenn: describing use cases in more detail and then deriving
    requirements from there

    jib: +1 on better description of use cases

    Bernard: the NV use cases have no API that satisfies the
    … WebTransport doesn't support P2P; the only path is RTP

    JIB: so the idea would be to expose an RTP transport to JS

    Bernard: or make the datachannel a low-latency media transport,
    but there doesn't seem to be much stomach for that

    Harald: we have a discussion scheduled on whether to consider a
    packet interface in addition to a frame interface
    … We'll detail the use cases more to figure out if an extension
    of media stream is relevant or if we need something completely

     Issue [30]#90: Pluggable codecs

      [30] https://github.com/w3c/webrtc-encoded-transform/issues/90

    [31][Slide 80]


    Harald: we've been lying to SDP about what we're transporting

    <martinthomson> what is ECE?

    Harald: to stop lying, we need a mechanism to allow the app to
    tell the SDP negotiation that they're doing something else than
    the obvious thing

    <fippo> probably EME (there was a comment + question in the

    youenn: this may lead us to a conclusion that encoded transform
    was a mistake

    <martinthomson> ...I tend to think that this is already broken

    youenn: the other possibility, you could state during
    negotiation that you're going to use app-specific transforms
    … letting intermediaries know about this
    … we tried to push this to IETF AVTCore, without a lot of

    Harald: maybe MMUSIC instead?

    Cullen: it's worth trying again - slow movement has been the
    pattern over the past 2 years, not a signal

    Bernard: the reason why SFrame has not moved in AVTCore is
    because nobody showed up, drafts were not submitted, and the
    area director is considering shutting down the SFrame WG

    Youenn: I went to several meetings, tried to understand the
    submitted issues, but struggled to find solutions that would
    … the work has been stalled for lack of consensus

    herre: can we move forward without the dependency on IETF, by
    allowing the JS to describe its transform to the other party?

    Youenn: encoded transform has a section on SFrame transform,
    which wasn't pointing to an IETF draft until recently

    Harald: the scripttransform is fully under the app control, but
    it doesn't have a way to tell the surrounding system it changed
    the format
    … we could add an API before the IETF work emerges

    Martin: SFrame is very close to death, I expect some more work
    to be done though
    … once you give script access to the payload, anything is
    … this breaks the assumptions under which the encoder and
    packetization operate
    … I don't think letting the script write the SDP works; we need
    a model that makes sense, not sure what it would be

    Youenn: we had a model with the traditional video pipeline
    including a break into it
    … we could open it more and expose more of the states of the
    … we could expose e.g. bitrate if useful, based on use cases
    … for pluggable codecs, you need to set a break before webrtc
    encoded transform & the track, and be able to set a special

    martin: you'd want to deal with raw media (the track), then do
    the encoding and the packetization

    youenn: not sure we need all the breaks

     Issue [32]#31 & Issue [33]#50: Congestion Control

      [32] https://github.com/w3c/webrtc-encoded-transform/issues/31
      [33] https://github.com/w3c/webrtc-encoded-transform/issues/50

    [34][Slide 81]


    Martin: none of this is necessary if you're looking at just
    mutating packets

    Harald: not if the size or number of packets can change

    Martin: some of it can be modeled as a network-specific MTU for
    the SFrame transform

    Harald: the downstream would need to expose its MTU, and the
    SFrame transform would share its MTU upstream
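
    A minimal sketch of the MTU sharing discussed here, assuming
    every stage in the send chain adds a fixed number of bytes per
    packet. The Stage shape, the upstreamMtus helper and the byte
    counts are illustrative assumptions, not part of any spec:

```typescript
// Hypothetical model of a send chain: every stage adds a fixed number of
// bytes to each packet, so the MTU it can offer upstream shrinks by that much.
interface Stage {
  name: string;
  overheadBytes: number; // assumed per-packet overhead of this stage
}

// Walk the chain starting at the network and report, per stage, the MTU
// that stage can advertise to whatever feeds it.
function upstreamMtus(networkMtu: number, stagesFromNetwork: Stage[]): Record<string, number> {
  const result: Record<string, number> = {};
  let available = networkMtu;
  for (const stage of stagesFromNetwork) {
    available -= stage.overheadBytes;
    if (available <= 0) {
      throw new RangeError(`stage ${stage.name} leaves no room for payload`);
    }
    result[stage.name] = available;
  }
  return result;
}

// Illustrative numbers only.
const mtus = upstreamMtus(1200, [
  { name: "rtp", overheadBytes: 12 },    // fixed RTP header without extensions
  { name: "sframe", overheadBytes: 20 }, // assumed SFrame header plus tag size
]);
// mtus["rtp"] is 1188 and mtus["sframe"] is 1168
```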

    Martin: but beyond, this is looking at the entire replacement
    of the chain

    Youenn: the AR/VR use case is where data can be much bigger
    when you attach metadata
    … one possible implementation is to do this with
    ScriptTransform to stuff metadata in the stream, as a hack
    … not sure if we should accept this as a correct use of the API
    … in such a use case, expanding the frame size means the
    bitrate is no longer correct
    … the UA could instruct the encoder to adapt to the new frame
    … or we could expose new APIs
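
    To illustrate why the bitrate stops being correct, a rough
    sketch of the arithmetic, assuming a constant number of
    metadata bytes appended per frame. encoderTargetBps is a
    hypothetical helper, not an existing API:

```typescript
// Given the bitrate the transport can carry and the extra bytes a transform
// appends to each frame, compute what the encoder itself should aim for.
function encoderTargetBps(
  transportTargetBps: number,
  metadataBytesPerFrame: number,
  framesPerSecond: number,
): number {
  const metadataBps = metadataBytesPerFrame * 8 * framesPerSecond;
  // Never return a negative target when the metadata alone exceeds the budget.
  return Math.max(0, transportTargetBps - metadataBps);
}

// 1 Mbps budget, 500 bytes of metadata on each of 30 frames per second:
// the metadata consumes 120 kbps, leaving 880 kbps for the encoder.
const adjustedBps = encoderTargetBps(1000000, 500, 30);
```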

    <peter> Isn't the targetBitrate already in webrtc-stats?

    martin: AR/VR is probably a wrong usage of ScriptTransform
    … it would better be handled as a different type of
    … this points toward being able to build a synthetic media flow

    martinthomson: it would seem better to look at it this way
    rather than through a piecemeal approach
    … the AR/VR points toward synthetic media flows

    Bernard: people have tried using the datachannel for AR/VR
    … didn't work for A/V sync or congestion control
    … they want an RTP transform
    … the A/V stream helps with sync
    … if you put it in a different flow, how do you expose it in
    … it's the only way available in the WebRTC model today

    fluffy: on WebEx hologram, we do exactly what Martin describes
    … we send a lightfield in a stream that looks like a video
    … same for hand gestures etc
    … all of this sent over RTP
    … it's low bit-rate data, doesn't need to adapt like audio
    … lightfield instead needs bandwidth adaptation
    … this could apply to haptics, medical device data being
    injected in a media stream

    TimP: part of our problem has been mapping all of this to SDP,
    for things created on the fly
    … describing things accurately in SDP is a lost cause as we'll
    keep inventing new things

    <martinthomson> Steely_Glint_: SDP is extensible....

    TimP: we should be describing the way we're lying (e.g. we're
    going to add 10% to the bandwidth; it won't be H264 on the way
    … without trying to describe it completely

    Peter: I had proposed an RTP data mechanism a few years ago,
    which sounds similar
    … we could have an SDP field to say this is arbitrary bytes
    … or construct something without SDP

    Martin: I was suggesting new type of RTP flows with new
    … browsers can't keep up with all the ways that SDP would be
    used; we should instead give a way for apps to describe their
    "codecs" via a browser API

     Issue [35]#99 & Issue [36]#141: WebCodecs & WebRTC

      [35] https://github.com/w3c/webrtc-encoded-transform/issues/99
      [36] https://github.com/w3c/webrtc-encoded-transform/issues/141

    [37][Slide 87]


    youenn: Both WebRTC and WebCodecs expose similar states
    … but there are differences e.g. in mutability

    <jesup> I strongly agree with Martin's comments; these
    data-like should be "codecs", which allows for much more
    flexibility, specification, and interoperability

    youenn: should we try to reconcile? should we reuse webcodecs
    as much as possible?

    <Steely_Glint_> But we do need (in sframe) to allocate a
    suitable codec (say h264) - the 'generic' pass through drops
    that into

    youenn: I propose we stick to what we shipped

    DanSanders: from the WebCodecs side, that sounds like a good
    … we don't have a generic metadata capability

    harald: so we should document how you transform from one to the
    … it's fairly easy to go from webrtc to web codecs
    … the reverse is not possible at the moment
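
    A sketch of the "easy" direction (WebRTC to WebCodecs), using
    local stand-in types so it runs outside a browser. It assumes
    the frame timestamp is an RTP timestamp on a 90 kHz clock,
    which may not hold for every implementation; toChunkInit and
    both interfaces are hypothetical names:

```typescript
// Stand-ins mirroring the fields of RTCEncodedVideoFrame and
// EncodedVideoChunkInit that matter for the conversion.
interface EncodedFrameLike {
  type: "key" | "delta" | "empty";
  timestamp: number; // assumed: RTP timestamp, 90 kHz clock
  data: ArrayBuffer;
}
interface ChunkInitLike {
  type: "key" | "delta";
  timestamp: number; // microseconds, as WebCodecs expects
  data: ArrayBuffer;
}

const RTP_VIDEO_CLOCK_HZ = 90000;

function toChunkInit(frame: EncodedFrameLike): ChunkInitLike | null {
  // WebCodecs chunks are only "key" or "delta"; there is nothing to convert
  // for an empty frame.
  if (frame.type === "empty") return null;
  return {
    type: frame.type,
    timestamp: Math.round((frame.timestamp * 1000000) / RTP_VIDEO_CLOCK_HZ),
    data: frame.data.slice(0), // copy so the result does not alias the frame
  };
}
```

    In a browser the returned dictionary could be passed to the
    EncodedVideoChunk constructor. The reverse direction has no
    equivalent here, which is the gap noted above.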

    <Bernard> Youenn: we can create constructors to build
    RTCEncodedVideoFrame from EncodedVideoChunk

    herre: if we move to the one-ended model, this creates trouble
    in terms of ownership and lifecycle

    youenn: we deal with that problem in Media Capture transform
    through enqueuing via cloning (which is effectively a transfer)

    <peter> +1 to constructors for

    Bernard: re constructors to get from one type to another,
    allowing conversion between the two

    jib: your proposal doesn't address the mutability of metadata

    youenn: the particular metadata I'm referring to aren't mutable

    <Bernard> Harald: this model does not support the use cases we
    have been discussing.

    youenn: can we close the issue or should we wait until the
    architecture gets designed?

    Harald: I hear support for the two-way transform

    youenn: let's file an issue specifically about that and close
    these 2 issues

     Issue [38]#70: WebCodecs & MediaStream transform

      [38] https://github.com/w3c/webrtc-encoded-transform/issues/70

    [39][Slide 88]


    [40][Slide 89]


    DanSanders: proposal 1 is straightforward
    … we don't have a metadata API for lack of a good enough
    technical proposal
    … the mutation/cloning aspect is the challenge
    … e.g. cropping may generate no longer accurate data about face
    … it depends on what cloning does
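
    The cropping problem can be made concrete with a small sketch.
    It assumes face metadata is a rectangle in frame pixel
    coordinates; the Rect type and cropFaceBox helper are
    hypothetical, since no metadata API exists yet:

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Re-express a face box in the coordinates of the cropped frame.
// Returns null when the face lies entirely outside the crop, in which
// case the metadata should be dropped rather than copied.
function cropFaceBox(face: Rect, crop: Rect): Rect | null {
  const left = Math.max(face.x, crop.x);
  const top = Math.max(face.y, crop.y);
  const right = Math.min(face.x + face.width, crop.x + crop.width);
  const bottom = Math.min(face.y + face.height, crop.y + crop.height);
  if (right <= left || bottom <= top) return null;
  return { x: left - crop.x, y: top - crop.y, width: right - left, height: bottom - top };
}
```

    Copying the original box unchanged into the cropped frame
    would leave it pointing at the wrong pixels, which is the
    inaccuracy raised above.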

    peter: are we talking about how the metadata would go over the

    youenn: here we're focusing on mediastreamtrack as a series of
    … we don't have a good solution for moving it over the network
    as we discussed in the previous item
    … the WebRTC encoder could be a pass-through for the metadata,
    but it's still up in the air - we welcome contributions

    chris: in webcodecs, there is some request to expose H265 SEI
    metadata for user defined data

    <miseydl> some meta information might be provided by the
    containerization of the video codec itself (NAL info etc) would
    we populate that generic meta array with those infos?.

    chris: that would presumably be exposed with videoframe
    … it would be useful to look at the use cases together

    Dan: this is kind of low priority because of low multiplatform
    … if we have a metadata proposal that works, it could be used

    youenn: we had someone sharing such an approach - although it's
    codec specific

    chris: we'll also continue discussing this at the joint meeting
    with Media

    harald: metadata has some specific elements: timestamp, ssrc,
    dependency descriptors
    … the last one obviously produced by the encoder
    … mutable metadata - if constructing a new frame is very cheap,
    we don't need mutability

    DanSanders: it's quite cheap, just the GC cost

    Harald: we'll continue the discussion at the joint meeting & on

     Issue [41]#143: generateKeyFrame

      [41] https://github.com/w3c/webrtc-encoded-transform/issues/143

    [42][Slide 90]


    <Ben_Wagner> WebCodecs spec requires reference counting:


    Peter: what about returning multiple timestamps?

    youenn: that's indeed another possibility

    <martinthomson> does it even need to return something?

    youenn: but then the promise will resolve at the time of the
    last available keyframe

    martinthomson: does it need to return anything, since you're
    going to get the keyframes as they come out?

    youenn: it's a convenience to web developers to return a
    promise (which also helps with error reporting)

    martinthomson: at the time the promise resolves, it resolves
    after the keyframe is available, which isn't the time you want

    <miseydl> one could also use the timestamp to associate/balance
    keyframerequests, which is useful for various reasons.

    youenn: it's resolved when the frame is enqueued, before the

    martinthomson: this seems suboptimal if what you want is the
    key frame
    … if frames are enqueued ahead of the keyframe

    youenn: in practice, the expectation is that you'll be polling
    the stream, otherwise your app is broken

    martinthomson: with machines that jank for 100s of ms

    youenn: the promise can also be used to return an error, which
    I don't think can be validated asynchronously

    martinthomson: that argues for a promise indeed; not clear that
    the timestamp return value

    fluffy: what you want to know is that the keyframe has been
    encoded; the timestamp is irrelevant

    youenn: so a promise at the timing we said, but not timestamp

    Peter: would it be reasonable to have an event when a keyframe
    is produced?

    youenn: you do that by reading the stream and detecting K

    Peter: I like proposal 3 as a way to cover the situations you

    TimP: the way I recalled it was the purpose of the timestamp
    was to help with encryption through sframe for key change

    martinthomson: this can be done by waiting for a keyframe in the
    stream before doing the key change
    … I also don't think it's strictly necessary to resolve the
    promise upon enqueuing
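
    A minimal sketch of the "wait for a keyframe in the stream"
    approach, assuming frames expose a type field like
    RTCEncodedVideoFrame.type. createKeySwitcher and the numeric
    key ids are hypothetical:

```typescript
type FrameType = "key" | "delta" | "empty";
interface FrameLike {
  type: FrameType;
}

// Tracks which encryption key applies to each frame. A requested key
// change is held back until the next key frame passes through.
function createKeySwitcher(initialKeyId: number) {
  let currentKeyId = initialKeyId;
  let pendingKeyId: number | null = null;
  return {
    requestKeyChange(newKeyId: number): void {
      pendingKeyId = newKeyId;
    },
    // Call once per frame, in stream order, before encrypting it.
    keyIdFor(frame: FrameLike): number {
      if (pendingKeyId !== null && frame.type === "key") {
        currentKeyId = pendingKeyId;
        pendingKeyId = null;
      }
      return currentKeyId;
    },
  };
}
```

    With this pattern the application never needs a timestamp
    back from the key frame request: it simply reacts to the
    first key frame it reads after asking for one.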

    <jesup> +1 for proposal 3. Simple. Agree with mt

    martinthomson: it could be done when the input has been

    RESOLUTION: go with proposal 3 without returning a timestamp

   [44]Conditional Focus

      [44] https://github.com/w3c/mediacapture-screen-share/issues/190

    [45][Slide 93]


    Elad: screen sharing can happen in situations of high stress
    for the end user
    … anything that distracts the user in that moment is unhelpful
    … the API we're discussing is to help the app set the focus on
    the right surface

    [46][Slide 94]


    [47][Slide 95]


    [48][Slide 96]


    elad: still open discussion on default behavior when there is a

    [49][Slide 97]


    [50][Slide 98]


    [51][Slide 99]


    [52][Slide 100]


    [53][Slide 101]


    youenn: re task, we want to allow for the current task - there
    is no infrastructure for that, but implementations should be
    able to do that
    … a bigger issue: Chrome and Firefox have a model where the
    screenshare picker always happens within the chrome
    … it's very different in Safari - picking a window focuses the
    … so the behavior would be to focus back on the browser window
    … being explicit on what is getting the focus would be better,
    so setFocusBehavior would be an improvement
    … I don't think we should define a default behavior since we
    already see different UX across browsers
    … I would also think it's only meaningful for tabs - for
    window, they could determine it at the time of the gDM call

    elad: re different UX models, we could fallback to make that a
    … re window vs tab, it may still be useful as a hint to adapt
    the picker

    youenn: unlikely we would do something as complex

    jan-ivar: I'm actually supportive of option 2
    … regarding applicability to window - for screen recording
    apps, the current behavior hasn't proved helpful

    youenn: but this could be done via a preset preference in the
    gDM call

    jan-ivar: we could, although maybe a bit superfluous

    jib: setFocusBehavior is a little more complicated, more of a
    constraint pattern with UA dependent behavior
    … but don't feel very strongly
    … but yeah, turning off focus by adding a controller doesn't
    sound great

    RESOLUTION: setFocusBehavior as a hint with unspecified default
    applicable to tabs & windows
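
    A sketch of how an app might use such a hint. The enum values
    and the FocusControllerLike interface are assumptions made for
    illustration; the minutes do not fix the API shape, and the
    hint may be ignored by the UA:

```typescript
// Assumed value names; the resolution only says the call is a hint.
type FocusBehavior = "focus-captured-surface" | "no-focus-change";
type DisplaySurface = "browser" | "window" | "monitor";

// Stand-in for whatever object ends up carrying setFocusBehavior().
interface FocusControllerLike {
  setFocusBehavior(behavior: FocusBehavior): void;
}

// Express a preference for tabs and windows only; with a whole monitor
// there is no single surface to focus, so no hint is given.
function applyFocusHint(
  controller: FocusControllerLike,
  surface: DisplaySurface,
  keepUserInApp: boolean,
): FocusBehavior | null {
  if (surface === "monitor") return null;
  const behavior: FocusBehavior = keepUserInApp ? "no-focus-change" : "focus-captured-surface";
  controller.setFocusBehavior(behavior);
  return behavior;
}
```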

    youenn: deciding to not focus is a security issue - it
    increases the possibility for Web pages to select a wrong surface
    … since this lowers security, there should be guidelines for
    security considerations

    Elad: should this be a separate doc?

    youenn: let's keep it in screen-share

    jib: +1 given that we're adding a new parameter to

    <fluffy> Proposing cropTargets in a capture handle

   Screen-sharing Next Steps

    Slideset: [54]https://lists.w3.org/Archives/Public/www-archive/


    [55][Slide 104]


    [56][Slide 105]


    [57][Slide 106]


    [58][Slide 107]


    mark: setting the crop target on the capture handle - is that
    serializable / transferable ?

    youenn: serializable

    mark: then it could be transferred over the messageport

    elad: but there is no standard format for that

    youenn: re crop target serializability, +1
    … I'm not sure yet about having cropTargets in capture handle
    … it may require more data, e.g. different cropping for
    different sinks
    … having app specific protocol might be a better way to start
    before standardizing a particular one
    … re MessagePort, the security issues can be solved
    … re content hint, I'm not convinced
    … the capturer doesn't have to provide the hint, the UA can do
    it itself

    elad: so 3 comments:
    … - cropTargets may need more context (although my main use
    case is for a single cropTarget)

    youenn: this could be dealt with via a per-origin protocol agreement

    elad: but that doesn't work with non-pre-arranged relationship

    jan-ivar: this MessagePort would be a first in terms of going
    cross-storage (not just cross-origin) - definitely needs
    security review
    … this could still be OK given how tied to user action and the
    existing huge communication path via the video sharing
    … In the past, we've tried to piecemeal things by not having a
    … part of the feedback I've been getting is maybe to just have
    a MessagePort, as that would be simpler and help remove some of
    the earlier mechanisms we had to invent
    … thank you for suggesting cropTargets to allow
    non-tightly-coupled capturee-capturer
    … I'm not sure if it's necessary if we're moving to a

    <youenn> @jib, window.opener can postMessage probably.

    elad: I don't think a MessagePort could replace the capture
    handle, since it only works for cooperative capturee/capturer
    … also the messageport alerts the capturee of an ongoing capture,
    with possible concerns of censorship
    … I think we need to address them separately

    hta: thanks for the clarification on MessagePort being
    orthogonal to CropTarget
    … MessagePort is two-way where capture handle is one-way, this
    may have a security impact
    … I think these 2 proposals are worth pursuing (as a
    … not convinced yet about content hint
    … should this be linked to a crop target instead?

    elad: would make sense

    TimP: I like all of this, and do like the multiple crop targets
    and notes
    … the MessagePort shouldn't replace the rest of this, it's more
    complicated for many developers
    … I like the 2 layers approach

    fluffy: I find the security issues with MessagePort concerning
    without more details
    … re trusting or not web sites for content hint - the capturer
    could determine it

    elad: content hint helps with setting the encoder correctly

    [59][Slide 108]


    jib: I don't think there is new information to change our
    current decision, nor have I had enough time to consider this

   Encoded transform

     Issue [60]#131 Packetization API

      [60] https://github.com/w3c/webrtc-encoded-transform/issues/131

    [61][Slide 80]


    hta: would this be packetization & depacketization?

    youenn: we would probably need both, good point

    Peter: we could add custom FEC to the list as a valid use case
    … being able to send your own custom RTP header would be nice
    … although that would be possible to put in the payload if you
    had control over it

    richard: this points toward an API that transforms the packets
    à la insertable stream
    … SPacket is simpler for encryption

    Bernard: we need to be able to packetize and depacketize if we
    use it for RED or FEC
    … you need to be able to insert packets that you recover
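
    To make the two directions concrete, a minimal sketch that
    splits a frame into payloads and reassembles them. It ignores
    RTP headers, codec-specific fragmentation rules and loss
    recovery; packetize and depacketize are hypothetical helpers,
    not a proposed API:

```typescript
// Split an encoded frame into payloads of at most maxPayloadBytes each.
function packetize(frame: Uint8Array, maxPayloadBytes: number): Uint8Array[] {
  if (!Number.isInteger(maxPayloadBytes) || maxPayloadBytes <= 0) {
    throw new RangeError("maxPayloadBytes must be a positive integer");
  }
  const packets: Uint8Array[] = [];
  for (let offset = 0; offset < frame.length; offset += maxPayloadBytes) {
    // slice() copies, so each packet owns its bytes.
    packets.push(frame.slice(offset, offset + maxPayloadBytes));
  }
  return packets;
}

// Reassemble payloads, assumed complete and in order, into one frame.
function depacketize(packets: Uint8Array[]): Uint8Array {
  const total = packets.reduce((sum, p) => sum + p.length, 0);
  const frame = new Uint8Array(total);
  let offset = 0;
  for (const p of packets) {
    frame.set(p, offset);
    offset += p.length;
  }
  return frame;
}
```

    A RED or FEC scheme would sit between these two steps: on
    the receive side, recovered payloads have to be placed back
    into the list before depacketize runs.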

    HTA: I don't think we can extend the encodedvideoframe for
    this, it's the wrong level
    … we need an rtcencodedpacket object probably
    … any impression on whether that's something we should do?
    … do we have enough energy to pursue this?

    Bernard: a bunch of use cases would benefit from this

    Peter: I'm energetic on it

    richard: +1
    … esp if we focus on a transformation API

    HTA: next steps would be writing up an explainer with use
    cases, and a proposed API shape

    <rlb> happy to help, if youenn is willing to drive :)

   Action items & next steps

    HTA: we had seen some serious architecture discussions on
    encoded media - I'll take the action item to push that forward
    … Elad is on the hook for capture handle
    … and we have 3 signed up volunteers for packetization

    Bernard: we had good discussion on use cases we want to enable

    JIB: we also closed almost all of the simulcast issues

    Elad: I'm looking into a proposal for an element capture API to
    generate a mediastreamtrack without occluded content - it has
    security issues that we will need to look into
    … this will be discussed at a breakout session tomorrow at 3pm

    HTA: we also have a joint meeting with Media WG on Thursday -
    we'll discuss metadata for video frames there


Summary of resolutions

     1. [62]go with proposal 3 without returning a timestamp
     2. [63]setFocusBehavior as a hint with unspecified default
        applicable to tabs & windows


  Web Real-Time Communications Working Group and Media Working Group and
          Media and Entertainment Interest Group - Joint meeting

15 September 2022

    [2]IRC log.

       [2] https://www.w3.org/2022/09/15-mediawg-irc


           Alex_Turner, Bernard_Aboba, Bradley_Needham,
           Brandon_Jones, Brian_Baldino, Chao_He, Chris_Needham,
           Cullen, Daisuke_Kodajima, Dale_Curtis, Dan_Sanders,
           David_Singer, Dominique_Hazael_Massieux, Eero_Haekkinen,
           Eric Carlson, Eric_Carlson, Eric_Portis, Eugene_Zemtsov,
           Francois_Daust, Gary_Katsevman, Harald_Alvestrand,
           Hiroshi_Fujisawa, Hiroshi_Kajihata, Hisayuki_Ohmata,
           Jake_Holland, Jan-Ivar_Bruaroey, Jennings,
           Johannes_Kron, Kaz_Ashimura, Kenaku_Komatsu,
           Klaus_Weidner, Masaki_Matsushita, Mike_English,
           Paul_Adenot, Peter_Thatcher, Piotr_Bialecki, Riju,
           Sung_young_Son, Tatsuya_Igarashi, Thomas_Guilbert,
           Tove_Petersson, Tuukka_Toivonen, Youenn_Fablet



           cpn, dom


     1. [3]Introduction
     2. [4]Issue 90/Issue 121: Pluggable codecs
     3. [5]Issue 131: Packetization API
     4. [6]Issue 99 & Issue 141: WebCodecs & WebRTC
     5. [7]Issue 70: WebCodecs & MediaStream transform
     6. [8]WebCodecs
          1. [9]w3c/webcodecs #198 Emit metadata (SPS,VUI,SEI,...)
             during decoding
          2. [10]w3c/webcodecs #371 AudioEncoderConfig.latencyMode
             (or similar) extension
          3. [11]w3c/webcodecs #270 Support per-frame QP
             configuration by VideoEncoder extension
     7. [12]Wrap up
     8. [13]Summary of action items

Meeting minutes

    Slideset: [14]https://lists.w3.org/Archives/Public/www-archive/


    <alwu> does anyone know what the passcode is for the webRTC
    joint meeting? thanks.


    [15][Slide 5]


    <eric-carlson_> +present

    Bernard: some context for this joint meeting

    [16][Slide 6]


    Bernard: the pandemic has brought a new wave of technology to
    the mass market
    … podcasting (75% of China once/week), video conferencing,
    video streaming (vod, live streaming)
    … ...
    … a bunch of these are being compiled in the webrtc-nv use
    cases and webtransport
    … they blur the lines between streaming and realtime

    [17][Slide 7]


    [18][Slide 8]


    [19][Slide 9]


    [20][Slide 10]


    [21][Slide 11]


    Bernard: lots of SDOs involved, and even lots of different
    groups involved on these topics in W3C itself
    … as a developer, you have to combine all these APIs together
    to build an app
    … the webcodecs sample code illustrates that, as it needs to use
    many different APIs

    [22][Slide 12]


    [23][Slide 13]


    Bernard: the WebRTC WG worked with WHATWG to make the WHATWG
    streams work better for media use cases
    … converting video frames from WebCodecs into other formats is
    another need that has emerged
    … the support for worker thread for the pipeline is still
    … separating worker threads for send and receive pipelines is
    not easy to put in place in practice

    [24][Slide 14]


    Bernard: goal is to identify owners of the identified problems
    and next steps

    [25][Slide 15]


    Kaz: Web of Things is thinking of using WebRTC for video
    surveillance cameras
    … I wonder about security considerations for these types of

    Youenn: does this need DRM?

    Kaz: not necessarily, but the cameras would still want to
    protect their data

    youenn: WebRTC provides hop-by-hop encryption
    … complemented by end-to-end encryption done in the encoded
    … that may be something to consider here as well

    bernard: please file an issue for use cases

    [26]WebRTC NV Use Cases Repo

      [26] https://github.com/w3c/webrtc-nv-use-cases/issues

    fluffy: it would be important to have support for end-to-end
    encryption - beyond JS-based encryption, it would be good to
    re-use our existing infrastructure (certificates management,
    webauthn) to enable E2EE
    … probably anchored into MLS

    Bernard: would that be separate from WebRTC then?

    Cullen: yes
    … we have a much better understanding of what's needed
    … we need to make sure our architecture exposes API in the
    pipeline where the browser doesn't necessarily have access to
    the underlying data

    Bernard: there is a related issue in the webcodecs repo, filed
    in the DRM context

    Dom: Code samples are a good way to identify where the gaps are

    ACTION: Set up a place where we can discuss cross-group or
    cross-API architecture issues

   Issue 90/Issue 121: Pluggable codecs

    [27]Adopt feedback streams in RTCRtpScriptTransformer #90

      [27] https://github.com/w3c/webrtc-encoded-transform/issues/90

    [28]Need APIs for using WebCodec with PeerConnection #121

      [28] https://github.com/w3c/webrtc-encoded-transform/issues/121

    [29][Slide 16]


    Harald: once you take a media and you transform it, it no
    longer has the same format, which is problematic for SDP-based
    … this creates issues where the pipeline is configured for the
    wrong media
    … one property that we need from the communications machine for
    webrtc is the ability to speak truth that the content being
    sent doesn't conform to the spec
    … there is also an issue around packetization/depacketization
    … we should figure an API so that once you install a
    transformation in the pipeline how to handle the data in the
    communications machinery
    … that should operate in terms of codec description

    Bernard: this is key to enabling to combine webcodecs & webrtc
    … also needed for E2EE

    Harald: this problem has indeed arisen already in the context
    of E2EE
    … also, should this be done in terms of packets or in terms of

    Bernard: in terms of next steps, the idea is to solicit

    Youenn: re WebCodecs, the traditional view is that if you're
    using webcodecs you will use web transport or datachannel to
    transport the data
    … what's the use case for using webcodecs inside something very
    integrated like peerconnection
    … it's good to provide proposals, but I would focus on use
    cases and requirements

    Bernard: the key reason is that neither datachannels nor
    webtransport will generate low latency
    … getting to 100ms can only be achieved with WebRTC

    Youenn: I agree with that
    … one of the main benefits of WebCodecs is for the use of
    hardware encoder/decoder
    … when you expose that in WebCodecs, we should expose it in PC
    … the main benefits would be to finetune encoders/decoders
    … not clear what the benefits would be vs native
    encoder/decoder integration in PC

    Harald: in a browser, the code supporting the codecs for the 2
    are the same
    … but the capabilities in the 2 APIs aren't
    … the most intensive control surface would be in WebCodecs
    rather than in WebRTC
    … you control the interface via WebCodecs with an input/output
    in RTP
    … this requires supporting the integration

    Bernard: WebCodecs hasn't been around for long, but they're already
    exposing a lot more capabilities than WebRTC

    Peter: +1 to Youenn that the same codecs are exposed in both
    WebRTC & WebCodecs
    … but as we add new low level controls - would we want to add
    them in both places or in a single place?

    jib: we have multiple APIs that are starting to overlap in
    … what might help would be to ask what we prefer this or that
    … I'm hoping we can avoid "this API is better than that API"
    … what makes WebCodecs better? if it's more codec support, we
    can add more codecs to WebRTC
    … or for WHATWG stream support in WebTransport, that could be
    exposed in data channels
    … it may not be necessary to break open WebRTC to benefit from
    the RTP transport

    eugene: it's been mentioned that RTCDataChannel and
    WebTransport are not truly optimized for RTC
    … can something be done about it?
    … A/V is not the only application for real-time transport

    Bernard: this is under discussion in a number of IETF WGs
    … some things can be done but it's still a research problem
    … in the WebTransport WG, we discussed interactions with the
    congestion control algorithms
    … not sure how much momentum there would be to change
    congestion control for datachannels

    Harald: datachannels are built on SCTP
    … it's now easier to play with the congestion control for that
    … understanding congestion control is hard, improving it even
    … unlikely to be solved in the short term, in IETF

    fluffy: in terms of congestion control, ReNo CC can be moved to
    … in terms of driving interest on WebCodecs, the next
    generation of codecs driven by AI are providing substantial
    improvements to this space
    … these are proprietary codecs that some would like to deploy
    at scale in browsers

    Bernard: a lot of these technologies aren't that easy to do in
    … e.g. AI integration
    … pre/post-processing can be done with webcodecs

    peter: re congestion control, there is something we could do
    at the W3C level by exposing more information about what's
    happening in the transport so that app can adjust what they
    … I've volunteered to help with that
    … on the codecs side, re which API to use - even if we allow
    "bring your own codec" and an RTP Transport, there is still a
    question of the jitter buffer which is uniquely exposed in
    WebRTC (not in WebTransport)

    CPN: additional control surface in WebCodecs?

    Paul: we can add knobs in WebCodecs when they apply to
    … otherwise, we add knobs in struct in the registry
    … e.g. tuning for drawing vs real-life for video
    … or packet size for audio

    CPN: we welcome issues for additional control surface in the
    webcodecs repo

    youenn: one of the things we discussed was asking higher
    quality in the area of the video where there is e.g. a face

   Issue 131: Packetization API

    [30][Slide 17]


    [31]Should we expose packetization level API to
    RTCRtpScriptTransform? #131

      [31] https://github.com/w3c/webrtc-encoded-transform/issues/131

    Youenn: rtcpeerconnection used to be a blackbox with an input
    and output
    … webrtc-encoded-transform opens it up a little bit
    … it works for some things
    … but it doesn't cover all use cases
    … we tried to identify use cases where there isn't good support
    … e.g. when adding redundancy for audio to make it more
    reliable over transport
    … but adding redundancy exposes the risk of having packets being
    … Giving more control on the generation of packets for video
    frames would also help in the context of encryption (SPacket)
    … RTP packets come with RTP headers that aren't exposed
    directly - it could be nice to give r/w access to apps e.g. to
    improve the voice-activity header
    … I plan to gather use cases and requirements since there is
    some interest - hope others will join me
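
    As an illustration of what read/write access to such a header
    could involve, a sketch that assumes the header in question is
    the one-byte audio level extension of RFC 6464 (a voice
    activity flag in the top bit, then a 7-bit level in -dBov).
    The helper names are hypothetical:

```typescript
interface AudioLevel {
  voiceActivity: boolean;
  levelNegDbov: number; // 0 (loudest) to 127 (silence), i.e. -dBov
}

// Decode the single payload byte of the audio level header extension.
function parseAudioLevel(byte: number): AudioLevel {
  return {
    voiceActivity: (byte & 0x80) !== 0,
    levelNegDbov: byte & 0x7f,
  };
}

// Encode the flag and level back into one byte.
function writeAudioLevel(level: AudioLevel): number {
  const value = level.levelNegDbov;
  if (!Number.isInteger(value) || value < 0 || value > 127) {
    throw new RangeError("levelNegDbov must be an integer in 0..127");
  }
  return (level.voiceActivity ? 0x80 : 0) | value;
}
```

    An app with write access could, for example, set the voice
    activity flag from its own detector before the packet is
    sent.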

    cpn: where would that be done?

    youenn: issue #131 is probably the right place

    fluffy: it's hard to know how to find where the discussions
    happen; I'm interested

    bernard: this relates to the challenge of keeping track of work
    happening across all these groups

    fluffy: the architectural discussion is whether packetization
    is part of the pipeline we need to consider in our cross-group

    <englishm> fluffy++

    cpn: we need to seed our x-group repo with a description of
    where we are and where we want to go

    bernard: another possibility would be a workshop to help with
    these x-groups discussions

    jib: I played with the API which illustrated that packetization
    is both codec and transport specific
    … is our question "where should it belong architecturally?"

    bernard: great question

    jake: has anyone talked with IETF on the need to surface the
    MTU to the encoder?

    youenn: the packetizer has the info on MTU
    … the question is whether the packetizer would give the info
    back to the app
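
    To make the MTU point concrete, here is a minimal sketch of the
    simplest possible packetizer: it cuts one encoded frame into
    payloads no larger than a size budget. The function name and the
    1200-byte budget are assumptions for illustration only; no such
    API is exposed today, and real RTP payload formats add
    codec-specific descriptors that this sketch ignores.

```typescript
// Hypothetical helper, not part of any WebRTC or WebCodecs API.
// Splits one encoded frame into payloads that each fit maxPayloadSize bytes.
function packetize(frame: Uint8Array, maxPayloadSize: number): Uint8Array[] {
  if (!Number.isInteger(maxPayloadSize) || maxPayloadSize <= 0) {
    throw new RangeError("maxPayloadSize must be a positive integer");
  }
  const packets: Uint8Array[] = [];
  for (let offset = 0; offset < frame.length; offset += maxPayloadSize) {
    // subarray() creates a view without copying and clamps the end index,
    // so the last payload may be shorter than the budget.
    packets.push(frame.subarray(offset, offset + maxPayloadSize));
  }
  return packets;
}

// Usage: a 2500-byte frame with an assumed 1200-byte payload budget.
const examplePackets = packetize(new Uint8Array(2500), 1200);
console.log(examplePackets.map((p) => p.length));
```

    An encoder that knew the budget could instead size its output to
    avoid the short trailing payload, which is why surfacing the MTU
    matters.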

    ACTION: Create initial architecture description

   Issue 99 & Issue 141: WebCodecs & WebRTC

    [32][Slide 18]


    [33]Encoded frame IDLs share definitions with WebCodecs #99

      [33] https://github.com/w3c/webrtc-encoded-transform/issues/99

    [34]Relationship to WebCodecs #141

      [34] https://github.com/w3c/webrtc-encoded-transform/issues/141

    Youenn: webrtc-encoded-transform and webcodecs share some very
    similar structures, but with differences incl on mutability of
    some of the data
    … it's a little bit inconvenient we have these different models
    … but that ship has probably sailed
    … for metadata, we're thinking of referring back to webcodecs
    … this got support from the webcodecs involved in the WebRTC WG
    meeting on Monday

   Issue 70: WebCodecs & MediaStream transform

    [35]Issue 70: WebCodecs & MediaStream transform

      [35] https://github.com/w3c/mediacapture-extensions/issues/70

    [36][Slide 19]


    Youenn: media capture transform allows grabbing frames, modifying
    them, and repackaging them as a mediastreamtrack
    … this is based on the VideoFrame object
    … cameras nowadays are capable of exposing information about e.g.
    face positions
    … it's cheap to compute and could be usefully exposed to Web
    … since this is tightly synchronized data, it would be good to
    package it with the video frames
    … in our discussion on Monday, we agreed to have a metadata
    dictionary in VideoFrame
    … WebRTC would expose its set of metadata for e.g. face
    … we discussed the possibility to have webapp specific metadata
    e.g. through JS objects
    … this could be useful in the context of AR/VR
    … the initial PR won't go that far, but it should open the way
    to that direction
    … the first use case will focus on browser-defined metadata
    … these would be exposed as the top-level, and there would be a
    .user sub-object that would need to be serializable
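
    As a rough sketch of the layout described above (browser-defined
    entries at the top level, app-defined data under a .user
    sub-object), the shape could look like the following. All type,
    field and function names here are assumptions, not the dictionary
    under discussion, and the "serializable" check is a conservative
    stand-in that only accepts primitives, arrays and plain objects.

```typescript
// Hypothetical shapes for illustration only; none of these names are specified.
interface FaceBox { x: number; y: number; width: number; height: number; }

interface FrameMetadataSketch {
  faces?: FaceBox[]; // browser-defined entry at the top level (assumed name)
  user?: unknown;    // app-defined data under a ".user" sub-object
}

// Conservative stand-in for "serializable": accepts only primitives, arrays
// and plain objects. Cyclic structures are not handled.
function isPlainData(value: unknown): boolean {
  if (value === null) return true;
  const t = typeof value;
  if (t === "string" || t === "number" || t === "boolean" || t === "undefined") return true;
  if (t !== "object") return false; // rejects functions, symbols, bigint
  if (Array.isArray(value)) return value.every((item) => isPlainData(item));
  const proto = Object.getPrototypeOf(value);
  if (proto !== Object.prototype && proto !== null) return false; // class instances
  const record = value as Record<string, unknown>;
  return Object.keys(record).every((key) => isPlainData(record[key]));
}

// Returns a copy of the metadata with the app data attached under "user".
function withUserMetadata(meta: FrameMetadataSketch, user: unknown): FrameMetadataSketch {
  if (!isPlainData(user)) throw new TypeError("user metadata must be plain data");
  return { ...meta, user };
}

// Usage: browser-provided face box plus app-provided AR anchor data.
const annotated = withUserMetadata(
  { faces: [{ x: 0.1, y: 0.2, width: 0.3, height: 0.4 }] },
  { anchorId: "a1", pose: [0, 1, 2] },
);
console.log(annotated);
```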

    Bernard: this has application beyond the use cases on the slide
    … e.g. face detection could help encoders around faces vs
    blurred background
    … we would need to consider whether the encoder should be
    expected to look at that metadata

    youenn: +1

    cpn: how common is face detection at the camera level?

    eric: it's very common

    youenn: we could also expose a face detection blackbox that
    would transform a camera feed and annotate it with face
    detection metadata

    Ada: (Immersive Web WG co-chair)
    … some of our WG participants are very interested in
    bringing more AR features to WebRTC
    … they do AR by running WASM on the camera feed from WebRTC
    … if you were to fire up the various AR systems on the devices,
    you could surface more metadata e.g. the current position of
    the device or solid objects in the environment

    Youenn: very important we get the AR/VR input on this work
    … we're also thinking of exposing requestVideoFrame to expose
    some of these metadata as well

    [37]WebCodec as pass through to application metadata #189

      [37] https://github.com/w3c/webcodecs/issues/189

    bialpio: also from the IWWG
    … we also have a use case to integrate WebXR animation loop
    with video frames used in XR session
    … how would we correlate a video frame from gUM with poses?

    youenn: in WebRTC, accessing frames from the camera is via a
    track processor in a worker
    … correlating that with timestamps might work
    … would be interesting to see if this is workable with
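
    One way the timestamp-based correlation could be sketched is a
    nearest-match lookup: given a frame timestamp, pick the pose whose
    timestamp is closest, within a tolerance. This assumes frame and
    pose timestamps are in microseconds on a shared timebase, which is
    exactly the open question here; the TimedPose type and the
    function name are hypothetical.

```typescript
// Hypothetical pose record; assumes microsecond timestamps on the same
// timebase as the video frames.
interface TimedPose {
  timestampUs: number;
  pose: number[];
}

// Returns the pose closest in time to the frame, or undefined when nothing
// falls within the tolerance.
function closestPose(
  poses: TimedPose[],
  frameTimestampUs: number,
  toleranceUs: number,
): TimedPose | undefined {
  let best: TimedPose | undefined;
  let bestDelta = Infinity;
  for (const candidate of poses) {
    const delta = Math.abs(candidate.timestampUs - frameTimestampUs);
    if (delta < bestDelta) {
      best = candidate;
      bestDelta = delta;
    }
  }
  return bestDelta <= toleranceUs ? best : undefined;
}

// Usage: three poses roughly one 60 Hz frame apart.
const samplePoses: TimedPose[] = [
  { timestampUs: 0, pose: [0, 0, 0] },
  { timestampUs: 16000, pose: [0, 0, 1] },
  { timestampUs: 33000, pose: [0, 0, 2] },
];
console.log(closestPose(samplePoses, 17000, 5000));
```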

    bialpio: in this particular use case, WebXR would be
    introducing video frames, they could be added as metadata
    into video frames
    … some devices are doing pose prediction - this might make it
    tricky to sync with past video frames
    … the question is what would be the best place to discuss

    youenn: VideoFrame sounds like the place where this is
    … so webcodecs repo in the Media WG would seem good

    cpn: also worth discussing in the architectural repo

    ACTION: Include image capture for AR applications and stream
    correlation in architecture description

    kaz: spatial data wg also looking at sync between location and
    … useful to consider sync with video stream as well

    Bernard: the synchronization issue has come up several times
    … the video track processor gives you VideoFrames with timestamps
    … how to render that accurately in sync with audio
    … it's not always clear that you get the sync that you want
    … is that the appropriate API when dealing with all these
    … what's the right way to render sync'd audio/video?
    … this is probably worth some sample code to figure it out
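
    As a starting point for that kind of sample code, here is a
    minimal sketch of one common approach: treat the audio playout
    position as the master clock and decide, per video frame, whether
    to wait, render or drop. It assumes audio and video timestamps
    share a timebase in microseconds; the function name and the 40 ms
    lateness threshold are arbitrary assumptions.

```typescript
type SyncDecision =
  | { action: "render" }
  | { action: "wait"; delayUs: number }
  | { action: "drop" };

// Compares a video frame timestamp against the current audio playout
// position (both assumed to be microseconds on the same timebase).
function scheduleVideoFrame(
  frameTimestampUs: number,
  audioClockUs: number,
  lateThresholdUs: number = 40000, // assumed tolerance before dropping
): SyncDecision {
  const aheadUs = frameTimestampUs - audioClockUs;
  if (aheadUs > 0) {
    // Frame is early: hold it until the audio clock catches up.
    return { action: "wait", delayUs: aheadUs };
  }
  if (-aheadUs > lateThresholdUs) {
    // Frame is too late to be useful.
    return { action: "drop" };
  }
  return { action: "render" };
}

// Usage: a frame stamped 10 ms ahead of the audio clock.
console.log(scheduleVideoFrame(1000000, 990000));
```

    Whether the platform exposes an audio clock precise enough to
    drive this is part of the question raised above.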

    fluffy: one of the architectural issues is that these various
    systems are working at different timing with different control
    … how to synchronize them and render them correctly is an
    architectural issue - it needs the big picture of how it fits

    Bernard: half of the questions I get are in that space

    ACTION: capture synchronization issues in the architecture

    paul: this metadata seems to become a central piece

    <riju_> Thanks Youenn for bringing this up.. Others if there's
    something specific to FaceDetection you want to contribute/ask
    here's an explainer [38]https://github.com/riju/faceDetection/

      [38] https://github.com/riju/faceDetection/blob/main/explainer.md

    paul: it seems worth focusing on this to save trouble down the

    [39][Slide 20]



    <jholland> apologies, I have to leave early. Thanks for an
    informative session.

     w3c/webcodecs #198 Emit metadata (SPS,VUI,SEI,...) during decoding

    [40]Emit metadata (SPS,VUI,SEI,...) during decoding #198

      [40] https://github.com/w3c/webcodecs/issues/198

    DanS: with our metadata proposal, the way to do it is
    relatively clear, the question is really platform support

     w3c/webcodecs #371 AudioEncoderConfig.latencyMode (or similar)

    [41]AudioEncoderConfig.latencyMode (or similar) #371

      [41] https://github.com/w3c/webcodecs/issues/371

    [42]Fixed audio chunk size support #405

      [42] https://github.com/w3c/webcodecs/issues/405

    tguilbert: these 2 issues may be the same issue

    paul: indeed, probably overlap
    … it's mostly something we need to do
    … in terms of codecs it will apply to: most codecs are
    fixed-size frame, but OPUS, @@@ and FLAC (not adapted to
    … adding it to OPUS registry would work

    Bernard: 405 is adding ptime - doesn't need to be codec

    <Bernard> Issue 405 is about ptime... should be for all codecs,

    tguilbert: if we expose latency mode on audio encoder, the
    meaning may differ across codecs

    <Bernard> Bernard: Question about "latency mode": for video,
    the knob doesn't seem to make much difference in latency...

    tguilbert: so a per-codec setting might be better than a
    generic low-latency/high-quality toggle
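
    For the ptime side of this discussion, the arithmetic is
    codec-independent: a packet time in milliseconds and a sample rate
    give a fixed number of samples per chunk. The sketch below shows
    that calculation and a splitter that keeps the leftover samples
    for the next call. The function names are hypothetical and this is
    not an existing WebCodecs option.

```typescript
// Number of samples (per channel) in one chunk of ptimeMs milliseconds.
function samplesPerChunk(sampleRateHz: number, ptimeMs: number): number {
  const samples = (sampleRateHz * ptimeMs) / 1000;
  if (!Number.isInteger(samples) || samples <= 0) {
    throw new RangeError("ptime does not map to a whole number of samples");
  }
  return samples;
}

// Cuts a mono sample buffer into full fixed-size chunks; the remainder is
// returned so the caller can prepend it to the next buffer.
function splitIntoChunks(
  samples: Float32Array,
  chunkSize: number,
): { chunks: Float32Array[]; remainder: Float32Array } {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: Float32Array[] = [];
  let offset = 0;
  while (offset + chunkSize <= samples.length) {
    chunks.push(samples.subarray(offset, offset + chunkSize));
    offset += chunkSize;
  }
  return { chunks, remainder: samples.subarray(offset) };
}

// Usage: 20 ms chunks at 48 kHz.
const chunkSize = samplesPerChunk(48000, 20);
const split = splitIntoChunks(new Float32Array(2000), chunkSize);
console.log(chunkSize, split.chunks.length, split.remainder.length);
```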

     w3c/webcodecs #270 Support per-frame QP configuration by
     VideoEncoder extension

    [43]Support per-frame QP configuration by VideoEncoder #270

      [43] https://github.com/w3c/webcodecs/issues/270

    eugene: QP tends to be codec specific
    … the advice we're getting from people working with codecs is
    to not try to use a common denominator approach
    … they want finegrained controls to tune this
    … so we're working towards a codec specific approach via
    settings in the registry

    Bernard: the cool thing about per-frame QP is that it can have
    an impact on congestion control
    … re latency mode - I've been playing with it on the video
    side, when I set it to "quality", it doesn't seem to make any
    difference in latency
    … and it generates fewer keyframes, which improves the transport
    … is that intended?

    eugene: this is implementation specific, it's hard to tell
    without more details on the encoder
    … the API is not trying to be creative, it reflects the knobs
    from the encoder
    … please send me links to your code sample and I'll take a look
    … it's not the intended behavior

   Wrap up

    CPN: who are the relevant WGs we need to coordinate with?

    Bernard: I had missed the XR folks in my list

    Dom: For the architecture we need the WGs more than IGs
    … For the media pipeline, not sure WHATWG Streams needs to be
    involved in the architecture

    Bernard: WHATWG streams are a black box, difficult to see how
    much latency is contributed

    Dom: As we go through this and collaborate on code samples,
    some issues may need to be surfaced to WHATWG streams
    … Who do we need at the table to design the pipeline?
    … Suggest doing it iteratively, then convince people to
    contribute to it
    … Let's start with those already here, then when we find pieces
    we don't understand in depth, reach out to the relevant groups
    … Workshop with several components. It's an interesting idea to
    run a workshop. Once we have an initial architecture, we'll
    know who to invite
    … So suggest starting small and iterate. Once we get stuck,
    extend the conversation
    … The real question is to find people committed to look at it.
    May be harder to find than people willing to work on the
    … Who'd be willing to drive it?

    Bernard: I'm one person, but don't know all the pieces

    Peter: I volunteer, for the parts I know about

    Bernard: Would be helpful to have sample code that illustrates
    problem points
    … We have some in WebCodecs, spend time developing samples.
    Particularly encourage the AR/VR people to try out WebCodecs
    … What kinds of sample code would be illustrative?

    Dom: I think we should take the use cases and identify
    integration points. Some may not be possible, and what it would
    take to make them possible would be an outcome

    ChrisN: Where to host the repo?

    Dom: Something like media-pipeline-architecture under w3c

    Bernard: IETF have hackathons, could that be a useful thing?
    Does w3c do that?

    Dom: If we have sponsors and a time and place, it's possible. I
    have less experience with virtual hackathons
    … Could be investigated as part of the workshop idea

    Cullen: An IETF hackathon could be welcome. Agree with Dom,
    virtual hackathons haven't been so successful compared to
    having people in the room
    … Next IETF is London in November. They'd be happy if we showed
    up to hack there

    <dom> kaz: Web of things wg has been holding plugfests, with
    SoftEther VPN to help separate remote collaboration

    <dom> CPN: we've done a few ad-hoc joint meetings between our 2

    <dom> ... when should we plan our next?

    Dom: How long would it take to do some initial work to bring to
    that meeting?

    Bernard: Depends on the topic. A month or two for sample code.

    Dom: So perhaps end of October

    Bernard: Aligns with IETF

    <Bernard> Bernard Aboba: Present

    <kaz> [adjourned]

Summary of action items

     1. [44]Set up a place where we can discuss cross-group or
        cross-API architecture issues
     2. [45]Create initial architecture description
     3. [46]Include image capture for AR applications and stream
        correlation in architecture description
     4. [47]capture synchronization issues in the architecture

Received on Monday, 19 September 2022 13:10:11 UTC