[minutes] Jan 16 2024 meeting

Hi,

Based on Henrik's notes from last week's meeting [1], I generated the
usual W3C-style minutes of the Jan 16 meeting at:
   https://www.w3.org/2024/01/16-webrtc-minutes.html

Dom

1. https://lists.w3.org/Archives/Public/public-webrtc/2024Jan/0044.html


                       WebRTC January 2024 Meeting

16 January 2024

    [2]Agenda.

       [2] https://www.w3.org/2011/04/webrtc/wiki/January_16_2024

Attendees

    Present
           Bernard, Fippo, Florent, Guido, Harald, Henrik,
           Jan-Ivar, Tony, Youenn

    Regrets
           Dom

    Chair
           Bernard, Harald, Jan-Ivar

    Scribe
           Henrik

Contents

     1. [3]WG Document Status
          1. [4]WEBRTC-PC:
          2. [5]Mediacapture-Streams
          3. [6]MST-ContentHint
          4. [7]WebRTC-SVC
          5. [8]Encoded Transform
          6. [9]MediaCapture Transform
     2. [10]BLOCKING ISSUES
          1. [11]setCodecPreferences vs unidirectional codecs
          2. [12]WebRTC spec should explicitly specify all causes
             of a PC sourced track being muted
          3. [13]General approach to capabilities negotiation
          4. [14]Align exposing scalabilityMode with WebRTC
             “hardware capabilities” check
          5. [15]How does generator.mute change track stats?
          6. [16]Is RTCEncodedVideoFrameMetadata.frame_id actually
             an unsigned long long or does it wrap at 16 bits?
          7. [17]Mark resizeMode, sampleRate and latency as feature
             at risk
          8. [18]Highly detailed text in video content
          9. [19]Comments and request from APA review
     3. [20]WebRTC-Extensions: API to control encode complexity
     4. [21]Summary of resolutions

Meeting minutes

    Slideset: [22]https://lists.w3.org/Archives/Public/www-archive/
    2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf

      [22] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf

   WG Document Status

    Bernard: What’s going on with these specs, and how can we make
    progress? Issues not advancing is a red flag.

     WEBRTC-PC:

    [23][Slide 11]

      [23] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=11

    Bernard: How often should we recycle this?

    Harald: We have extension specs too that should go from
    candidate recommendation to recommendation. It would make sense
    to recycle about once per year.

    [24][Slide 12]

      [24] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=12

    Bernard: 50 open issues, 20 open > 1 year. We’re not in great
    shape for recycling each year.

     Mediacapture-Streams

    [25][Slide 13]

      [25] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=13

    [26][Slide 14]

      [26] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=14

    Bernard: This is unusual in that it is widely implemented but
    only candidate recommendation. 31 open issues, 9 open > 1 year.
    It doesn’t look like we’re on the road to proposed
    recommendation.

    [27][Slide 15]

      [27] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=15

    Bernard: Are there issues with the WPT tests? Any surprising
    failures?

    Jan-Ivar: There are some issues with testing device
    infrastructure.

    Harald: Are the transferable-track errors a sign of the feature
    not being implemented, or of a difficulty with testability?

    Jan-Ivar/Youenn: We have not implemented this yet.

    Youenn: Maybe we should better organize the specs; the
    transferable track is an extension.

    RESOLUTION: We should move mediacapture extension tests.

     MST-ContentHint

    [28][Slide 16]

      [28] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=16

    [29][Slide 17]

      [29] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=17

    Bernard: This being a Working Draft seems to be in sync; we
    should push to advance. It’s not a huge list of issues to push
    to CR.

     WebRTC-SVC

    [30][Slide 18]

      [30] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=18

    [31][Slide 20]

      [31] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=20

    Bernard: This is also a Working Draft, but it has been
    implemented in Chromium. Safari Tech Preview indicates support,
    but it’s not passing the WPT tests. Is there a Sonoma
    dependency?

    Youenn: I need to look at these tests.

     Encoded Transform

    [32][Slide 21]

      [32] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=21

    [33][Slide 22]

      [33] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=22

    Bernard: This is a Working Group draft, but the test results
    are odd in that you have some passes for Firefox and Safari but
    very little on Edge and Chrome. This is a little worrisome: is
    this an issue with the spec or with the implementations?

    Youenn: Chrome is passing 23/27 in the tentative folder; the
    tests in the top folder follow the spec. Firefox and Safari
    implement ScriptTransform but not SFrameTransform, so that
    could be a feature at risk.

    Harald: This is a spec problem, we don’t have agreement on a
    couple of key features of the spec, so the tests in tentative
    reflect the state of implementation before we come to
    agreement.

    Bernard: So this one seems to have some legitimate issues
    keeping it back. But they won’t go away if we don’t talk about
    them.
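
    [For orientation, a minimal sketch of the spec’s
    RTCRtpScriptTransform shape, the variant Youenn notes Firefox
    and Safari implement; the worker file name, options object and
    pass-through transform are illustrative only.]

      // Main thread: attach a worker-based transform to a sender.
      const pc = new RTCPeerConnection();
      const { sender } = pc.addTransceiver('video');
      const worker = new Worker('transform-worker.js');
      sender.transform = new RTCRtpScriptTransform(worker, { side: 'send' });

      // transform-worker.js: pass encoded frames through unchanged.
      onrtctransform = (event) => {
        const { readable, writable } = event.transformer;
        readable
          .pipeThrough(new TransformStream({
            transform(encodedFrame, controller) {
              // Inspect or modify encodedFrame.data here.
              controller.enqueue(encodedFrame);
            },
          }))
          .pipeTo(writable);
      };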

     MediaCapture Transform

    [34][Slide 23]

      [34] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=23

    [35][Slide 24]

      [35] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=24

    [36][Slide 25]

      [36] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=25

    Bernard: Not much has been happening since October 2022.
    Chromium and Safari Tech Preview implement it. 18 open issues,
    17 open > 1 year.

    Youenn: It’s partially implemented, but as with ScriptTransform,
    Chromium implements a previous version, so this could have a
    similar status. There are two API versions in different
    browsers.

    Bernard: But the functionality is the same; the difference is
    only in the API shape?

    Youenn: Yes.

    Harald: The key issue is availability on the main thread, and
    we have a problem with transferability. Transferring a
    MediaStreamTrack is not implemented by any browser.

    Bernard: So MediaStreamTrack transferability relates to the
    implementability of this spec.

    Guido: It’s similar, but not quite the same as encoded
    transform, in the sense that Chromium is proposing mostly a
    superset of what the current spec says, which is basically
    availability on window and support for audio. There is one
    small difference in API shape, which is MediaTrackGenerator;
    it’s very similar to the generator that we have in the older
    version, and we could very easily make a version that is
    compatible with the spec. But the main thing blocking is the
    transferability of the track, which nobody has implemented.

    Youenn: We have a prototype that will probably be available in
    Safari Tech Preview in the coming months. Just video, not
    audio. It’s not enabled by default or complete yet.

    Bernard: In summary, good news: widely implemented specs, but
    the specs are lagging behind implementations. It doesn’t seem
    like a huge task. But then there are the transforms where there
    are real spec issues. So in the next couple of meetings we
    should try to make progress on these blocking issues.
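
    [For orientation, a sketch of the pipeline as the current
    mediacapture-transform draft shapes it: worker-side, with
    VideoTrackGenerator. Chromium ships the older window-exposed
    variant discussed above. The track is assumed to have been
    transferred into the worker, which is exactly the
    unimplemented part.]

      // In a worker, per the spec draft (async context assumed;
      // `track` arrived via postMessage transfer).
      const processor = new MediaStreamTrackProcessor({ track });
      const generator = new VideoTrackGenerator();
      await processor.readable
        .pipeThrough(new TransformStream({
          transform(videoFrame, controller) {
            // Process the VideoFrame here; enqueue or close it.
            controller.enqueue(videoFrame);
          },
        }))
        .pipeTo(generator.writable);
      // generator.track is the processed MediaStreamTrack.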

   BLOCKING ISSUES

     [37]setCodecPreferences vs unidirectional codecs

      [37] https://github.com/w3c/webrtc-pc/issues/2888

    [38][Slide 30]

      [38] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=30

    Fippo: setCodecPreferences does not directly affect the send
    codec, but webrtc-pc looks at both send and recv codecs. We
    could either…
    … Fix webrtc-pc by removing mentions of send codecs.
    … Clarify the codec match algorithm.
    … Do we agree that we should remove the send codec?

    [Thumbs up from several people.]

    Harald: I’m doing a deep dive in a follow-up issue and would
    like to get input.
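
    [To illustrate the receive-side reading proposed above, a
    sketch that builds the preference list from receiver
    capabilities only; the VP8-first ordering is an arbitrary
    example.]

      const pc = new RTCPeerConnection();
      const transceiver = pc.addTransceiver('video');
      // Use decoder (receive) capabilities, not sender ones.
      const { codecs } = RTCRtpReceiver.getCapabilities('video');
      const preferred = [
        ...codecs.filter((c) => c.mimeType === 'video/VP8'),
        ...codecs.filter((c) => c.mimeType !== 'video/VP8'),
      ];
      transceiver.setCodecPreferences(preferred);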

     [39]WebRTC spec should explicitly specify all causes of a PC sourced
     track being muted

      [39] https://github.com/w3c/webrtc-pc/issues/2915

    Jan-Ivar: We’ve discussed mute in a lot of specs; there still
    seems to be some doubt about what mute means in webrtc-pc
    (remote tracks). Part of the problem is that mediacapture-main
    has two jobs: defining what MediaStreamTrack is, and defining
    device capture. In short, the mute event is controlled by the
    user agent and driven by the source. But for WebRTC, the source
    is an incoming track.
    … WebRTC-PC defines its own set of mute/unmute steps, but there
    is a lack of clarity about whether what mediacapture-main says
    about muting, which is specific to camera and microphone, still
    applies here.
    … The way I read the spec, the definition in WebRTC-PC is the
    full description, replacing mediacapture-main.

    Harald: I think there are situations where it is natural to
    mute that are not listed, for example if the network is
    disconnected, or the state of the peer connection goes to
    disconnected. It would seem reasonable to mute the tracks.

    Youenn: We should get consensus on when mute should happen. I
    would try to get consensus on why we mute and we should list
    that.

    Jan-Ivar: My proposal is that webrtc-pc list all the reasons.
    We did this for constraints.

    Henrik: I think we should separate the question of where we
    define mute reasons, from the question of if all mute reasons
    are listed. I agree with Harald that we should mute if the pc
    disconnects, but I think webrtc-pc should say this, not
    mediacapture-main.

    Youenn: I agree and we can add reasons in a PR.

    Jan-Ivar: We should focus on what has already been implemented.

    Harald: I think we have consensus that for any case where we
    agree that browsers should mute, like the BYE, that should be
    in the spec. I don’t think we have consensus on whether it is
    up to the user agent to mute at other times.

    Jan-Ivar: Mute and unmute steps that happen in different specs
    could overlap and cause races.

    Youenn: Maybe WebRTC-PC can remain open-ended for now, so long
    as the other specs are not. Hopefully nobody is implementing
    mute for canvas capture, and it would be good if the spec said
    so. Then we could follow up on that discussion.

    Jan-Ivar: I think the reason mediacapture-main’s mute is
    open-ended was privacy, which may not apply to other specs.
    Hopefully other specs don’t need open-endedness, so that we can
    get implementations to converge.

    Youenn: It would be good if we could get a list of reasons why
    Chromium might mute.
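
    [For reference, how a page observes these events on a
    PC-sourced track today; exactly when the user agent fires them
    is the open question above.]

      const pc = new RTCPeerConnection();
      pc.ontrack = ({ track }) => {
        // Fired by the user agent; script cannot trigger them.
        track.onmute = () => console.log('remote track muted');
        track.onunmute = () => console.log('remote track unmuted');
      };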

     [40]General approach to capabilities negotiation

      [40] https://github.com/w3c/media-capabilities/issues/176

    [41][Slide 32]

      [41] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=32

    Bernard: MediaCapabilities indicates “supported”,
    “powerEfficient” and “smooth”.
    … PING did a review in March 2021. They liked the
    fingerprinting analysis but questioned why we expose device
    capabilities for the purpose of negotiation, as opposed to
    having the user agent negotiate based on capabilities and pick
    the one it likes best. The problem is that this does not work
    with the RTC media negotiation model; it sounds more like a
    streaming use-case model. No progress for years, and PING
    wants progress.
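
    [For context, what a MediaCapabilities query looks like for a
    WebRTC encode configuration; the codec and numbers are
    arbitrary, and an async context is assumed.]

      const info = await navigator.mediaCapabilities.encodingInfo({
        type: 'webrtc',
        video: {
          contentType: 'video/VP9',
          width: 1280,
          height: 720,
          bitrate: 1500000,
          framerate: 30,
        },
      });
      // The three booleans PING’s review focused on:
      console.log(info.supported, info.smooth, info.powerEfficient);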

    [42][Slide 33]

      [42] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=33

    [Bernard is doing PR [43]#212 (see slide) and wants reviews.]

      [43] https://github.com/w3c/media-capabilities/issues/212

     [44]Align exposing scalabilityMode with WebRTC “hardware
     capabilities” check

      [44] https://github.com/w3c/webrtc-svc/issues/92

    [45][Slide 34]

      [45] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=34

    [46][Slide 35]

      [46] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=35

    Bernard: PING also did a review of scalabilityMode and said it
    would expose additional fingerprinting surface. But it’s not
    enabled in MediaCapabilities.
    … Through trial and error you could set scalabilityMode to
    check which modes are or are not supported, but it does not
    tell you whether it is hardware or software. Maybe you can
    figure it out via performance in getStats.
    … The bottom line is that in webrtc-svc you only get a subset
    of what is exposed in MediaCapabilities. We also don’t want to
    add a hardware check to MC, since it can be used for streaming
    use cases.

    Henrik: scalabilityMode is a subset of MC, so I don’t
    understand: did PING say MC is OK, or is it that they haven’t
    had time to object to MC yet? These issues are entangled, so I
    think we need to be consistent.

    Florent: We need them to understand the way RTC works on the
    Internet. How about we invite them and explain the situation?

    Jan-Ivar: What’s unique with SDP is that it is exposed to
    JavaScript, so there is no way not to expose this. But if
    MediaCapabilities is not exposed, you could still do a
    suboptimal call, so we need to figure out whether that is
    tenable. We could determine the minimum set of codecs that need
    to be exposed, and if those are the same across browsers then
    it wouldn’t say much.

    Harald: I don’t think we should waste time discussing such
    redesign, at least not on this basis. Our current webrtc-pc is
    what it is.

    Bernard: Codecs tend to come in waves, so really only what
    you’re learning is if they have a new device or not, it’s not a
    huge privacy risk.

    Youenn: We don’t have the same analysis; we think it is a real
    issue. As the older devices diminish, it will become a very
    important fingerprinting surface.

    Bernard: I will continue to work on the privacy analysis.
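
    [A sketch of the trial-and-error probing described above; per
    webrtc-svc, requesting an unsupported mode makes addTransceiver
    throw, which is the probing vector.]

      function supportsMode(mode) {
        const pc = new RTCPeerConnection();
        try {
          pc.addTransceiver('video', {
            sendEncodings: [{ scalabilityMode: mode }],
          });
          return true;
        } catch (e) {
          return false; // OperationError: mode not supported
        } finally {
          pc.close();
        }
      }
      console.log(supportsMode('L3T3'));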

     [47]How does generator.mute change track stats?

      [47] https://github.com/w3c/mediacapture-transform/issues/81

    [48][Slide 36]

      [48] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=36

    Bernard: What happens when you mute with the generator
    attribute? One option is that you fire the event; the other is
    to queue a task to fire mute on all of the clones.

    Proposal: let’s go with the second option.

    RESOLUTION: Let’s go with second option
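
    [A sketch of the attribute in question, assuming the
    VideoTrackGenerator shape in the current mediacapture-transform
    draft, which exposes it in workers.]

      const generator = new VideoTrackGenerator();
      const clone = generator.track.clone();
      clone.onmute = () => console.log('clone muted too');
      // Per the resolution, this queues a task to fire "mute" on
      // generator.track and on all of its clones.
      generator.muted = true;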

     [49]Is RTCEncodedVideoFrameMetadata.frame_id actually an unsigned
     long long or does it wrap at 16 bits?

      [49] https://github.com/w3c/webrtc-encoded-transform/issues/220

    [50][Slide 37]

      [50] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=37

    Tony: In Chromium this is implemented from the dependency
    descriptor, which is 16-bit, but it unwraps into a 64-bit
    unsigned on the receiver side.
    … Proposing to keep unsigned long long. frameId is a
    monotonically increasing frame counter, and its lower 16 bits
    will match the frame_number of the DD header extension.
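
    [To illustrate the unwrapping described above, a hypothetical
    helper that extends the 16-bit frame_number into a
    monotonically increasing 64-bit counter; nothing here is from
    the spec.]

      function makeFrameIdUnwrapper() {
        let last = -1n; // last unwrapped frameId
        const kCycle = 65536n; // frame_number wraps at 16 bits
        return (frameNumber) => {
          if (last < 0n) {
            last = BigInt(frameNumber);
            return last;
          }
          // Pick the value congruent to frameNumber (mod 2^16)
          // that lies closest to the previous frameId.
          let candidate = last - (last % kCycle) + BigInt(frameNumber);
          if (candidate < last - kCycle / 2n) candidate += kCycle;
          else if (candidate > last + kCycle / 2n) candidate -= kCycle;
          last = candidate;
          return candidate;
        };
      }
      const unwrap = makeFrameIdUnwrapper();
      unwrap(65534); // 65534n
      unwrap(1);     // 65537n: wrapped, counter keeps increasing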

    Bernard: Do we care about dependency chains? There could be
    circumstances where the dependency is fulfilled. [...] I
    support what this slide is saying, we can talk about chains in
    a separate issue.

    RESOLUTION: Consensus to move forward with Tony’s proposal

     [51]Mark resizeMode, sampleRate and latency as feature at risk

      [51] https://github.com/w3c/mediacapture-main/issues/958

    Jan-Ivar: Some constraints only have one implementation, so the
    proposal is to mark them as feature at risk.

    Guido: I object, because resizeMode is widely used by people
    who use Chromium; some users have even requested additional
    resize modes. The other three, Chromium implements to varying
    degrees; latency is used particularly on Windows for users to
    select capturing with the lowest possible capture sizes. So if
    we eventually remove them from the spec, we would not be able
    to remove them from the web, because it would break the web.
    sampleSize is implemented and exposed by Chromium, but I’m not
    aware of any use case.

    Henrik: I think sampleRate relates to another issue where
    people today use SDP munging to change the codec sample rate;
    a possible outcome of that was whether it should use the
    track’s sample rate, but I’m not sure about the status of that.

    Youenn: Maybe we can move these to mediacapture-extensions
    instead of marking them as features at risk? Or maybe both. But
    eventually we may remove them from mediacapture-main.

    Guido: I think that makes sense for sampleRate, sampleSize and
    latency, but resizeMode I think is important.

    Jan-Ivar: Is Chromium’s default to automatically downscale?

    Guido: Yes.

    Jan-Ivar: We’re also planning to make this the default, so I’m
    curious what the remaining use case is for developers to turn
    this off.

    Guido: Some people want to make sure they get a native
    resolution.

    RESOLUTION: Move sampleSize, sampleRate and latency to the
    extension spec. And then work harder on resizeMode.
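
    [For reference, the constraint under discussion; “none” is the
    value the native-resolution use case relies on.]

      // Ask the browser not to downscale (async context assumed).
      const stream = await navigator.mediaDevices.getUserMedia({
        video: { resizeMode: 'none' },
      });
      const [track] = stream.getVideoTracks();
      console.log(track.getSettings().resizeMode); // 'none' if honored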

     [52]Highly detailed text in video content

      [52] https://github.com/w3c/mst-content-hint/issues/35

    [53][Slide 40]

      [53] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=40

    Harald: We have some text saying that if you use contentHint
    “text” you activate some flags in AV1. The original PR was
    intended to acknowledge that in some scripts, details matter
    more than in others (see examples). If you downscale those
    fonts, they become unreadable earlier than ASCII text would.
    … Bernard also noted that red text on a yellow background will
    work worse than black and white if 4:2:0 coding is used, and he
    recommends 4:4:4. We may not want to mandate that due to the
    extra overhead, but we could…
    … 1. Reword the addition to note that encoding of colored text
    may cause readability issues.
    … 2. Recommend 4:4:4 if colored text dominates when contentHint
    “text” is used.
    … 3. Mandate use of 4:4:4 for this case.

    Youenn: I wouldn’t go with mandating; it is a contentHint, so
    I’m all fine with saying “hey user agents, please advise…”, but
    in terms of mandating, I’m not sure we will have wording that
    is always right, so I think that goes too far. So 3 is out.

    Bernard: 3 is also out for me; there’s a lot of extra
    bandwidth, and is this even supported? Anyway, it’s certainly
    not prevalent, so mandating seems too much. Even recommending
    seems pretty strong for something like this; it’s almost like
    saying that someone who implements AV1 must implement it.

    Jan-Ivar: I would also gravitate towards the lower-numbered
    proposals. In addition, there could be an API that is not
    hint-based, for example constraints that specify this. I’m
    reluctant to add new functionality that only acts on a hint on
    the track. Perhaps there should be a corresponding API on the
    sink instead. That would rule out 2 and 3.

    Fippo: We do have 4:4:4 support for H.264, but I wouldn’t
    recommend it too much; people can negotiate codecs all they
    want. I’d go for 1.

    RESOLUTION: Consensus on proposal 1 (note, not recommend).
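
    [For reference, the hint the resolution’s note will attach to;
    setting it is a one-liner on the track.]

      const stream =
        await navigator.mediaDevices.getDisplayMedia({ video: true });
      const [track] = stream.getVideoTracks();
      // Hint that content is text-heavy so encoders favor spatial
      // detail; per proposal 1, a note will flag colored-text caveats.
      track.contentHint = 'text';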

     [54]Comments and request from APA review

      [54] https://github.com/w3c/mst-content-hint/issues/55

    [55][Slide 41]

      [55] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=41

    Harald: The APA has reviewed contentHint. See the slides for
    issues that we may or may not need to address.
    … We don’t do links between things at a higher level. I suggest
    saying that in the track model, a track is a track is a track.
    Things that link tracks together need to be specified at a
    higher level. We should not have regions on videos. I think in
    general we should reject.

    Bernard: I think in general this is not a problem for the MST
    contentHint spec, but there are things in the media capture
    working group worth discussing. There may be some regulation
    that applies to some of this.

    Harald: But these things can be addressed at a higher layer;
    for example, I just turned on closed captions, so it is
    possible to have a separate track with subtitles.

    Bernard: I’m just worried that APA gets ignored like PING,
    maybe we should have a joint meeting.

    Harald: We could do this at TPAC if we get the right people
    into the same room.

    Harald to draft a reply.

   [56]WebRTC-Extensions: API to control encode complexity

      [56] https://github.com/w3c/webrtc-extensions/issues/191

    [57][Slide 45]

      [57] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=45

    Florent: We want to be able to optimize the tradeoff between
    device resource usage and compression efficiency for different
    use cases, affecting CPU, video bitrate and quality.
    … We looked at similar APIs: in Android Media it’s a 0-9
    integer, on Azure Media Services it’s “speed, balanced,
    quality”, and in x264 (an H.264 library) there is a wide range
    of presets from ultrafast to veryslow.
    … The actual results could vary depending on the codec or the
    specific encoder used, and are not meant to be fixed by the
    specification. But we expect that, on average, encode time and
    QP are affected as per the slide, depending on a low, normal or
    high complexity mode.
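
    [No API shape has been proposed in a PR yet; purely as a
    strawman, a per-encoding knob on RTCRtpSender parameters might
    look like this, with “encodeComplexity” and its values entirely
    hypothetical.]

      // Hypothetical: neither the member name nor its values exist
      // in any spec; this only illustrates a per-stream setting.
      const pc = new RTCPeerConnection();
      const { sender } = pc.addTransceiver('video');
      const params = sender.getParameters();
      params.encodings[0].encodeComplexity = 'high'; // hypothetical
      await sender.setParameters(params);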

    [58][Slide 46]

      [58] 
https://lists.w3.org/Archives/Public/www-archive/2024Jan/att-0002/WEBRTCWG-2024-01-16.pdf#page=46

    Youenn: Sometimes compressing more is better for battery
    overall. How does an app decide what a good decision is, and do
    web applications already have this information, or is it not
    available? Have you considered specifying whether you prefer
    battery or quality as an alternative API shape?

    Florent: There are different options that could be discussed. I
    think it is important to have a per stream setting, for example
    the presentation is more important than the face.

    Youenn: I wonder how the page could know which setting to use.

    Florent: It’s more about knowing what the stream is used for,
    driven by the use case. For example, a thumbnail is less
    important. But the app could also monitor encode time in
    getStats and use that to decide.

    Jan-Ivar: I was going to say that we’re going from the user
    agent, which has a lot of information, to the web application,
    which has less, so it could make things worse; but you make a
    good case that the app can know that one stream is more
    important than another. So I just have a bikeshed question on
    the naming. But I’m also concerned about a web app simply
    asking for high quality across the board.

    Florent: I’m not opposed to changing the names. This is more to
    say whether something is more important or less important. This
    is mostly about CPU time allocated to encoding. It’s very
    common in other APIs.

    Jan-Ivar: Would medium or middle mean that the user agent
    decides?

    Florent: Yes, the user agent decides, but you can also have the
    web application tell the user agent to…

    Jan-Ivar: It might be better to have the default be unset.

    Fippo: Should this also exist for audio? It sounds a lot like a
    setting that exists in Opus, where there is a value between 1
    and 10.

    Florent: I’m not opposed, but it would be browser- and
    codec-dependent, so it’s more like a hint to the browser. But
    there is nothing preventing us from doing this for audio as
    well.

    Youenn: The user agent is still doing degradation and
    adaptation, so this sounds more like a priority between streams
    rather than a CPU or QP control.

    Florent: But if you use less time that would affect the QP.

    Henrik: I don’t think this is just about priority between
    streams - it’s that too, but I think even for a single stream
    you could have one use case where you only care about bitrate
    but another use case where it’s all about quality. Right?

    Florent: Yes.

    Bernard: What about upper or lower bounds? Is there a limit?
    Can it affect jitter buffer?

    Florent: It’s still WebRTC deciding; it’s up to the user agent
    to ensure that the impact is minimal.

    Harald: The control knob should be specified in terms of
    encoding, not priority; we already have priority APIs. I don’t
    much care about the name, but it needs to be specific to
    encoding.

    Bernard: Do we have consensus we want to go ahead with this?

    Youenn: I think it’s worth exploring.

    [Fippo gives thumbs up.]

    [Florent will come up with a PR so we can iterate]

Summary of resolutions

     1. [59]We should move mediacapture extension tests.
     2. [60]Let’s go with second option
     3. [61]Consensus to move forward with Tony’s proposal
     4. [62]Move sampleSize, sampleRate and latency to the
        extension spec. And then work harder on resizeMode.
     5. [63]Consensus on proposal 1 (note, not recommend).
