[Minutes] Media WG Call - 2022-12-13

Hi all,

The minutes (and slides) of this week's Media WG call are available at:
  https://www.w3.org/2022/12/13-mediawg-minutes.html

  ... and copied as raw text below.

Thanks,
Francois

-----
Media WG Teleconference - 2022-12-13
13 December 2022

    [2]Agenda. [3]IRC log.

       [2] 
https://github.com/w3c/media-wg/blob/main/meetings/2022-12-13-Media_Working_Group_Teleconference-agenda.md#agenda
       [3] https://www.w3.org/2022/12/13-mediawg-irc

Attendees

    Present
           Alastor Wu, Bernard Aboba, Chris Needham, Dale Curtis,
           Eric Carlson, Francois Daoust, Frank Liberato, Harald
           Alvestrand, Jer Noble, Matt Wolenetz, Peter Thatcher,
           Sushanth Rajasankar, Youenn Fablet

    Regrets
           -

    Chair
           -

    Scribe
           cpn, tidoust

Contents

     1. [4]ITU-T SG16 Liaison statement on WebCodecs
     2. [5]WebKit update on Audio focus/audio session API
     3. [6]Consistent SVC metadata between WebCodecs and Encoded
        Transform API
     4. [7]Media Pipeline architecture - Media WG input and WebRTC
        collaboration planning

Meeting minutes

   ITU-T SG16 Liaison statement on WebCodecs

    cpn: We received an incoming liaison statement from ITU-T SG16.

    [8]https://github.com/w3c/media-wg/blob/main/liaisons/
    2022-10-28-itu-t-sg16.md <- Draft reply

       [8] 
https://github.com/w3c/media-wg/blob/main/liaisons/2022-10-28-itu-t-sg16.md

    cpn: Around WebCodecs, and also around new VVC codec.
    … I drafted a reply, which describes WebCodecs, the use cases,
    a few indications about our own plans such as current work on
    VideoFrame metadata registry.
    … I shared this. Got a thumbs up from Bernard, Jer, Paul.
    … I want to make sure that everything we write here is
    representative.
    … I was hoping to get this out before the Christmas break.
    … If you haven't had a chance to look at it yet, now would be a
    good time.

    youenn: I like the fact that you state that the group would be
    open to add a registration provided there was support from
    implementors.
    … I assume that means user agent implementors?

    cpn: That's a question for the group perhaps. H.263 comes to
    mind for instance.

    Dale_Curtis: I don't think that we want to be gatekeepers of
    what the registry contains, even though there isn't support in
    web browsers per se.
    … We'd still want some technical constraints to be met.

    cpn: Right. That would apply to any future registration as
    well.

   WebKit update on Audio focus/audio session API

    Slideset: [9]https://lists.w3.org/Archives/Public/www-archive/
    2022Dec/att-0000/AudioSession_API.pdf

       [9] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf

    [10][Slide 2]

      [10] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf#page=2

    Youenn: We received reports that audio handling on iOS isn't
    easy, e.g., VC applications
    … The intent of the application may not match our heuristics
    for setting up the audio pipeline
    … So a new API may be appropriate
    … You might remember the Audio Focus API, initially in Media
    Session, then split out from that
    … There's an explainer, linked from the slides
    … The overall goal is to get feedback, is the scope right, next
    steps?
    … Compared to the original Audio Focus API, we wanted to reduce
    scope, for the iOS platform
    … We focused on the audio session category, and interruptions
    … The API should support future features such as requesting or
    abandoning audio focus
    … Handling audio providers as a group
    … We wrote an explainer, and a prototype in WebKit

    [11][Slide 3]

      [11] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf#page=3

    Youenn: Some examples: setting the audio session category, you
    can open the demo in iOS
    … playAudio and capture functions, for microphone input
    … If you call playAudio initially, then capture, it's
    disruptive in iOS. The reason is that when you play using Web
    Audio, it's ambient
    … Two different audio levels when going from ambient to play &
    record. Something we want to avoid
    … The setCategory function allows you to set the category to
    play & record, don't use ambient
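
    A minimal sketch of the idea above: pick the play & record
    category once, before any audio starts, so that starting
    capture later does not force a switch away from ambient. The
    attribute name "type" and the value "play-and-record" are
    assumptions based on the explainer and may change; the helper
    prepareForCall and the stand-in session object are hypothetical.

```typescript
// Local model of the proposed interface. The names are assumptions taken
// from the explainer, not a shipped or final API.
type SketchSessionType = "auto" | "ambient" | "playback" | "play-and-record";

interface SketchSession {
  type: SketchSessionType;
}

// Hypothetical helper: choose the category up front so that a later
// microphone capture does not change the audio route or level.
function prepareForCall(session: SketchSession | undefined): boolean {
  if (!session) {
    return false; // API not available; the UA keeps using its heuristics.
  }
  session.type = "play-and-record";
  return true;
}

// In a browser exposing the proposal this might be called as
//   prepareForCall((navigator as any).audioSession);
// A stand-in object is used here so the sketch runs anywhere.
const fakeSession: SketchSession = { type: "auto" };
const applied = prepareForCall(fakeSession);
console.log(applied, fakeSession.type);
```

    The page would then call its playAudio and capture functions as
    before; only the up-front category choice is new.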

    [12][Slide 4]

      [12] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf#page=4

    Youenn: On interruption, when you're in a video call, you might
    receive a phone call, which is higher priority, and the website
    is interrupted, capture stopped, audio or video elements may be
    stopped
    … But the website may not know that
    … It's also not clear to the website whether to restart audio
    after the phone call
    … Providing the concept of an audio session, which can go
    between active and interrupted, allows the website to change
    what is visible to the user
    … On an interruption, it could show a UI, or UI to allow the
    user to restart capture
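
    The interruption handling described above could look roughly
    like the sketch below. The state names ("active",
    "interrupted", "inactive") and the onstatechange handler are
    assumptions based on the explainer; the CallUi shape and the
    watchSession helper are hypothetical, and the interruption is
    simulated on a stand-in object.

```typescript
type SketchSessionState = "inactive" | "active" | "interrupted";

// Local stand-in for the proposed audio session object (assumed shape).
interface SessionLike {
  state: SketchSessionState;
  onstatechange: (() => void) | null;
}

// Hypothetical description of what the page shows to the user.
interface CallUi {
  banner: string;
  showRestartButton: boolean;
}

function watchSession(session: SessionLike, render: (ui: CallUi) => void): void {
  let wasInterrupted = false;
  session.onstatechange = () => {
    if (session.state === "interrupted") {
      // E.g. an incoming phone call took over audio.
      wasInterrupted = true;
      render({ banner: "Audio interrupted", showRestartButton: false });
    } else if (session.state === "active" && wasInterrupted) {
      // The page cannot assume capture resumed, so let the user restart it.
      wasInterrupted = false;
      render({ banner: "Interruption ended", showRestartButton: true });
    } else {
      render({ banner: "", showRestartButton: false });
    }
  };
}

// Simulation: a real UA would change the state and fire the event itself.
const session: SessionLike = { state: "active", onstatechange: null };
const rendered: CallUi[] = [];
watchSession(session, (ui) => {
  rendered.push(ui);
});

function simulate(state: SketchSessionState): void {
  session.state = state;
  if (session.onstatechange) {
    session.onstatechange();
  }
}

simulate("interrupted");
simulate("active");
console.log(rendered);
```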

    [13][Slide 5]

      [13] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf#page=5

    Youenn: We tried to keep the API small. There's an audio
    session state and audio session type. Then we added an
    AudioSession interface, which we thought was clearer
    … Use that to say it's ambient (mix with others), or play &
    record, so the UA can set the audio pipeline accordingly
    … There are event handlers, no constructor. For simple use
    cases, a getter on navigator to get the audio session
    … A default global audio session. Use this object to query or
    tailor it

    [14][Slide 6]

      [14] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf#page=6

    Youenn: My main interest is not to go into specific issues.
    More issues are welcome
    … Question: is this of interest, is it going in the right
    direction? Any thoughts on potential next steps?

    Dale: From a Chrome point of view, Mounir and Becca worked on
    it. At a glance, seems reasonable. There might be worry about
    duplication between Media Session and Audio Session, but no
    specific thoughts on that

    Youenn: The API shape is different, there might be only one
    Media Session in a page, but only one Audio Session
    … The call to split the two things in the past is OK
    … We decided to delay the grabbing and releasing of audio
    focus. There might be other things to consider, e.g., auto play
    … A question I have, is it's not yet submitted in the WG. Is it
    already in scope?

    cpn: Looking at the charter, Audio Focus API is in the list of
    potential normative deliverables
    … We just need to run a call for consensus to adopt the spec to
    the Media WG

    Sushanth: How to handle audio from multiple tabs?

    Youenn: This would help with that

    Sushanth: If the audio type requested by one browser is
    playback, and from another is ambient, only one can exist at a
    time

    Youenn: You'd mimic what two native applications would do. One
    session with playback would probably not be interrupted by
    another that requests ambient

    cpn: At what point would we be ready to run a call for
    consensus on this?

    youenn: If there's already consensus in this call, we'd be
    interested to run it as soon as possible.
    … No particular hurry, but the sooner the better.
    … If there's no consensus, we'd like to know what to work on.

    cpn: Just worried about support from other browser vendors.

    youenn: We talked a bit with Mozilla. I can check with them and
    get back to you.

    alwu: From Mozilla Firefox perspective, that's an API we'd be
    interested in supporting as well.

    Dale: And no reason to hold off calling for consensus while we
    figure things out internally.

    jernoble: In the meantime, feedback on existing issues is
    welcome.

    cpn: So proposed resolution is to run a CfC.

   Consistent SVC metadata between WebCodecs and Encoded Transform API

    Slideset: [15]https://lists.w3.org/Archives/Public/www-archive/
    2022Dec/att-0002/MEDIAWG-12-13-2022.pdf

      [15] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf

    [16][Slide 2]

      [16] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=2

    [17][Slide 3]

      [17] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=3

    [18][Slide 4]

      [18] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=4

    [19][Slide 5]

      [19] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=5

    Bernard: [going through slides]. Sequence of unsigned long
    dependencies. There's also some missing information.
    … We're essentially re-inventing WebCodecs in another spec,
    perhaps not the right way to go.
    … Two different SVC metadata dictionaries could be avoided.
    … Temporal may be shipping in Safari, but spatial is not
    shipping anywhere.
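
    To make the duplication concrete, here is a rough sketch of
    how one shared shape could cover both sources. The field names
    on the two "Like" interfaces are recalled from the WebCodecs
    and Encoded Transform drafts and should be treated as
    assumptions; UnifiedSvcMetadata and the two conversion
    functions are hypothetical and are not the PR discussed in
    this call.

```typescript
// Assumed shape of the WebCodecs SVC output metadata.
interface WebCodecsSvcLike {
  temporalLayerId?: number;
}

// Assumed subset of the Encoded Transform video frame metadata.
interface EncodedTransformMetadataLike {
  frameId?: number;
  dependencies?: number[];
  spatialIndex?: number;
  temporalIndex?: number;
}

// Hypothetical unified dictionary, for illustration only.
interface UnifiedSvcMetadata {
  temporalLayerId: number;
  spatialLayerId: number;
  frameId: number | null;
  dependencies: number[];
}

function fromWebCodecs(svc: WebCodecsSvcLike): UnifiedSvcMetadata {
  // This source only carries the temporal layer; the rest is defaulted.
  return {
    temporalLayerId: svc.temporalLayerId ?? 0,
    spatialLayerId: 0,
    frameId: null,
    dependencies: [],
  };
}

function fromEncodedTransform(m: EncodedTransformMetadataLike): UnifiedSvcMetadata {
  return {
    temporalLayerId: m.temporalIndex ?? 0,
    spatialLayerId: m.spatialIndex ?? 0,
    frameId: m.frameId ?? null,
    dependencies: (m.dependencies ?? []).slice(), // copy, do not alias
  };
}

console.log(fromWebCodecs({ temporalLayerId: 1 }));
console.log(fromEncodedTransform({ frameId: 7, dependencies: [5, 6], spatialIndex: 1, temporalIndex: 2 }));
```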

    Dale: I'm in favor of unifying what we can.

    Bernard: Proposal is for a few of us to get together and
    prepare a PR to harmonize things
    … This would at least avoid future issues.
    … We made some progress in the last couple of days, and Youenn
    prepared a bunch of PRs that solved a number of type
    mismatches.

    cpn: Is this something for the WebCodecs spec itself or the
    metadata registry?

    Bernard: This is for encoded metadata for which we don't have a
    registry.

   Media Pipeline architecture - Media WG input and WebRTC collaboration
   planning

    cpn: Back at TPAC, we identified several places where we may
    benefit from coordination between groups.
    … This is picking up on where we're at with this.

    [20][Slide 6]

      [20] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=6

    Bernard: We created a Media Pipeline architecture repo
    following discussions.
    … Issues and pointers to sample code covering integration of
    next generation web media apis.
    … Also to go beyond just the specs we mentioned already, e.g.
    WebTransport which could be used to transport media.
    … From time to time, it's hard to understand whether there are
    performance issues in the specs, implementations or in the code
    sample.

    [21][Slide 7]

      [21] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=7

    Bernard: When I started off, I was thinking about capture with
    Media Capture and Streams Extensions, then encode/decode with
    WebCodecs (and also MSE v2 to some extent), Transport
    (WebTransport, WebRTC data channels in workers), and Frameworks
    (WHATWG streams, WASM)

    [22][Slide 8]

      [22] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=8

    Bernard: The pipeline model is based on WHATWG Streams, through
    TransformStreams piped together.
    … When you're sending frames, you have several options, e.g.
    reliable/unreliable, etc.
    … To stream these pipelines together, you have to use all of
    these APIs together. Does it all make sense?
    … I don't know that many developers who understand all of these
    APIs.
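
    As an illustration of the pipeline model, the sketch below
    pipes a source through two TransformStreams into a sink using
    the WHATWG Streams API. The stages are placeholders: the
    "encode" stage stands in for a WebCodecs encoder wrapper and
    the sink stands in for a transport writer. SketchFrame,
    encodeStage, serializeStage and runPipeline are hypothetical
    names, not taken from the samples.

```typescript
// Hypothetical frame record standing in for VideoFrame / EncodedVideoChunk.
interface SketchFrame {
  seq: number;
  payload: string;
}

// Placeholder "encoder" stage; a real pipeline would wrap a VideoEncoder.
function encodeStage(): TransformStream<SketchFrame, SketchFrame> {
  return new TransformStream<SketchFrame, SketchFrame>({
    transform(frame, controller) {
      controller.enqueue({ seq: frame.seq, payload: `enc(${frame.payload})` });
    },
  });
}

// Serialization stage: turns each frame into a string for the "wire".
function serializeStage(): TransformStream<SketchFrame, string> {
  return new TransformStream<SketchFrame, string>({
    transform(frame, controller) {
      controller.enqueue(JSON.stringify(frame));
    },
  });
}

async function runPipeline(input: SketchFrame[]): Promise<string[]> {
  const source = new ReadableStream<SketchFrame>({
    start(controller) {
      for (const f of input) controller.enqueue(f);
      controller.close();
    },
  });
  const sent: string[] = [];
  // Placeholder sink; a real one might write to a WebTransport stream.
  const sink = new WritableStream<string>({
    write(chunk) {
      sent.push(chunk);
    },
  });
  await source.pipeThrough(encodeStage()).pipeThrough(serializeStage()).pipeTo(sink);
  return sent;
}

const pipelineResult = runPipeline([
  { seq: 0, payload: "a" },
  { seq: 1, payload: "b" },
]);
pipelineResult.then((sent) => console.log(sent));
```

    Each stage only sees its own input and output, which keeps
    stages replaceable but is also why a developer needs to know
    several APIs to assemble a full capture-to-transport chain.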

    [23][Slide 9]

      [23] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=9

    Bernard: Some issues already created in the repo.

    [24]Media Pipeline architecture repo

      [24] https://github.com/w3c/media-pipeline-arch/

    Bernard: A lot of the issues are focused on transport.
    … There are a few things that are worth discussing here.
    … E.g. rendering and timing. Media Capture Transform is an
    interesting API. Does VideoTrackGenerator have a jitter buffer?
    Does it not?
    … That is not particularly well defined in the spec.

    [25][Slide 10]

      [25] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=10

    Bernard: We have two samples at the moment. One is a WebCodecs
    encode/decode in worker in the WebCodecs repo.
    … The second one adds WebTransport to that. This one took more
    work to optimize the transport. It adds
    serialization/deserialization.
    … We use frame/stream transport. That's not exactly RTP but
    it's close.
    … We're using SVC at baseline and partial reliability.
    … Overall, it's working surprisingly well.
    … I had to do a reorder buffer but still not a full jitter
    buffer.
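
    For readers unfamiliar with the distinction, a reorder buffer
    only restores sequence order; it does no playout timing, which
    is what a full jitter buffer adds. The sketch below is a
    generic illustration, not the code from the sample. The
    SeqFrame shape, the sequence numbering and the maxPending
    limit are all assumptions.

```typescript
// Hypothetical frame record carrying a sender-assigned sequence number.
interface SeqFrame {
  seq: number;
  data: string;
}

class ReorderBuffer {
  private pending = new Map<number, SeqFrame>();
  private nextSeq: number;
  private maxPending: number;

  constructor(firstSeq: number = 0, maxPending: number = 8) {
    this.nextSeq = firstSeq;
    this.maxPending = maxPending;
  }

  // Accepts one frame and returns every frame that can now be delivered
  // in sequence order (possibly none).
  push(frame: SeqFrame): SeqFrame[] {
    if (frame.seq < this.nextSeq || this.pending.has(frame.seq)) {
      return []; // late or duplicate frame: drop it
    }
    this.pending.set(frame.seq, frame);

    // Too many frames waiting on a gap: give up on the missing ones and
    // resume from the oldest frame that did arrive.
    if (this.pending.size > this.maxPending) {
      let oldest = frame.seq;
      this.pending.forEach((_value, key) => {
        if (key < oldest) oldest = key;
      });
      this.nextSeq = oldest;
    }

    const ready: SeqFrame[] = [];
    let next = this.pending.get(this.nextSeq);
    while (next !== undefined) {
      ready.push(next);
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
      next = this.pending.get(this.nextSeq);
    }
    return ready;
  }
}

const demoBuffer = new ReorderBuffer();
const firstOut = demoBuffer.push({ seq: 0, data: "f0" });
const heldOut = demoBuffer.push({ seq: 2, data: "f2" });
const flushedOut = demoBuffer.push({ seq: 1, data: "f1" });
console.log(firstOut.length, heldOut.length, flushedOut.map((f) => f.seq));
```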

    [26][Slide 11]

      [26] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=11

    Bernard: Here are some of the things that you can play with.
    … You can play with this stuff. At the end, it generates a
    Frame RTT graph. That does not really give you glass to glass
    measurements.
    … Performance is pretty reasonable now after some work.

    [27][Slide 12]

      [27] 
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=12

    Bernard: Slide shows an example with AV1 at full-HD.
    … What's interesting is that key frames can be transmitted
    within a single congestion window.
    … General question is what do we do with this?

    cpn: That's really great to get that practical feedback from
    building things.

    Bernard: Yes, we're seeing a lot of stuff. Similarly, there are
    a few things where I don't know enough of the internals to
    understand what needs to be done.
    … You have to be cautious of await calls with WHATWG Streams,
    since they are going to block. Debugging is also hard.

    youenn: Note you may use JS implementations of ReadableStream
    and WritableStream to ease debugging.

    Bernard: Good idea. You can get a dozen stages and you don't
    really know where things are in the different queues. It's not
    easy to figure out what happens. The code is fairly small
    though.
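
    One lightweight way to see where chunks are in a multi-stage
    pipeline is to insert pass-through "tap" stages that count what
    flows past them. This is a different technique from swapping in
    a JS streams implementation, and it only shows throughput per
    stage, not internal queue depth. The names tap, dropOdd,
    runTapped and StageCounter are hypothetical.

```typescript
interface StageCounter {
  name: string;
  seen: number;
}

// Pass-through stage that counts chunks without modifying them.
function tap<T>(counter: StageCounter): TransformStream<T, T> {
  return new TransformStream<T, T>({
    transform(chunk, controller) {
      counter.seen += 1;
      controller.enqueue(chunk);
    },
  });
}

// Example stage under observation: forwards even numbers only.
function dropOdd(): TransformStream<number, number> {
  return new TransformStream<number, number>({
    transform(n, controller) {
      if (n % 2 === 0) controller.enqueue(n);
    },
  });
}

async function runTapped(
  values: number[],
  before: StageCounter,
  after: StageCounter
): Promise<number[]> {
  const source = new ReadableStream<number>({
    start(controller) {
      for (const v of values) controller.enqueue(v);
      controller.close();
    },
  });
  const out: number[] = [];
  const sink = new WritableStream<number>({
    write(n) {
      out.push(n);
    },
  });
  await source
    .pipeThrough(tap<number>(before))
    .pipeThrough(dropOdd())
    .pipeThrough(tap<number>(after))
    .pipeTo(sink);
  return out;
}

const beforeFilter: StageCounter = { name: "before-filter", seen: 0 };
const afterFilter: StageCounter = { name: "after-filter", seen: 0 };
const tappedRun = runTapped([1, 2, 3, 4], beforeFilter, afterFilter);
tappedRun.then((out) => console.log(out, beforeFilter, afterFilter));
```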

    cpn: Immediate next step?

    Bernard: Adding APIs in multiple groups raises questions. It's
    worthwhile checking in on this periodically.
    … I don't want to act like I have a handle on this.

    cpn: OK, we'll talk more about how to improve that cross-group
    collaboration.

    cpn: Our next meeting will be in the new year. Happy Christmas
    and looking forward to seeing you next year!

Received on Wednesday, 14 December 2022 15:22:33 UTC