- From: François Daoust <fd@w3.org>
- Date: Wed, 14 Dec 2022 15:22:28 +0000
- To: "public-media-wg@w3.org" <public-media-wg@w3.org>
Hi all,
The minutes (and slides) of this week's Media WG call are available at:
https://www.w3.org/2022/12/13-mediawg-minutes.html
... and copied as raw text below.
Thanks,
Francois
-----
Media WG Teleconference - 2022-12-13
13 December 2022
[2]Agenda. [3]IRC log.
[2]
https://github.com/w3c/media-wg/blob/main/meetings/2022-12-13-Media_Working_Group_Teleconference-agenda.md#agenda
[3] https://www.w3.org/2022/12/13-mediawg-irc
Attendees
Present
Alastor Wu, Bernard Aboba, Chris Needham, Dale Curtis,
Eric Carlson, Francois Daoust, Frank Liberato, Harald
Alvestrand, Jer Noble, Matt Wolenetz, Peter Thatcher,
Sushanth Rajasankar, Youenn Fablet
Regrets
-
Chair
-
Scribe
cpn, tidoust
Contents
1. [4]ITU-T SG16 Liaison statement on WebCodecs
2. [5]WebKit update on Audio focus/audio session API
3. [6]Consistent SVC metadata between WebCodecs and Encoded
Transform API
4. [7]Media Pipeline architecture - Media WG input and WebRTC
collaboration planning
Meeting minutes
ITU-T SG16 Liaison statement on WebCodecs
cpn: We received an incoming liaison statement from ITU-T SG16.
[8]Draft reply
[8]
https://github.com/w3c/media-wg/blob/main/liaisons/2022-10-28-itu-t-sg16.md
cpn: Around WebCodecs, and also around new VVC codec.
… I drafted a reply, which describes WebCodecs, the use cases,
a few indications about our own plans such as current work on
VideoFrame metadata registry.
… I shared this. Got a thumbs up from Bernard, Jer, Paul.
… I want to make sure that everything we write here is
representative.
… I was hoping to get this out before the Christmas break.
… If you haven't had a chance to look at it yet, now would be a
good time.
youenn: I like the fact that you state that the group would be
open to adding a registration provided there is support from
implementors.
… I assume that means user agent implementors?
cpn: That's a question for the group perhaps. H.263 comes to
mind for instance.
Dale_Curtis: I don't think that we want to be gatekeepers of
what the registry contains, even if there isn't support in
web browsers per se.
… We'd still want some technical constraints to be met.
cpn: Right. That would apply to any future registration as
well.
WebKit update on Audio focus/audio session API
Slideset: [9]https://lists.w3.org/Archives/Public/www-archive/
2022Dec/att-0000/AudioSession_API.pdf
[9]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf
[10][Slide 2]
[10]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf#page=2
Youenn: We received reports that audio handling on iOS isn't
easy, e.g., VC applications
… The intent of the application may not match our heuristics
for setting up the audio pipeline
… So a new API may be appropriate
… You might remember the Audio Focus API, initially in Media
Session, then split out from that
… There's an explainer, linked from the slides
… The overall goal is to get feedback, is the scope right, next
steps?
… Compared to the original Audio Focus API, we wanted to reduce
scope, for the iOS platform
… We focused on the audio session category, and interruptions
… The API should support future features such as requesting or
abandoning audio focus
… Handling audio providers as a group
… We wrote an explainer, and a prototype in WebKit
[11][Slide 3]
[11]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf#page=3
Youenn: Some examples: setting the audio session category, you
can open the demo in iOS
… playAudio and capture functions, for microphone input
… If you call playAudio initially, then capture, it's
disruptive in iOS. The reason is that when you play using Web
Audio, it's ambient
… Two different audio levels when going from ambient to play &
record. Something we want to avoid
… The setCategory function allows you to set the category to
play & record, don't use ambient
[12][Slide 4]
[12]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf#page=4
Youenn: On interruption, when you're in a video call, you might
receive a phone call, which is higher priority, and the website
is interrupted, capture stopped, audio or video elements may be
stopped
… But the website may not know that
… It's also not clear to the website whether to restart audio
after the phone call
… Providing the concept of an audio session, which can go
between active and interrupted, allows the website to change
what is visible to the user
… On an interruption, it could show a UI, or UI to allow the
user to restart capture
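The interruption flow described here can be sketched as
follows, assuming the state values ("active"/"interrupted") and
a statechange event as in the explainer; the ui helper and its
method names are hypothetical, for illustration only.

```javascript
// Hedged sketch of interruption handling: when a higher-priority audio
// session (e.g. a phone call) interrupts the page, surface UI so the user
// can restart capture afterwards. State names assumed from the explainer;
// the ui object and its methods are hypothetical.
function watchInterruptions(session, ui) {
  session.onstatechange = () => {
    if (session.state === "interrupted") {
      ui.showResumeButton(); // capture was stopped by the platform
    } else {
      ui.hideResumeButton(); // back to active
    }
  };
}
```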
[13][Slide 5]
[13]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf#page=5
Youenn: We tried to keep the API small. There's an audio
session state and audio session type. Then we added an
AudioSession interface, which we thought was clearer
… Use that to say it's ambient (mix with others), or play &
record, so the UA can set the audio pipeline accordingly
… There are event handlers, no constructor. For simple use
cases, a getter on navigator to get the audio session
… A default global audio session. Use this object to query or
tailor it
[14][Slide 6]
[14]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0000/AudioSession_API.pdf#page=6
Youenn: My main interest is not to go into specific issues.
More issues are welcome
… Question: is this of interest, is it going in the right
direction? Any thoughts on potential next steps?
Dale: From a Chrome point of view, Mounir and Becca worked on
it. At a glance, seems reasonable. There might be worry about
duplication between Media Session and Audio Session, but no
specific thoughts on that
Youenn: The API shape is different; there is only one Media
Session in a page, and likewise only one Audio Session
… The decision to split the two in the past was OK
… We decided to delay the grabbing and releasing of audio
focus. There might be other things to consider, e.g., auto play
… A question I have, is it's not yet submitted in the WG. Is it
already in scope?
cpn: Looking at the charter, Audio Focus API is in the list of
potential normative deliverables
… We just need to run a call for consensus to adopt the spec
into the Media WG
Sushanth: How to handle audio from multiple tabs?
Youenn: This would help with that
Sushanth: If the audio type requested by one tab is
playback, and by another is ambient, only one can exist at a
time
Youenn: You'd mimic what two native applications would do. One
session with playback would probably not be interrupted by
another that requests ambient
cpn: At what point would we be ready to run a call for
consensus on this?
youenn: If there's already consensus in this call, we'd be
interested to run it as soon as possible.
… No particular hurry, but the sooner the better.
… If there's no consensus, we'd like to know what to work on.
cpn: Just worried about support from other browser vendors.
youenn: We talked a bit with Mozilla. I can check with them and
get back to you.
alwu: From a Mozilla Firefox perspective, that's an API we'd be
interested in supporting as well.
Dale: And no reason to hold off calling for consensus while we
figure things out internally.
jernoble: In the meantime, feedback on existing issues is
welcome.
cpn: So proposed resolution is to run a CfC.
Consistent SVC metadata between WebCodecs and Encoded Transform API
Slideset: [15]https://lists.w3.org/Archives/Public/www-archive/
2022Dec/att-0002/MEDIAWG-12-13-2022.pdf
[15]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf
[16][Slide 2]
[16]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=2
[17][Slide 3]
[17]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=3
[18][Slide 4]
[18]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=4
[19][Slide 5]
[19]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=5
Bernard: [going through slides]. Sequence of unsigned long
dependencies. There's also some missing information.
… We're essentially re-inventing WebCodecs in another spec,
perhaps not the right way to go.
… Having two different SVC metadata dictionaries could be avoided.
… Temporal may be shipping in Safari, but spatial is not
shipping anywhere.
Dale: I'm in favor of unifying what we can.
Bernard: Proposal is for a few of us to get together and
prepare a PR to harmonize things
… This would at least avoid future issues.
… We made some progress in the last couple of days, and Youenn
prepared a bunch of PRs that solved a number of type
mismatches.
cpn: Is this something for the WebCodecs spec itself or the
metadata registry?
Bernard: This is for encoded metadata for which we don't have a
registry.
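For context, the duplication discussed above can be illustrated
with a small helper that reads the temporal layer from either
metadata shape. The field names (svc.temporalLayerId in
WebCodecs' EncodedVideoChunkMetadata; temporalIndex and
dependencies in Encoded Transform's RTCEncodedVideoFrameMetadata)
follow the current drafts, but this is a hedged sketch, not an
agreed mapping.

```javascript
// Hedged sketch: one piece of information, two dictionary shapes.
function temporalLayer(meta) {
  // WebCodecs: EncodedVideoChunkMetadata carries an svc member.
  if (meta.svc) return meta.svc.temporalLayerId;
  // Encoded Transform: RTCEncodedVideoFrameMetadata exposes temporalIndex
  // (alongside dependencies, a sequence of frame identifiers).
  if ("temporalIndex" in meta) return meta.temporalIndex;
  return undefined;
}
```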
Media Pipeline architecture - Media WG input and WebRTC collaboration
planning
cpn: Back at TPAC, we identified several places where we may
benefit from coordination between groups.
… This is picking up on where we're at with this.
[20][Slide 6]
[20]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=6
Bernard: We created a Media Pipeline architecture repo
following discussions.
… Issues and pointers to sample code covering integration of
next generation web media apis.
… Also to go beyond just the specs we mentioned already, e.g.
WebTransport which could be used to transport media.
… From time to time, it's hard to understand whether there are
performance issues in the specs, implementations, or in the code
samples.
[21][Slide 7]
[21]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=7
Bernard: When I started out, I was thinking about capture with
Media Capture and Streams Extensions, then encode/decode with
WebCodecs (and also MSE v2 to some extent), Transport
(WebTransport, WebRTC data channels in workers), and Frameworks
(WHATWG streams, WASM)
[22][Slide 8]
[22]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=8
Bernard: The pipeline model is based on WHATWG Streams, through
TransformStreams piped together.
… When you're sending frames, you have several options, e.g.
reliable/unreliable, etc.
… To string these pipelines together, you have to use all of
these APIs together. Does it all make sense?
… I don't know that many developers who understand all of these
APIs.
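The pipeline model described above can be sketched with plain
WHATWG Streams (available in modern browsers and Node 18+). The
encode and serialize stages here are illustrative stand-ins for
VideoEncoder and a wire format, not real API calls.

```javascript
// Hedged sketch of the pipeline model: stages as TransformStreams piped
// together. A real pipeline would wrap MediaStreamTrackProcessor,
// VideoEncoder, WebTransport, etc.; the stage bodies below are stand-ins.
async function runPipeline(frames) {
  const source = new ReadableStream({
    start(controller) {
      for (const f of frames) controller.enqueue(f);
      controller.close();
    },
  });
  const encode = new TransformStream({
    transform(frame, controller) {
      controller.enqueue({ encoded: true, ...frame }); // stand-in for VideoEncoder
    },
  });
  const serialize = new TransformStream({
    transform(chunk, controller) {
      controller.enqueue(JSON.stringify(chunk)); // stand-in for a wire format
    },
  });
  const out = [];
  await source.pipeThrough(encode).pipeThrough(serialize).pipeTo(
    new WritableStream({ write(chunk) { out.push(chunk); } })
  );
  return out;
}
```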
[23][Slide 9]
[23]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=9
Bernard: Some issues already created in the repo.
[24]Media Pipeline architecture repo
[24] https://github.com/w3c/media-pipeline-arch/
Bernard: A lot of the issues are focused on transport.
… There are a few things that are worth discussing here.
… E.g. rendering and timing. Media Capture Transform is an
interesting API. Does VideoTrackGenerator have a jitter buffer?
Does it not?
… That is not particularly well defined in the spec.
[25][Slide 10]
[25]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=10
Bernard: We have two samples at the moment. One is a WebCodecs
encode/decode in worker in the WebCodecs repo.
… The second one adds WebTransport to that. This one took more
work to optimize the transport. It adds
serialization/deserialization.
… We use frame/stream transport. That's not exactly RTP but
it's close.
… We're using SVC at baseline and partial reliability.
… Overall, it's working surprisingly well.
… I had to do a reorder buffer but still not a full jitter
buffer.
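A reorder buffer of the kind Bernard mentions can be sketched as
below. This is a minimal sketch, not the sample's actual code:
it releases frames in sequence-number order but, unlike a full
jitter buffer, has no playout clock, loss timeout, or
concealment.

```javascript
// Minimal reorder buffer sketch (not a full jitter buffer). Frames carry a
// sequence number; the buffer holds out-of-order arrivals and releases a
// run of frames as soon as the next expected one is available.
class ReorderBuffer {
  constructor(firstSeq = 0) {
    this.next = firstSeq;     // next sequence number to release
    this.pending = new Map(); // seq -> frame, held until in order
  }
  push(frame) {
    this.pending.set(frame.seq, frame);
    const ready = [];
    while (this.pending.has(this.next)) {
      ready.push(this.pending.get(this.next));
      this.pending.delete(this.next);
      this.next++;
    }
    return ready; // frames now safe to decode, in order
  }
}
```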
[26][Slide 11]
[26]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=11
Bernard: Here are some of the things that you can play with.
… At the end, it generates a Frame RTT graph. That does not
really give you glass-to-glass measurements.
… Performance is pretty reasonable now after some work.
[27][Slide 12]
[27]
https://lists.w3.org/Archives/Public/www-archive/2022Dec/att-0002/MEDIAWG-12-13-2022.pdf#page=12
Bernard: Slide shows an example with AV1 at full-HD.
… What's interesting is that key frames can be transmitted
within a single congestion window.
… General question is what do we do with this?
cpn: That's really great to get that practical feedback from
building things.
Bernard: Yes, we're seeing a lot of stuff. Similarly, there are
a few things where I don't know enough of the internals to
understand what needs to be done.
… You have to be cautious of await calls with WHATWG Streams,
since they are going to block. Debugging is also hard.
youenn: Note you may use JS implementations of ReadableStream
and WritableStream to ease debugging.
Bernard: Good idea. You can get a dozen stages and you don't
really know where things are in the different queues. It's not
easy to figure out what happens. The code is fairly small
though.
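One lightweight way to apply youenn's suggestion is a JS
pass-through TransformStream that logs traffic between stages,
making it visible where chunks queue up in a many-stage
pipeline. The stage name and injectable log sink below are
illustrative.

```javascript
// Hedged sketch: an instrumented pass-through stage for debugging a
// Streams-based pipeline. Insert it between two stages to log every chunk
// that crosses the boundary; the log sink is injectable for testing.
function instrumented(name, log = console.log) {
  return new TransformStream({
    transform(chunk, controller) {
      log(`${name}: chunk observed at ${Date.now()}`);
      controller.enqueue(chunk); // pass through unchanged
    },
  });
}

// Usage: source.pipeThrough(encode)
//              .pipeThrough(instrumented("post-encode"))
//              .pipeTo(sink);
```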
cpn: Immediate next step?
Bernard: Adding APIs in multiple groups raises questions. It's
worthwhile checking in on this periodically.
… I don't want to act like I have a handle on this.
cpn: OK, we'll talk more about how to improve that cross-group
collaboration.
cpn: Our next meeting will be in the new year. Happy Christmas
and looking forward to seeing you next year!
Received on Wednesday, 14 December 2022 15:22:33 UTC