[media-source] Normatively allow user agents that don't do audio splice frame generation to do other things than just insert silence

wolenetz has just created a new issue for 
https://github.com/w3c/media-source:

== Normatively allow user agents that don't do audio splice frame 
generation to do other things than just insert silence ==
The V1 MSE spec's audio splice frame algorithm allows for only 2 
cases, where others may be preferred by user agents. Existing cases:
1) (normatively spec'ed) User agent **does not** support cross-fading.
 Insert silence for the portion of the overlapped frame that remains 
after being overlapped. (non-normatively:) Optionally fade to/from 
that silence when rendering that inserted silence.
2) (normatively spec'ed) User agent **does** support cross-fading. 
Retain the existing overlapped frame and do a 5 millisecond 
cross-fade from it to the overlapping frame content at the splice 
point where the overlapping frame begins.
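
As a rough illustration of case 2, a 5 millisecond cross-fade over already decoded samples might look like the sketch below. The function names and the linear gain ramp are assumptions made for illustration only; they are not taken from the spec text quoted above or from any user agent.

```python
def fade_sample_count(sample_rate, fade_ms=5):
    """Number of whole samples covered by the fade (rounded down)."""
    return sample_rate * fade_ms // 1000


def crossfade(overlapped_tail, overlapping_head):
    """Mix two equally long runs of samples with a linear gain ramp.

    overlapped_tail: samples of the overlapped frame from the splice point on.
    overlapping_head: the first samples of the overlapping frame.
    The linear ramp is an assumption of this sketch.
    """
    n = min(len(overlapped_tail), len(overlapping_head))
    mixed = []
    for i in range(n):
        gain_in = (i + 1) / (n + 1)  # rises towards 1 across the fade
        mixed.append(overlapped_tail[i] * (1 - gain_in)
                     + overlapping_head[i] * gain_in)
    return mixed
```

At 48 kHz the fade would cover 240 samples; at 44.1 kHz the 220.5 sample result is rounded down to 220 in this sketch.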

This bug tracks adding an alternative approach normatively:
3) User agent **does not** support cross-fading and does not want to 
insert silence either. Instead, it just trims the overlapped frame 
right at the splice point (the overlapping frame's PTS, adjusted to 
the overlapped frame's sample rate). Something like the following text:

* If the user agent does not support crossfading or silence insertion 
on overlap, then run the following steps:
 * Update the overlapped frame in the track buffer with a new frame 
consisting of the overlapped coded frame with coded frame duration set
 to the difference between presentation timestamp and the overlapped 
presentation timestamp.
 * When rendering the overlapped frame, discard any decoded samples 
that exceed that duration.
 * Return to caller without providing a splice frame.
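
The proposed steps can be sketched roughly as follows. This is a minimal illustration, not Chrome's implementation: `CodedFrame`, `trim_overlapped_frame` and `samples_to_render` are hypothetical names, and timestamps are modelled as exact fractions of a second to keep the arithmetic free of rounding.

```python
from dataclasses import dataclass, replace
from fractions import Fraction


@dataclass(frozen=True)
class CodedFrame:
    # Hypothetical stand-in for a coded frame in a track buffer.
    pts: Fraction       # presentation timestamp, in seconds
    duration: Fraction  # coded frame duration, in seconds
    sample_rate: int    # in Hz


def trim_overlapped_frame(overlapped, overlapping_pts):
    """Return the overlapped frame with its duration cut at the splice point."""
    end = overlapped.pts + overlapped.duration
    if not (overlapped.pts < overlapping_pts < end):
        # Not a partial overlap, so this sketch leaves the frame alone.
        return overlapped
    return replace(overlapped, duration=overlapping_pts - overlapped.pts)


def samples_to_render(frame, decoded_samples):
    """Discard decoded samples that exceed the frame's (trimmed) duration."""
    keep = int(frame.duration * frame.sample_rate)  # whole samples only
    return decoded_samples[:keep]
```

For example, a 1024-sample frame at 44.1 kHz starting at 0 that is overlapped at 512/44100 seconds keeps a duration of 512/44100 seconds, and only its first 512 decoded samples are rendered. If the decoder yields fewer samples than the duration implies, as can happen with badly muxed media, the slice simply returns what is available.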

Note: Chrome is taking this new approach in 
https://codereview.chromium.org/2343543002. Due to the unfortunate amount 
of badly muxed media timestamp and duration information in media used 
with MSE in the wild, Chrome encounters a significant number of audio 
splice rendering algorithm failures because the actual accumulation of
 decoded samples sometimes doesn't match the coded frame's timestamps 
and durations. This alternative approach allows simplification with 
the loss of just the 5 millisecond cross-fade. While the existing 
"insert-silence" alternative would also simplify the handling of such 
badly muxed media, it comes with reduced UX (complete loss of 
overlapped frame decoded audio samples).

Note to editors: This seems like something that could be done after V1
 is in PR or even REC. It is more a quality-of-implementation item, 
and doesn't change the existing alternatives for how to handle 
splices; rather it adds a further alternative.

Triaging as VNext accordingly.

Please view or discuss this issue at 
https://github.com/w3c/media-source/issues/165 using your GitHub 
account

Received on Thursday, 22 September 2016 20:56:14 UTC