[MSE] buffering/splicing/overlap model, ad insertion and video editing goals

the current buffering/splicing/overlap model for media segments implies that the intended granularity for the "ad insertion" and "video editing" goals (section 1.1) is "whole segments".  the overlap & splicing behavior seems to be designed primarily for the adaptive streaming case, not necessarily for ad insertion and definitely not for the general "video editing" case (of which ad insertion is a subset).

consider programs A (the "main program") and B (the "ad"), with A being live.  the stream encoder/segmenter will typically be free-running, placing random access points and segment boundaries wherever they naturally fall, independent of any external cue inputs.  an operator may at some point push the "ad goes here" button, which should only have to create a cue marker in the manifest file.  it may be impractical or infeasible to make the encoder/segmenter create a segment boundary at the point where the ad starts, or at the point where the ad ends and the main program resumes.


0s               14s              31s        42s
                 +-- cue B                   +-- cue A
prog A           v                           v
|-----------|----:vvvvvv|. . . . .|vvvvvvvvvv:---|-----------|-----------|
 A1(1)       A2  :       A3(-)     A4(4)     :    A5(7)       A6(8)
            (2)  :B1(3)     B2(5)     B3(6)  :
                 |---------|---------|-------|
                 prog B
                 0s                       28s
1. append A1;
2. append A2;
3. append B1 with timestamp offset +14s;
4. append A4;
5. append B2 with timestamp offset +14s;
6. append B3 with timestamp offset +14s;
7. append A5;
8. append A6...
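to make the ordering constraint concrete, here is a minimal python sketch of the overlap model, simplified to "data appended later replaces whatever buffered data it overlaps" at the level of time ranges (real coded frame removal also works at frame and random-access-point granularity, which this ignores).  the 14s, 31s and 42s points and the 28s ad length come from the diagram; every other segment boundary is a hypothetical value chosen for illustration, and the helper names are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: float  # seconds on program A's timeline
    end: float
    source: str   # segment name, e.g. "A2" or "B1"

def append(buffered, seg):
    """simplified overlap rule: seg replaces any buffered data in [seg.start, seg.end)."""
    out = []
    for s in buffered:
        if s.end <= seg.start or s.start >= seg.end:
            out.append(s)  # no overlap: keep unchanged
            continue
        if s.start < seg.start:
            out.append(Span(s.start, seg.start, s.source))  # keep the head before seg
        if s.end > seg.end:
            out.append(Span(seg.end, s.end, s.source))  # keep the tail after seg
    out.append(seg)
    return sorted(out, key=lambda s: s.start)

def render(order, segments):
    """append segments in the given order and return the resulting buffered timeline."""
    buffered = []
    for name in order:
        buffered = append(buffered, segments[name])
    return [(s.start, s.end, s.source) for s in buffered]

AD_OFFSET = 14  # cue B: the ad's 0s lands at 14s of program A
# hypothetical boundaries, except A4 starting at 31s and the ad being 28s long
SEGMENTS = {
    "A1": Span(0, 10, "A1"), "A2": Span(10, 20, "A2"), "A3": Span(20, 31, "A3"),
    "A4": Span(31, 45, "A4"), "A5": Span(45, 56, "A5"), "A6": Span(56, 67, "A6"),
    "B1": Span(AD_OFFSET + 0, AD_OFFSET + 10, "B1"),
    "B2": Span(AD_OFFSET + 10, AD_OFFSET + 20, "B2"),
    "B3": Span(AD_OFFSET + 20, AD_OFFSET + 28, "B3"),
}

overlap_order = ["A1", "A2", "B1", "A4", "B2", "B3", "A5", "A6"]
natural_order = ["A1", "A2", "B1", "B2", "B3", "A4", "A5", "A6"]
for row in render(overlap_order, SEGMENTS):
    print(row)
```

with the order from the steps above, A2 is cut at 14s, B1 through B3 fill 14s to 42s, and only A4's tail (42s to 45s) survives.  appending in natural playback order instead lets A4 overwrite the end of the ad: B2 is cut off at 31s and B3 disappears entirely.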


in this example, main program segment A4 is overlapped by ad segments B2 and B3.  this can be accommodated with the current buffering/overlap model, but only in a fairly unnatural way.  because data appended later replaces any buffered data it overlaps, A4 must be appended before B2 and B3, so that the ad overwrites A4's first 11s and only its tail (from 42s on) remains.  to achieve the desired rendering, the append order must therefore be [A1, A2, B1, A4, B2, B3, A5, A6, ...]; in other words, not the natural playback order.  every application will need to implement a segment overlap scheduler to get this ordering right.  note also that there is a race between the playback position and the appends: if appending runs close to the playback position, you might display a portion of the wrong program (for example, missing the beginning of an ad, or temporarily switching back to the main program in the middle of the ad).
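a minimal python sketch of such a scheduler, assuming the same simplified "later append wins" overlap model and hypothetical segment times (the function and its inputs are illustrative, not part of any API).  it treats every overlap between a main-program segment and an ad segment as a constraint that the main segment be appended first, drops main segments the ad hides completely, and otherwise appends in order of start time; that amounts to a topological sort driven by a min-heap.

```python
import heapq

def schedule(main, ad, ad_start, ad_end):
    """return an append order for the simplified "later append wins" model.

    main, ad: dicts of segment name -> (start, end) on program A's timeline,
    with the ad segments already shifted to their splice position.
    """
    # skip main-program segments the ad completely hides (A3 in the example)
    main = {n: (s, e) for n, (s, e) in main.items() if not (ad_start <= s and e <= ad_end)}
    segs = {**main, **ad}
    # a main segment that overlaps an ad segment must be appended before it
    pending = {n: 0 for n in segs}  # number of unmet "append after" constraints
    followers = {n: [] for n in segs}
    for m, (m_start, m_end) in main.items():
        for a, (a_start, a_end) in ad.items():
            if m_start < a_end and a_start < m_end:
                followers[m].append(a)
                pending[a] += 1
    # among segments whose constraints are met, append the earliest-starting first
    ready = [(segs[n][0], n) for n in segs if pending[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for a in followers[name]:
            pending[a] -= 1
            if pending[a] == 0:
                heapq.heappush(ready, (segs[a][0], a))
    return order

# hypothetical segment times (only 14s, 31s, 42s and the 28s ad length are from the diagram)
MAIN = {"A1": (0, 10), "A2": (10, 20), "A3": (20, 31),
        "A4": (31, 45), "A5": (45, 56), "A6": (56, 67)}
AD = {"B1": (14, 24), "B2": (24, 34), "B3": (34, 42)}
print(schedule(MAIN, AD, ad_start=14, ad_end=42))
```

for these inputs it reproduces the [A1, A2, B1, A4, B2, B3, A5, A6] order.  it only solves the ordering problem; the race against the playback position is still left to the caller.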

this works for the ad insertion case because the advertiser will typically want their entire ad played from beginning to end.  for the general "video editing" case, however, there's no way, using the current model, to leave program A at a point that is not a segment boundary and come in to program B at a point that is not a segment boundary.

some months ago i did some experiments/proofs-of-concept with seamless ad insertion at non-segment/non-keyframe boundaries in Flash Player (built on top of the "appendBytes" APIs).  i had 4 simple primitives that gave general editing capabilities in the natural segment playback order, with no races (if data was late, playback would stall rather than playing the wrong thing):

  1) append segment data;
  2) discontinuity;
  3) stop appending data from the current segment at time Te (until the next discontinuity);
  4) after a discontinuity, start playback of the new segment at time Tb (not necessarily at a keyframe; like a seek).
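a toy python model of how the four primitives compose.  segments are reduced to time ranges in their own program's timestamps, and the class and method names are hypothetical (this is not the Flash Player appendBytes API).  following the steps in the example, Te is taken in the current program's timestamps and Tb relative to the start of the first segment after the discontinuity; output is laid down strictly in append order, so nothing can be shown before it has been appended.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Splicer:
    """toy model of the four primitives; segments are just [start, end) time ranges."""
    out: list = field(default_factory=list)  # (out_start, out_end, segment, src_start, src_end)
    clock: float = 0.0                       # end of the output timeline so far
    stop_time: Optional[float] = None        # Te: drop data at/after this until a discontinuity
    start_offset: Optional[float] = None     # Tb: skip this far into the next segment

    def append(self, name, start, end):     # primitive 1
        if self.start_offset is not None:
            start += self.start_offset       # enter mid-segment, like a seek
            self.start_offset = None
        if self.stop_time is not None:
            end = min(end, self.stop_time)   # discard data after Te
        if end > start:
            duration = end - start
            self.out.append((self.clock, self.clock + duration, name, start, end))
            self.clock += duration           # output stays contiguous across splices

    def discontinuity(self):                 # primitive 2
        self.stop_time = None                # Te only applies until the discontinuity

    def stop_at(self, te):                   # primitive 3
        self.stop_time = te

    def start_at(self, tb):                  # primitive 4 (issued after a discontinuity)
        self.start_offset = tb

# the ad insertion example with hypothetical segment times; B keeps its own 0s-28s timestamps
s = Splicer()
s.append("A1", 0, 10)
s.stop_at(14)
s.append("A2", 10, 20)
s.discontinuity()
s.start_at(0)
s.append("B1", 0, 10)
s.append("B2", 10, 20)
s.append("B3", 20, 28)
s.discontinuity()
s.start_at(11)
s.append("A4", 31, 45)
s.append("A5", 45, 56)
for piece in s.out:
    print(piece)
```

the output runs A1, A2 up to 14s, all of B, then A4 from 42s onward, with every append in natural playback order and no timestamp offset applied to B.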

for the ad insertion example above, this looks like:


0s               14s              0s         11s
                 +-- cue B                   +-- cue A
prog A           v                           v
|-----------|----:XXXXXX|. . . . .|>>>>>>>>>>:---|-----------|-----------|
 A1(1)       A2  :       A3(-)     A4(6)     :    A5(7)       A6(8)
            (2)  :B1(3)     B2(4)     B3(5)  :
                 |---------|---------|-------|
                 prog B
                 0s                       28s

1. append A1;
2.
  2a. stop at 14s in (Te=14s);
  2b. append A2;
3.
  3a. discontinuity;
  3b. start next segment 0s in (Tb=0s relative)
  3c. append B1 at discontinuity;
4. append B2;
5. append B3;
6.
  6a. discontinuity;
  6b. start next segment 11s in (Tb=11s relative);
  6c. append A4 (skipping ahead to 11s in) at discontinuity;
7. append A5;
8. append A6...
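these steps follow mechanically from the cue times.  a minimal python sketch of a planner that derives them, assuming the main program's segments are known as (name, start, end) in program A time and the cue points as cue_in and cue_out (cue_out = cue_in + ad duration, 14s + 28s = 42s here); the function name, command tuples and segment times are hypothetical.

```python
def plan_ad_splice(main, ad, cue_in, cue_out):
    """derive splice commands, in natural playback order, for one ad break.

    main: list of (name, start, end) for program A, in playback order.
    ad: list of ad segment names, in playback order (played from the beginning).
    """
    cmds = []
    # program A up to the cue_in point
    for name, start, end in main:
        if end <= cue_in:
            cmds.append(("append", name))
        elif start < cue_in:
            # cue_in falls inside this segment: play it only up to Te = cue_in
            cmds.append(("stop_at", cue_in))
            cmds.append(("append", name))
    # splice into the ad at its beginning (Tb = 0)
    cmds.append(("discontinuity",))
    cmds.append(("start_at", 0))
    cmds.extend(("append", name) for name in ad)
    # splice back into program A at cue_out
    cmds.append(("discontinuity",))
    resumed = False
    for name, start, end in main:
        if end <= cue_out:
            continue  # already played, or hidden by the ad
        if not resumed:
            # Tb is relative to the first segment after the discontinuity
            cmds.append(("start_at", max(cue_out - start, 0)))
            resumed = True
        cmds.append(("append", name))
    return cmds

# hypothetical segment times; the cues are 14s and 14s + 28s = 42s as in the diagram
MAIN = [("A1", 0, 10), ("A2", 10, 20), ("A3", 20, 31),
        ("A4", 31, 45), ("A5", 45, 56), ("A6", 56, 67)]
for cmd in plan_ad_splice(MAIN, ["B1", "B2", "B3"], cue_in=14, cue_out=42):
    print(cmd)
```

with the example's cues this yields steps 1 through 8 in order, with Te=14s and Tb=11s; A3 is never appended because the ad hides it entirely.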

note that this model could also support coming in to B somewhere other than its beginning and leaving it before its end, if that were desired.

if the intention is that ad insertion (and editing in general) should always happen at segment boundaries, then the complications i described above go away and you can just append in the natural playback order.  however, i believe real-world use scenarios (especially ad insertion into live streams) will require seamless splicing at non-segment boundaries.  with the current model, that means implementing the complicated scheduling and non-natural append order described above, and being exposed to the possible races.  i believe it would be advantageous to support this use case in a more natural way.

-michael thornburgh

Received on Thursday, 21 February 2013 19:48:41 UTC