RE: [MSE] buffering/splicing/overlap model, ad insertion and video editing goals from Michael Thornburgh on 2013-02-25 (public-html-media@w3.org from February 2013)

From: Michael Thornburgh <mthornbu@adobe.com>
Date: Mon, 25 Feb 2013 12:01:02 -0800
To: "public-html-media@w3.org" <public-html-media@w3.org>
Message-ID: <02485FF93524F8408ECA9608E47D9F20098C4FE2FC@nambx05.corp.adobe.com>
hi Aaron.

i think this could work, along with bug 20901 to handle the "API-level discontinuity indicator".  in the case of such a discontinuity, you'd want the timeline splice and renormalization to happen at the appendWindowStart.

regarding the RAP behavior: the idea would be that a mechanism such as this could let you lay out data in the source buffer just as you could by doing overlaps, but appending in the natural playback order instead of having to do weird out-of-order layering.  as such, the non-RAP behavior should be the same as currently defined for the overlap model, with a note similar to the one in 2.1.3 End Overlap about "significantly increasing implementation complexity and delays at the splice point".

i think abort() should also reset the appendWindow properties.  the names seem fine to me.

i will open a bug.

-mike

From: Aaron Colwell [mailto:acolwell@google.com]
Sent: Monday, February 25, 2013 8:17 AM
To: Michael Thornburgh
Cc: public-html-media@w3.org
Subject: Re: [MSE] buffering/splicing/overlap model, ad insertion and video editing goals


Hi Michael,

I think I see what you are getting at here. I believe this functionality would essentially sit between steps 7 & 8 of the coded frame processing algorithm<https://dvcs.w3.org/hg/html-media/raw-file/default/media-source/media-source.html#sourcebuffer-coded-frame-processing> and would act as a "coded frame filter" or "append window".  How about this for an initial proposal?

Proposal:
partial interface SourceBuffer {
  attribute double appendWindowStart;
  attribute unrestricted double appendWindowEnd;
};

- appendWindowStart is initially set to 0;
- appendWindowEnd is initially set to positive Infinity.
- Setting appendWindowStart throws an exception if one tries to set it to a value >= appendWindowEnd.
- Setting appendWindowEnd throws an exception if one tries to set it to a value <= appendWindowStart.
- The attributes can only be modified when updating == false, just like timestampOffset.
- The coded frame processing algorithm drops coded frames w/ presentationTimestamp < appendWindowStart
- The coded frame processing algorithm drops coded frames w/ presentationTimestamp >= appendWindowEnd.
- If a coded frame is dropped before appendWindowStart, then a "needs RAP" flag is set so that the coded frame processing algorithm will continue to drop coded frames until it receives a RAP with a presentation timestamp >= appendWindowStart.
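The rules above can be modelled roughly as follows. This is only a sketch of the proposal as worded in this message, not spec text or any browser's implementation. `CodedFrame`, `AppendWindowModel`, and `accept` are hypothetical names, the exception types are assumptions (the proposal only says "throws an exception"), and the `abort()` reset reflects the suggestion made in the reply rather than anything settled.

```typescript
// Hypothetical stand-in for a coded frame; not an MSE type.
interface CodedFrame {
  presentationTimestamp: number; // seconds, with timestampOffset already applied
  isRandomAccessPoint: boolean;
}

// Hypothetical model of the proposed attributes and the filter step.
class AppendWindowModel {
  updating = false;
  private windowStart = 0;
  private windowEnd = Infinity;
  private needsRandomAccessPoint = false;

  get appendWindowStart(): number { return this.windowStart; }
  set appendWindowStart(value: number) {
    this.requireNotUpdating();
    // "double" (as opposed to "unrestricted double") cannot hold NaN or Infinity.
    if (!isFinite(value)) throw new TypeError("appendWindowStart must be finite");
    if (value >= this.windowEnd) throw new RangeError("appendWindowStart must be < appendWindowEnd");
    this.windowStart = value;
  }

  get appendWindowEnd(): number { return this.windowEnd; }
  set appendWindowEnd(value: number) {
    this.requireNotUpdating();
    // Written as !(a > b) so NaN is rejected too; the proposal does not mention NaN.
    if (!(value > this.windowStart)) throw new RangeError("appendWindowEnd must be > appendWindowStart");
    this.windowEnd = value;
  }

  // Assumption taken from the reply: abort() resets both attributes.
  // Clearing the "needs RAP" flag here is a further assumption.
  abort(): void {
    this.windowStart = 0;
    this.windowEnd = Infinity;
    this.needsRandomAccessPoint = false;
  }

  // Returns true if the frame would be kept, false if it would be dropped.
  accept(frame: CodedFrame): boolean {
    if (frame.presentationTimestamp >= this.windowEnd) return false;
    if (frame.presentationTimestamp < this.windowStart) {
      this.needsRandomAccessPoint = true; // keep dropping until the next in-window RAP
      return false;
    }
    if (this.needsRandomAccessPoint) {
      if (!frame.isRandomAccessPoint) return false;
      this.needsRandomAccessPoint = false;
    }
    return true;
  }

  private requireNotUpdating(): void {
    if (this.updating) throw new Error("attribute cannot be modified while updating");
  }
}
```

With a window of [14, 42), a frame at 12s is dropped and sets the flag, so an in-window non-RAP frame at 14s is dropped as well, and output resumes only at the first RAP at or after 14s.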

Questions:
 - Should abort() reset appendWindowStart & appendWindowEnd to 0 & positive Infinity respectively?
 - Any suggestions on better names for these attributes?


I believe this proposal would address most of your concerns. It will not support seamlessly splicing at a non-RAP boundary, but I'd like to defer that until v2 if possible since it would require having 2 decoder instances and/or require faster than real-time decoding. I'd like to nail down the simpler RAP based splices and get interop before diving into the non-RAP case.

If folks are ok with this, then I'd say file a bug and I'll start working on adding this to the spec.

Aaron
On Thu, Feb 21, 2013 at 11:48 AM, Michael Thornburgh <mthornbu@adobe.com> wrote:

the current buffering/splicing/overlap model for media segments implies that the intended granularity for the "ad insertion" and "video editing" goals (section 1.1) is "whole segments".  the overlap & splicing behavior seems to be designed primarily for the adaptive streaming case, not necessarily for ad insertion and definitely not for the general "video editing" case (of which ad insertion is a subset).

consider programs A (the "main program") and B (the "ad"), with A being live.  the stream encoder/segmenter will typically be free-running, making random access points and segment boundaries in natural places independent of any external cue inputs.  an operator may at some point push the "ad goes here" button, which should only have to create a cue marker in the manifest file.  it may be impractical or infeasible to affect the operation of the encoder/segmenter to create a segment boundary at the ad-start or ad-end-and-main-program-resumes points.


0s               14s              31s        42s
                 +-- cue B                   +-- cue A
prog A           v                           v
|-----------|----:vvvvvv|. . . . .|vvvvvvvvvv:---|-----------|-----------|
 A1(1)       A2  :       A3(-)     A4(4)     :    A5(7)       A6(8)
            (2)  :B1(3)     B2(5)     B3(6)  :
                 |---------|---------|-------|
                 prog B
                 0s                       28s
1. append A1;
2. append A2;
3. append B1 at +14s in;
4. append A4;
5. append B2 at +14s in;
6. append B3 at +14s in;
7. append A5;
8. append A6...
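
The ordering in this list can be derived mechanically. The following is a minimal sketch of the kind of "segment overlap scheduler" an application would need under the overlap model, assuming that a later append replaces whatever it overlaps. `Segment` and `overlapAppendOrder` are hypothetical names. Only the 14s, 31s, 42s, and 28s marks come from the diagram; every other segment boundary below is made up for illustration.

```typescript
// Hypothetical segment description: times in seconds on the segment's own program timeline.
interface Segment {
  label: string;
  start: number;
  end: number;
}

// Computes an append order for inserting `ad` into `main` at `adStart`.
// Both lists are assumed to be sorted by start time, and `ad` is assumed non-empty.
function overlapAppendOrder(main: Segment[], ad: Segment[], adStart: number): string[] {
  const adEnd = adStart + Math.max(...ad.map((s) => s.end));
  // Main segments lying entirely inside the ad interval are never rendered, so skip them.
  let pending = main.filter((s) => !(s.start >= adStart && s.end <= adEnd));
  const order: string[] = [];
  for (const adSegment of ad) {
    const shiftedEnd = adSegment.end + adStart;
    // Any main segment that this ad segment could overlap must already be in
    // the buffer, so that the ad append lands on top of it.
    for (const s of pending.filter((p) => p.start < shiftedEnd)) order.push(s.label);
    pending = pending.filter((p) => p.start >= shiftedEnd);
    order.push(adSegment.label);
  }
  for (const s of pending) order.push(s.label);
  return order;
}

// Illustrative boundaries (assumed, except A4 starting at 31s and the 28s ad length).
const mainProgram: Segment[] = [
  { label: "A1", start: 0, end: 10 },
  { label: "A2", start: 10, end: 20 },
  { label: "A3", start: 20, end: 31 },
  { label: "A4", start: 31, end: 45 },
  { label: "A5", start: 45, end: 55 },
  { label: "A6", start: 55, end: 65 },
];
const adProgram: Segment[] = [
  { label: "B1", start: 0, end: 10 },
  { label: "B2", start: 10, end: 20 },
  { label: "B3", start: 20, end: 28 },
];
const overlapOrder = overlapAppendOrder(mainProgram, adProgram, 14);
console.log(overlapOrder.join(", "));
```

The sketch only computes an order. It does nothing about the race between appends and the playback position.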


in this example, main program segment A4 is overlapped by ad segments B2 and B3.  this can be accommodated with the current buffering/overlap model, but in a fairly unnatural way.  to achieve the desired rendering, the append order must be [A1, A2, B1, A4, B2, B3, A5, A6, ...] -- in other words, not in the natural playback order.  every application will need to implement a segment overlap scheduler to get this ordering right.  note also that there is a race with the playback position vs the appends, where if you're running close to the playback position, you might display a portion of the wrong program (for example, missing the beginning of an ad or temporarily switching back to the main program in the middle of the ad).

this works for the ad insertion case because the advertiser will typically want their entire ad played from beginning to end. for the general "video editing" case, there's no way, using the current model, to leave program A at not-a-segment-boundary and come in to program B at not-a-segment-boundary.

some months ago i did some experiments/proofs-of-concept with seamless ad insertion at non-segment/non-keyframe boundaries in Flash Player (built on top of the "appendBytes" APIs).  i had 4 simple primitives that gave general editing capabilities in the natural segment playback order, with no races (if data was late, playback would stall rather than playing the wrong thing):

  1) append segment data;
  2) discontinuity;
  3) stop appending from segment at time Te (until discontinuity);
  4) after discontinuity, start playback from new segment at time Tb (not necessarily at a keyframe, like a seek).

for the ad insertion example above, this looks like:


0s               14s              0s         11s
                 +-- cue B                   +-- cue A
prog A           v                           v
|-----------|----:XXXXXX|. . . . .|>>>>>>>>>>:---|-----------|-----------|
 A1(1)       A2  :       A3(-)     A4(6)     :    A5(7)       A6(8)
            (2)  :B1(3)     B2(4)     B3(5)  :
                 |---------|---------|-------|
                 prog B
                 0s                       28s

1. append A1;
2.
  2a. stop at 14s in (Te=14s);
  2b. append A2;
3.
  3a. discontinuity;
  3b. start next segment 0s in (Tb=0s relative);
  3c. append B1 at discontinuity;
4. append B2;
5. append B3;
6.
  6a. discontinuity;
  6b. start next segment 11s in (Tb=11s relative);
  6c. append A4 (skipping ahead to 11s in) at discontinuity;
7. append A5;
8. append A6...
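
For comparison, here is a rough sketch of how the same natural-order sequence might be written in terms of the appendWindowStart / appendWindowEnd attributes proposed in the reply quoted above, together with timestampOffset. It only builds a plan as data and does not call any browser API. `Clip`, `AppendStep`, and `naturalOrderPlan` are hypothetical names, and the mapping from the four primitives onto these attributes is an assumption, not something either message spells out.

```typescript
// Hypothetical description of one run of segments sharing the same settings.
interface Clip {
  segments: string[];
  timestampOffset: number;   // seconds added to the clip's own timestamps
  appendWindowStart: number; // presentation time at which the clip becomes visible
  appendWindowEnd: number;   // presentation time at which the clip stops
}

// One append, together with the attribute values in effect when it is made.
interface AppendStep {
  segment: string;
  timestampOffset: number;
  appendWindowStart: number;
  appendWindowEnd: number;
}

// Flattens clips into appends in natural playback order.
function naturalOrderPlan(clips: Clip[]): AppendStep[] {
  const steps: AppendStep[] = [];
  for (const clip of clips) {
    for (const segment of clip.segments) {
      steps.push({
        segment,
        timestampOffset: clip.timestampOffset,
        appendWindowStart: clip.appendWindowStart,
        appendWindowEnd: clip.appendWindowEnd,
      });
    }
  }
  return steps;
}

// The ad insertion example: cue B at 14s, 28s ad, cue A at 42s.
const exampleClips: Clip[] = [
  // Main program up to cue B: frames of A2 at or after 14s are dropped.
  { segments: ["A1", "A2"], timestampOffset: 0, appendWindowStart: 0, appendWindowEnd: 14 },
  // The ad, shifted so that its 0s lands at 14s.
  { segments: ["B1", "B2", "B3"], timestampOffset: 14, appendWindowStart: 14, appendWindowEnd: 42 },
  // Main program resumes at cue A: frames of A4 before 42s are dropped.
  { segments: ["A4", "A5", "A6"], timestampOffset: 0, appendWindowStart: 42, appendWindowEnd: Infinity },
];
const examplePlan = naturalOrderPlan(exampleClips);
console.log(examplePlan.map((s) => s.segment).join(", "));
```

Under the proposal's "needs RAP" rule, the resume in A4 would begin at the first random access point at or after 42s, so this sketch is seamless only when a RAP happens to land on the cue.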

note that this model could also support starting in on B at not-the-beginning and ending at not-the-end, if that was desired.

if it's the intention that ad insertion (and editing in general) should always be at segment boundaries, then the complications i described above go away and you can just append in the natural playback order.  however, i believe real-world use scenarios (especially ad insertion into live streams) will require seamless splicing at not-segment-boundaries, requiring implementation of the complicated scheduling and non-natural append order described above, as well as exposure to possible races.  i believe it would be advantageous to support this use case in a more natural way.

-michael thornburgh
Received on Monday, 25 February 2013 20:01:33 UTC