Re: [MSE] buffering/splicing/overlap model, ad insertion and video editing goals from Aaron Colwell on 2013-02-25 (public-html-media@w3.org from February 2013)

From: Aaron Colwell <acolwell@google.com>
Date: Mon, 25 Feb 2013 08:16:46 -0800
To: Michael Thornburgh <mthornbu@adobe.com>
Cc: "public-html-media@w3.org" <public-html-media@w3.org>
Message-ID: <CAA0c1bAS5=0tK9kLK=2O-sgux4+_GoMay_1JXagMtjyTY2w9OA@mail.gmail.com>

Hi Michael,

I think I see what you are getting at here. I believe this functionality
would essentially sit between steps 7 & 8 of the coded frame processing algorithm
<https://dvcs.w3.org/hg/html-media/raw-file/default/media-source/media-source.html#sourcebuffer-coded-frame-processing>
and would act as a "coded frame filter" or "append window". How about this
for an initial proposal?

*Proposal:*
partial interface SourceBuffer {
  attribute double appendWindowStart;
  attribute unrestricted double appendWindowEnd;
};

- appendWindowStart is initially set to 0.
- appendWindowEnd is initially set to positive Infinity.
- Setting appendWindowStart throws an exception if one tries to set it to a
value >= appendWindowEnd.
- Setting appendWindowEnd throws an exception if one tries to set it to a
value <= appendWindowStart.
- The attributes can only be modified when updating == false, just like
timestampOffset.
- The coded frame processing algorithm drops coded frames w/
presentationTimestamp < appendWindowStart.
- The coded frame processing algorithm drops coded frames w/
presentationTimestamp >= appendWindowEnd.
- If a coded frame is dropped before appendWindowStart, then a "needs RAP"
flag is set so that the coded frame processing algorithm will continue to
drop coded frames until it receives a RAP with a presentation timestamp >=
appendWindowStart.
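
The rules above can be sketched in TypeScript roughly as follows. This is a
minimal model of the filter in isolation, not a real SourceBuffer. The
CodedFrame shape, the AppendWindowFilter class, and the accept() method are
hypothetical names made up for illustration, and the exception types are
assumptions, since the proposal only says "throws an exception". Only
appendWindowStart, appendWindowEnd, and the drop / "needs RAP" behavior come
from the proposal.

```typescript
// Hypothetical frame shape, for illustration only; not part of the proposal.
interface CodedFrame {
  presentationTimestamp: number; // seconds
  isRandomAccessPoint: boolean;
}

// Models only the proposed append window; it is not a real SourceBuffer.
class AppendWindowFilter {
  private start = 0;         // appendWindowStart is initially 0
  private end = Infinity;    // appendWindowEnd is initially positive Infinity
  private needsRap = false;  // the "needs RAP" flag from the proposal
  updating = false;          // stands in for SourceBuffer.updating

  get appendWindowStart(): number { return this.start; }
  set appendWindowStart(value: number) {
    this.checkNotUpdating();
    // WebIDL "double" (without "unrestricted") excludes NaN and +/-Infinity.
    if (!isFinite(value)) throw new TypeError("appendWindowStart must be finite");
    // Exception type is an assumption; the proposal does not name one.
    if (value >= this.end) throw new RangeError("appendWindowStart must be < appendWindowEnd");
    this.start = value;
  }

  get appendWindowEnd(): number { return this.end; }
  set appendWindowEnd(value: number) {
    this.checkNotUpdating();
    if (value <= this.start) throw new RangeError("appendWindowEnd must be > appendWindowStart");
    this.end = value;
  }

  // Returns true if the frame is kept, false if the filter drops it.
  accept(frame: CodedFrame): boolean {
    const pts = frame.presentationTimestamp;
    if (pts >= this.end) return false;   // at or past the end of the window
    if (pts < this.start) {              // before the window: drop, then wait for a RAP
      this.needsRap = true;
      return false;
    }
    if (this.needsRap) {
      if (!frame.isRandomAccessPoint) return false; // still waiting for a RAP
      this.needsRap = false;
    }
    return true;
  }

  private checkNotUpdating(): void {
    if (this.updating) throw new Error("attributes cannot be modified while updating");
  }
}
```

With a window of [14, 42), a frame at 13s is dropped and sets the flag, a
non-RAP frame at 14s is then also dropped, and frames are kept again starting
with the first RAP inside the window.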

*Questions:*
 - Should abort() reset appendWindowStart & appendWindowEnd to 0 & positive
Infinity respectively?
 - Any suggestions on better names for these attributes?


I believe this proposal would address most of your concerns. It will not
support seamlessly splicing at a non-RAP boundary, but I'd like to defer
that until v2 if possible since it would require having 2 decoder instances
and/or require faster than real-time decoding. I'd like to nail down the
simpler RAP-based splices and get interop before diving into the non-RAP
case.
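
As a rough sketch of how an application might drive the proposed attributes
for the ad insertion example in the quoted message (cue B at 14s, a 28s ad,
main program resuming at 42s), something like the following could work. The
WindowedBuffer and AppendStep shapes and the planSplice() and applyStep()
helpers are hypothetical names for illustration. The sketch assumes the
window is compared against timestamps after timestampOffset has been applied,
and that the cue point is greater than 0.

```typescript
// Hypothetical shape: the two proposed attributes plus the existing timestampOffset.
interface WindowedBuffer {
  timestampOffset: number;
  appendWindowStart: number;
  appendWindowEnd: number;
}

// Hypothetical plan entry: settings to apply before appending the listed segments.
interface AppendStep {
  segments: string[];
  timestampOffset: number;
  appendWindowStart: number;
  appendWindowEnd: number;
}

// Builds the three phases of one splice, in natural playback order.
// Assumes cueStart > 0 so that the first window is not empty.
function planSplice(
  before: string[], ad: string[], after: string[],
  cueStart: number, adDuration: number,
): AppendStep[] {
  const cueEnd = cueStart + adDuration;
  return [
    { segments: before, timestampOffset: 0, appendWindowStart: 0, appendWindowEnd: cueStart },
    { segments: ad, timestampOffset: cueStart, appendWindowStart: cueStart, appendWindowEnd: cueEnd },
    { segments: after, timestampOffset: 0, appendWindowStart: cueEnd, appendWindowEnd: Infinity },
  ];
}

// Applies one step. The assignment order matters because the proposal throws
// when appendWindowStart >= appendWindowEnd, or appendWindowEnd <= appendWindowStart.
function applyStep(sb: WindowedBuffer, step: AppendStep): void {
  sb.timestampOffset = step.timestampOffset;
  if (step.appendWindowStart < sb.appendWindowEnd) {
    // New start fits under the current end, so it can be set first.
    sb.appendWindowStart = step.appendWindowStart;
    sb.appendWindowEnd = step.appendWindowEnd;
  } else {
    // Window is moving forward past the current end: widen the end first.
    sb.appendWindowEnd = step.appendWindowEnd;
    sb.appendWindowStart = step.appendWindowStart;
  }
}
```

For example, planSplice(["A1", "A2"], ["B1", "B2", "B3"], ["A4", "A5", "A6"], 14, 28)
yields the windows [0, 14), [14, 42) and [42, Infinity). A3 never needs to be
appended. A4's frames before 42s are dropped, and frames keep being dropped
until the first RAP at or after 42s, which is the non-RAP limitation
mentioned above.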

If folks are ok with this, then I'd say file a bug and I'll start working
on adding this to the spec.

Aaron

On Thu, Feb 21, 2013 at 11:48 AM, Michael Thornburgh <mthornbu@adobe.com> wrote:

>
> the current buffering/splicing/overlap model for media segments implies
> that the intended granularity for the "ad insertion" and "video editing"
> goals (section 1.1) is "whole segments".  the overlap & splicing behavior
> seems to be designed primarily for the adaptive streaming case, not
> necessarily for ad insertion and definitely not for the general "video
> editing" case (of which ad insertion is a subset).
>
> consider programs A (the "main program") and B (the "ad"), with A being
> live.  the stream encoder/segmenter will typically be free-running, making
> random access points and segment boundaries in natural places independent
> of any external cue inputs.  an operator may at some point push the "ad
> goes here" button, which should only have to create a cue marker in the
> manifest file.  it may be impractical or infeasible to affect the operation
> of the encoder/segmenter to create a segment boundary at the ad-start or
> ad-end-and-main-program-resumes points.
>
>
> 0s               14s              31s        42s
>                  +-- cue B                   +-- cue A
> prog A           v                           v
> |-----------|----:vvvvvv|. . . . .|vvvvvvvvvv:---|-----------|-----------|
>  A1(1)       A2  :       A3(-)     A4(4)     :    A5(7)       A6(8)
>             (2)  :B1(3)     B2(5)     B3(6)  :
>                  |---------|---------|-------|
>                  prog B
>                  0s                       28s
>
> 1. append A1;
> 2. append A2;
> 3. append B1 at +14s in;
> 4. append A4;
> 5. append B2 at +14s in;
> 6. append B3 at +14s in;
> 7. append A5;
> 8. append A6...
>
>
> in this example, main program segment A4 is overlapped by ad segments B2
> and B3.  this can be accommodated with the current buffering/overlap model,
> but in a fairly unnatural way.  to achieve the desired rendering, the
> append order must be [A1, A2, B1, A4, B2, B3, A5, A6, ...] -- in other
> words, not in the natural playback order.  every application will need to
> implement a segment overlap scheduler to get this ordering right.  note
> also that there is a race with the playback position vs the appends, where
> if you're running close to the playback position, you might display a
> portion of the wrong program (for example, missing the beginning of an ad
> or temporarily switching back to the main program in the middle of the ad).
>
> this works for the ad insertion case because the advertiser will typically
> want their entire ad played from beginning to end. for the general "video
> editing" case, there's no way to come in to program B at
> not-a-segment-boundary from program A not-a-segment-boundary, using the
> current model.
>
> some months ago i did some experiments/proofs-of-concept with seamless ad
> insertion at non-segment/non-keyframe boundaries in Flash Player (built on
> top of the "appendBytes" APIs).  i had 4 simple primitives that gave
> general editing capabilities in the natural segment playback order, with no
> races (if data was late, playback would stall rather than playing the wrong
> thing):
>
>   1) append segment data;
>   2) discontinuity;
>   3) stop appending from segment at time Te (until discontinuity);
>   4) after discontinuity, start playback from new segment at time Tb (not
> necessarily at a keyframe, like a seek).
>
> for the ad insertion example above, this looks like:
>
>
> 0s               14s              0s         11s
>                  +-- cue B                   +-- cue A
> prog A           v                           v
> |-----------|----:XXXXXX|. . . . .|>>>>>>>>>>:---|-----------|-----------|
>  A1(1)       A2  :       A3(-)     A4(6)     :    A5(7)       A6(8)
>             (2)  :B1(3)     B2(4)     B3(5)  :
>                  |---------|---------|-------|
>                  prog B
>                  0s                       28s
>
> 1. append A1;
> 2.
>   2a. stop at 14s in (Te=14s);
>   2b. append A2;
> 3.
>   3a. discontinuity;
>   3b. start next segment 0s in (Tb=0s relative)
>   3c. append B1 at discontinuity;
> 4. append B2;
> 5. append B3;
> 6.
>   6a. discontinuity;
>   6b. start next segment 11s in (Tb=11s relative);
>   6c. append A4 (skipping ahead to 11s in) at discontinuity;
> 7. append A5;
> 8. append A6...
>
> note that this model could also support starting in on B at
> not-the-beginning and ending at not-the-end, if that was desired.
>
> if it's the intention that ad insertion (and editing in general) should
> always be at segment boundaries, then the complications i described above
> go away and you can just append in the natural playback order.  however, i
> believe real-world use scenarios (especially ad insertion into live
> streams) will require seamless splicing at not-segment-boundaries,
> requiring implementation of the complicated scheduling and non-natural
> append order described above, as well as exposure to possible races.  i
> believe it would be advantageous to support this use case in a more natural
> way.
>
> -michael thornburgh
>
>
>
>
Received on Monday, 25 February 2013 16:17:17 UTC