Re: [MSE] buffering/splicing/overlap model, ad insertion and video editing goals

On 25/02/2013 17:16, Aaron Colwell wrote:
>
> Hi Michael,
>
> I think I see what you are getting at here. I believe this
> functionality would essentially sit between steps 7 & 8 of the coded
> frame processing algorithm
> <https://dvcs.w3.org/hg/html-media/raw-file/default/media-source/media-source.html#sourcebuffer-coded-frame-processing>
> and would act as a "coded frame filter" or "append window".  How about
> this for an initial proposal?
>
> *Proposal:*
> partial interface SourceBuffer {
>   attribute double appendWindowStart;
>   attribute unrestricted double appendWindowEnd;
> }
>
> - appendWindowStart is initially set to 0;
> - appendWindowEnd is initially set to positive Infinity.
> - Setting appendWindowStart throws an exception if one tries to set it 
> to a value >= appendWindowEnd.
> - Setting appendWindowEnd throws an exception if one tries to set it 
> to a value <= appendWindowStart.
> - The attributes can only be modified when updating == false, just 
> like timestampOffset.
> - The coded frame processing algorithm drops coded frames w/ 
> presentationTimestamp < appendWindowStart
> - The coded frame processing algorithm drops coded frames w/ 
> presentationTimestamp >= appendWindowEnd.
Would the timestamp comparison be applied before or after the 
timestampOffset shift?

Cyril
> - If a coded frame is dropped before appendWindowStart, then a "needs 
> RAP" flag is set so that the coded frame processing algorithm will 
> continue to drop coded frames until it receives a RAP with a 
> presentation timestamp >= appendWindowStart.
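To make the question above concrete, here is a minimal sketch of the proposed filter as I read it. It is a simplified model and not spec text: the CodedFrame and AppendState types, the filterFrames function and the compareAfterOffset switch are hypothetical names introduced only for illustration, and the sample frames are made up.

```typescript
// Hypothetical model of a coded frame; field names are illustrative only.
interface CodedFrame {
  presentationTimestamp: number; // seconds, as carried in the media segment
  isRandomAccessPoint: boolean;
}

// State mirroring the proposed attributes plus the "needs RAP" flag.
interface AppendState {
  timestampOffset: number;
  appendWindowStart: number;
  appendWindowEnd: number;
  needsRandomAccessPoint: boolean;
}

// Returns the frames that survive the window, with timestampOffset applied.
// compareAfterOffset selects which timestamp is compared against the window;
// the proposal does not say which reading is intended (assumption).
function filterFrames(
  frames: CodedFrame[],
  state: AppendState,
  compareAfterOffset: boolean,
): CodedFrame[] {
  const kept: CodedFrame[] = [];
  for (const frame of frames) {
    const shifted = frame.presentationTimestamp + state.timestampOffset;
    const compared = compareAfterOffset ? shifted : frame.presentationTimestamp;

    if (compared < state.appendWindowStart) {
      // Dropped before the window: keep dropping until the next RAP.
      state.needsRandomAccessPoint = true;
      continue;
    }
    if (compared >= state.appendWindowEnd) continue;
    if (state.needsRandomAccessPoint) {
      if (!frame.isRandomAccessPoint) continue;
      state.needsRandomAccessPoint = false;
    }
    kept.push({
      presentationTimestamp: shifted,
      isRandomAccessPoint: frame.isRandomAccessPoint,
    });
  }
  return kept;
}

// Made-up ad frames starting at 0s, to be placed at 14s on the timeline.
const adFrames: CodedFrame[] = [
  { presentationTimestamp: 0, isRandomAccessPoint: true },
  { presentationTimestamp: 1, isRandomAccessPoint: false },
  { presentationTimestamp: 2, isRandomAccessPoint: true },
  { presentationTimestamp: 3, isRandomAccessPoint: false },
];

function makeState(start: number, end: number, offset: number): AppendState {
  return {
    timestampOffset: offset,
    appendWindowStart: start,
    appendWindowEnd: end,
    needsRandomAccessPoint: false,
  };
}

// Window [14, 16) with timestampOffset = 14.
const afterShift = filterFrames(adFrames, makeState(14, 16, 14), true);
const beforeShift = filterFrames(adFrames, makeState(14, 16, 14), false);
console.log(afterShift.map((f) => f.presentationTimestamp)); // [ 14, 15 ]
console.log(beforeShift.length); // 0
```

If the comparison is done before the shift, the window has to be expressed in the timestamps of the appended segment; if it is done after, it is expressed on the presentation timeline. With these sample values the first reading drops every ad frame, which is why the ordering seems worth stating explicitly.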
>
> *Questions:*
>  - Should abort() reset appendWindowStart & appendWindowEnd to 0 & 
> positive Infinity respectively?
>  - Any suggestions on better names for these attributes?
>
>
> I believe this proposal would address most of your concerns. It will 
> not support seamlessly splicing at a non-RAP boundary, but I'd like to 
> defer that until v2 if possible since it would require having 2 
> decoder instances and/or require faster than real-time decoding. I'd 
> like to nail down the simpler RAP based splices and get interop before 
> diving into the non-RAP case.
>
> If folks are ok with this, then I'd say file a bug and I'll start 
> working on adding this to the spec.
>
> Aaron
>
> On Thu, Feb 21, 2013 at 11:48 AM, Michael Thornburgh
> <mthornbu@adobe.com> wrote:
>
>
>     the current buffering/splicing/overlap model for media segments
>     implies that the intended granularity for the "ad insertion" and
>     "video editing" goals (section 1.1) is "whole segments".  the
>     overlap & splicing behavior seems to be designed primarily for the
>     adaptive streaming case, not necessarily for ad insertion and
>     definitely not for the general "video editing" case (of which ad
>     insertion is a subset).
>
>     consider programs A (the "main program") and B (the "ad"), with A
>     being live.  the stream encoder/segmenter will typically be
>     free-running, making random access points and segment boundaries
>     in natural places independent of any external cue inputs.  an
>     operator may at some point push the "ad goes here" button, which
>     should only have to create a cue marker in the manifest file.  it
>     may be impractical or infeasible to affect the operation of the
>     encoder/segmenter to create a segment boundary at the ad-start or
>     ad-end-and-main-program-resumes points.
>
>
>     0s               14s              31s        42s
>                      +-- cue B                   +-- cue A
>     prog A           v                           v
>     |-----------|----:vvvvvv|. . . . .|vvvvvvvvvv:---|-----------|-----------|
>      A1(1)       A2  :       A3(-)     A4(4)     :    A5(7)     A6(8)
>                 (2)  :B1(3)     B2(5)     B3(6)  :
>                      |---------|---------|-------|
>                      prog B
>                      0s                       28s
>     1. append A1;
>     2. append A2;
>     3. append B1 at +14s in;
>     4. append A4;
>     5. append B2 at +14s in;
>     6. append B3 at +14s in;
>     7. append A5;
>     8. append A6...
>
>
>     in this example, main program segment A4 is overlapped by ad
>     segments B2 and B3.  this can be accommodated with the current
>     buffering/overlap model, but in a fairly unnatural way.  to
>     achieve the desired rendering, the append order must be [A1, A2,
>     B1, A4, B2, B3, A5, A6, ...] -- in other words, not in the natural
>     playback order.  every application will need to implement a
>     segment overlap scheduler to get this ordering right.  note also
>     that there is a race with the playback position vs the appends,
>     where if you're running close to the playback position, you might
>     display a portion of the wrong program (for example, missing the
>     beginning of an ad or temporarily switching back to the main
>     program in the middle of the ad).
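The "segment overlap scheduler" described above could look roughly like the sketch below. This is only one possible approach, under the assumption that both segment lists are sorted and that the ad segments are already expressed on the main program timeline. The Segment type and scheduleAppends are hypothetical names, and the segment durations (11s for each A segment, 10s/10s/8s for B) are made-up values chosen so that the overlaps resemble the diagram; they are not taken from the original message.

```typescript
// Hypothetical segment description; times are on the presentation timeline.
interface Segment {
  name: string;
  start: number; // seconds
  end: number; // seconds
}

// Computes an append order in which every main-program segment that is
// partly overlapped by the ad is appended before the ad segments that must
// overwrite it. Main segments lying entirely inside the ad are skipped.
// Assumes both lists are sorted by start time.
function scheduleAppends(main: Segment[], ad: Segment[]): string[] {
  if (ad.length === 0) return main.map((s) => s.name);
  const adStart = ad[0].start;
  const adEnd = ad[ad.length - 1].end;
  const order: string[] = [];
  let next = 0; // index of the next main segment not yet considered

  for (const adSegment of ad) {
    // Any main segment starting before this ad segment ends has to be in
    // the buffer first, so that the ad data replaces it and not vice versa.
    while (next < main.length && main[next].start < adSegment.end) {
      const m = main[next++];
      const fullyCovered = m.start >= adStart && m.end <= adEnd;
      if (!fullyCovered) order.push(m.name);
    }
    order.push(adSegment.name);
  }
  while (next < main.length) order.push(main[next++].name);
  return order;
}

// Made-up durations approximating the diagram (assumption).
const mainSegments: Segment[] = [
  { name: "A1", start: 0, end: 11 },
  { name: "A2", start: 11, end: 22 },
  { name: "A3", start: 22, end: 33 },
  { name: "A4", start: 33, end: 44 },
  { name: "A5", start: 44, end: 55 },
  { name: "A6", start: 55, end: 66 },
];
const adSegments: Segment[] = [
  { name: "B1", start: 14, end: 24 },
  { name: "B2", start: 24, end: 34 },
  { name: "B3", start: 34, end: 42 },
];

const appendOrder = scheduleAppends(mainSegments, adSegments);
console.log(appendOrder.join(", "));
// A1, A2, B1, A4, B2, B3, A5, A6
```

The sketch only produces the ordering; it does nothing about the race with the playback position mentioned above, which remains the application's problem under the current model.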
>
>     this works for the ad insertion case because the advertiser will
>     typically want their entire ad played from beginning to end. for
>     the general "video editing" case, the current model offers no way
>     to leave program A at a point that is not a segment boundary and
>     enter program B at a point that is not a segment boundary either.
>
>     some months ago i did some experiments/proofs-of-concept with
>     seamless ad insertion at non-segment/non-keyframe boundaries in
>     Flash Player (built on top of the "appendBytes" APIs).  i had 4
>     simple primitives that gave general editing capabilities in the
>     natural segment playback order, with no races (if data was late,
>     playback would stall rather than playing the wrong thing):
>
>       1) append segment data;
>       2) discontinuity;
>       3) stop appending from segment at time Te (until discontinuity);
>       4) after discontinuity, start playback from new segment at time
>     Tb (not necessarily at a keyframe, like a seek).
>
>     for the ad insertion example above, this looks like:
>
>
>     0s               14s              0s         11s
>                      +-- cue B                   +-- cue A
>     prog A           v                           v
>     |-----------|----:XXXXXX|. . . . .|>>>>>>>>>>:---|-----------|-----------|
>      A1(1)       A2  :       A3(-)     A4(6)     :    A5(7)     A6(8)
>                 (2)  :B1(3)     B2(4)     B3(5)  :
>                      |---------|---------|-------|
>                      prog B
>                      0s                       28s
>
>     1. append A1;
>     2.
>       2a. stop at 14s in (Te=14s);
>       2b. append A2;
>     3.
>       3a. discontinuity;
>       3b. start next segment 0s in (Tb=0s relative)
>       3c. append B1 at discontinuity;
>     4. append B2;
>     5. append B3;
>     6.
>       6a. discontinuity;
>       6b. start next segment 11s in (Tb=11s relative);
>       6c. append A4 (skipping ahead to 11s in) at discontinuity;
>     7. append A5;
>     8. append A6...
>
>     note that this model could also support starting in on B at
>     not-the-beginning and ending at not-the-end, if that was desired.
>
>     if it's the intention that ad insertion (and editing in general)
>     should always be at segment boundaries, then the complications i
>     described above go away and you can just append in the natural
>     playback order.  however, i believe real-world use scenarios
>     (especially ad insertion into live streams) will require seamless
>     splicing at not-segment-boundaries, requiring implementation of
>     the complicated scheduling and non-natural append order described
>     above, as well as exposure to possible races.  i believe it would
>     be advantageous to support this use case in a more natural way.
>
>     -michael thornburgh
>
>
>
>


-- 
Cyril Concolato
Maître de Conférences/Associate Professor
Groupe Multimedia/Multimedia Group
Telecom ParisTech
46 rue Barrault
75 013 Paris, France
http://concolato.wp.mines-telecom.fr/

Received on Tuesday, 26 February 2013 08:54:19 UTC