Re: [MSE] buffering/splicing/overlap model, ad insertion and video editing goals from Cyril Concolato on 2013-02-26 (public-html-media@w3.org from February 2013)

From: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
Date: Tue, 26 Feb 2013 09:57:45 +0100
To: public-html-media@w3.org
Message-ID: <512C7909.9050908@telecom-paristech.fr>
On 25/02/2013 21:01, Michael Thornburgh wrote:
>
> hi Aaron.
>
> i think this could work, along with bug 20901 to handle the "API-level 
> discontinuity indicator".  in the case of such a discontinuity, you'd 
> want the timeline splice and renormalization to happen at the 
> appendWindowStart.
>
Similarly, you shouldn't have to know the duration of your ad (to avoid 
precision problems); the switch back to the live stream should be 
handled by the media engine.

Cyril
>
> regarding the RAP behavior: the idea would be that a mechanism such as 
> this could let you lay out data in the source buffer just like you 
> could with doing overlaps, only appending in the natural playback 
> order instead of having to do weird out-of-order layering.  as such, 
> the non-RAP behavior should be the same as currently defined with the 
> overlap model, with a similar note about "significantly increasing 
> implementation complexity and delays at the splice point" as is in 
> 2.1.3 End Overlap.
>
> i think abort() should also reset the appendWindow properties.  the 
> names seem fine to me.
>
> i will open a bug.
>
> -mike
>
> *From:* Aaron Colwell [mailto:acolwell@google.com]
> *Sent:* Monday, February 25, 2013 8:17 AM
> *To:* Michael Thornburgh
> *Cc:* public-html-media@w3.org
> *Subject:* Re: [MSE] buffering/splicing/overlap model, ad insertion 
> and video editing goals
>
> Hi Michael,
>
> I think I see what you are getting at here. I believe this 
> functionality would essentially sit between steps 7 & 8 of the coded 
> frame algorithm 
> <https://dvcs.w3.org/hg/html-media/raw-file/default/media-source/media-source.html#sourcebuffer-coded-frame-processing> and 
> would act as a "coded frame filter" or "append window".  How about 
> this for an initial proposal?
>
> *Proposal:*
>
> partial interface SourceBuffer {
>
>   attribute double appendWindowStart;
>
>   attribute unrestricted double appendWindowEnd;
>
> }
>
> - appendWindowStart is initially set to 0;
>
> - appendWindowEnd is initially set to positive Infinity.
>
> - Setting appendWindowStart throws an exception if one tries to set it 
> to a value >= appendWindowEnd.
>
> - Setting appendWindowEnd throws an exception if one tries to set it 
> to a value <= appendWindowStart.
>
> - The attributes can only be modified when updating == false, just 
> like timestampOffset.
>
> - The coded frame processing algorithm drops coded frames w/ 
> presentationTimestamp < appendWindowStart
>
> - The coded frame processing algorithm drops coded frames w/ 
> presentationTimestamp >= appendWindowEnd.
>
> - If a coded frame is dropped before appendWindowStart, then a "needs 
> RAP" flag is set so that the coded frame processing algorithm will 
> continue to drop coded frames until it receives a RAP with a 
> presentation timestamp >= appendWindowStart.
>
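A minimal sketch of the filter proposed above may make the "needs RAP" 
behaviour concrete. The CodedFrame shape, the AppendWindowFilter class 
name and the use of RangeError are assumptions for illustration only: 
the proposal does not say which exception is thrown, and the 
"updating == false" check is left out.

```typescript
// Hypothetical frame shape; a real coded frame carries much more state.
interface CodedFrame {
  presentationTimestamp: number; // seconds
  isRandomAccessPoint: boolean;
}

class AppendWindowFilter {
  private start = 0;                        // initial appendWindowStart
  private end = Number.POSITIVE_INFINITY;   // initial appendWindowEnd
  private needsRap = false;                 // the "needs RAP" flag

  get appendWindowStart(): number {
    return this.start;
  }
  set appendWindowStart(value: number) {
    // Assumption: RangeError stands in for the unspecified exception.
    if (value >= this.end) throw new RangeError("start must be < end");
    this.start = value;
  }

  get appendWindowEnd(): number {
    return this.end;
  }
  set appendWindowEnd(value: number) {
    if (value <= this.start) throw new RangeError("end must be > start");
    this.end = value;
  }

  // Returns true if the frame should be kept, false if it is dropped.
  accept(frame: CodedFrame): boolean {
    if (frame.presentationTimestamp < this.start) {
      this.needsRap = true; // keep dropping until the next in-window RAP
      return false;
    }
    if (frame.presentationTimestamp >= this.end) {
      return false;
    }
    if (this.needsRap) {
      if (!frame.isRandomAccessPoint) return false;
      this.needsRap = false;
    }
    return true;
  }
}
```

With appendWindowStart set to 14, a frame at 13s is dropped and sets the 
flag, so a non-RAP frame at 14s is also dropped; appending resumes with 
the first RAP at or after 14s.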
> *Questions:*
>
>  - Should abort() reset appendWindowStart & appendWindowEnd to 0 & 
> positive Infinity respectively?
>
>  - Any suggestions on better names for these attributes?
>
> I believe this proposal would address most of your concerns. It will 
> not support seamlessly splicing at a non-RAP boundary, but I'd like to 
> defer that until v2 if possible since it would require having 2 
> decoder instances and/or require faster than real-time decoding. I'd 
> like to nail down the simpler RAP based splices and get interop before 
> diving into the non-RAP case.
>
> If folks are ok with this, then I'd say file a bug and I'll start 
> working on adding this to the spec.
>
> Aaron
>
> On Thu, Feb 21, 2013 at 11:48 AM, Michael Thornburgh 
> <mthornbu@adobe.com> wrote:
>
>
> the current buffering/splicing/overlap model for media segments 
> implies that the intended granularity for the "ad insertion" and 
> "video editing" goals (section 1.1) is "whole segments".  the overlap 
> & splicing behavior seems to be designed primarily for the adaptive 
> streaming case, not necessarily for ad insertion and definitely not 
> for the general "video editing" case (of which ad insertion is a subset).
>
> consider programs A (the "main program") and B (the "ad"), with A 
> being live.  the stream encoder/segmenter will typically be 
> free-running, making random access points and segment boundaries in 
> natural places independent of any external cue inputs.  an operator 
> may at some point push the "ad goes here" button, which should only 
> have to create a cue marker in the manifest file.  it may be 
> impractical or infeasible to affect the operation of the 
> encoder/segmenter to create a segment boundary at the ad-start or 
> ad-end-and-main-program-resumes points.
>
>
> 0s               14s              31s        42s
>                  +-- cue B                   +-- cue A
> prog A           v                           v
> |-----------|----:vvvvvv|. . . . .|vvvvvvvvvv:---|-----------|-----------|
>  A1(1)       A2  :       A3(-)     A4(4)     :  A5(7)       A6(8)
>             (2)  :B1(3)     B2(5)     B3(6)  :
>                  |---------|---------|-------|
>                  prog B
>                  0s                       28s
> 1. append A1;
> 2. append A2;
> 3. append B1 at +14s in;
> 4. append A4;
> 5. append B2 at +14s in;
> 6. append B3 at +14s in;
> 7. append A5;
> 8. append A6...
>
>
> in this example, main program segment A4 is overlapped by ad segments 
> B2 and B3.  this can be accommodated with the current 
> buffering/overlap model, but in a fairly unnatural way.  to achieve 
> the desired rendering, the append order must be [A1, A2, B1, A4, B2, 
> B3, A5, A6, ...] -- in other words, not in the natural playback order. 
>  every application will need to implement a segment overlap scheduler 
> to get this ordering right.  note also that there is a race with the 
> playback position vs the appends, where if you're running close to the 
> playback position, you might display a portion of the wrong program 
> (for example, missing the beginning of an ad or temporarily switching 
> back to the main program in the middle of the ad).
>
> this works for the ad insertion case because the advertiser will 
> typically want their entire ad played from beginning to end. for the 
> general "video editing" case, there's no way to come in to program B 
> at not-a-segment-boundary from program A at not-a-segment-boundary, 
> using the current model.
>
> some months ago i did some experiments/proofs-of-concept with seamless 
> ad insertion at non-segment/non-keyframe boundaries in Flash Player 
> (built on top of the "appendBytes" APIs).  i had 4 simple primitives 
> that gave general editing capabilities in the natural segment playback 
> order, with no races (if data was late, playback would stall rather 
> than playing the wrong thing):
>
>   1) append segment data;
>   2) discontinuity;
>   3) stop appending from segment at time Te (until discontinuity);
>   4) after discontinuity, start playback from new segment at time Tb 
> (not necessarily at a keyframe, like a seek).
>
> for the ad insertion example above, this looks like:
>
>
> 0s               14s              0s         11s
>                  +-- cue B                   +-- cue A
> prog A           v                           v
> |-----------|----:XXXXXX|. . . . .|>>>>>>>>>>:---|-----------|-----------|
>  A1(1)       A2  :       A3(-)     A4(6)     :  A5(7)       A6(8)
>             (2)  :B1(3)     B2(4)     B3(5)  :
>                  |---------|---------|-------|
>                  prog B
>                  0s                       28s
>
> 1. append A1;
> 2.
>   2a. stop at 14s in (Te=14s);
>   2b. append A2;
> 3.
>   3a. discontinuity;
>   3b. start next segment 0s in (Tb=0s relative)
>   3c. append B1 at discontinuity;
> 4. append B2;
> 5. append B3;
> 6.
>   6a. discontinuity;
>   6b. start next segment 11s in (Tb=11s relative);
>   6c. append A4 (skipping ahead to 11s in) at discontinuity;
> 7. append A5;
> 8. append A6...
>
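As a rough illustration, the same natural-order appends can be written 
down as a plan using the append window attributes proposed earlier in 
this thread. This is only a sketch: planAdInsertion, AppendStep and the 
segment labels are hypothetical names, and it assumes both cue times 
(14s and 42s on the main timeline) are known from the manifest, so the 
ad duration itself is never needed.

```typescript
// One append, with the SourceBuffer state assumed to be set just before it.
interface AppendStep {
  segment: string;
  timestampOffset: number;   // seconds added to the segment's timestamps
  appendWindowStart: number; // frames before this time are dropped
  appendWindowEnd: number;   // frames at or after this time are dropped
}

// Build the append plan in natural playback order.
// cueOut: main-timeline time where the ad starts (cue B).
// cueIn:  main-timeline time where the main program resumes (cue A).
function planAdInsertion(
  mainBeforeAd: string[],
  adSegments: string[],
  mainAfterAd: string[],
  cueOut: number,
  cueIn: number,
): AppendStep[] {
  if (!(cueOut < cueIn)) throw new RangeError("cueOut must precede cueIn");
  const steps: AppendStep[] = [];
  for (const segment of mainBeforeAd) {
    // Main program, trimmed at the ad start.
    steps.push({ segment, timestampOffset: 0, appendWindowStart: 0, appendWindowEnd: cueOut });
  }
  for (const segment of adSegments) {
    // Ad timestamps start at 0, so shift them to the cue point.
    steps.push({ segment, timestampOffset: cueOut, appendWindowStart: cueOut, appendWindowEnd: cueIn });
  }
  for (const segment of mainAfterAd) {
    // Main program again, skipping everything before the resume point.
    steps.push({
      segment,
      timestampOffset: 0,
      appendWindowStart: cueIn,
      appendWindowEnd: Number.POSITIVE_INFINITY,
    });
  }
  return steps;
}

// The example above: A3 is fully covered by the ad and is never appended.
const plan = planAdInsertion(["A1", "A2"], ["B1", "B2", "B3"], ["A4", "A5", "A6"], 14, 42);
```

The resulting order is A1, A2, B1, B2, B3, A4, A5, A6, matching steps 1 
to 8. Note that 42s falls in the middle of A4, so under the "needs RAP" 
rule of the proposal the frames of A4 would be dropped until the next 
RAP at or after 42s; that is the non-RAP splice limitation deferred to 
v2.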
> note that this model could also support starting in on B at 
> not-the-beginning and ending at not-the-end, if that was desired.
>
> if it's the intention that ad insertion (and editing in general) 
> should always be at segment boundaries, then the complications i 
> described above go away and you can just append in the natural 
> playback order.  however, i believe real-world use scenarios 
> (especially ad insertion into live streams) will require seamless 
> splicing at not-segment-boundaries, requiring implementation of the 
> complicated scheduling and non-natural append order described above, 
> as well as exposure to possible races.  i believe it would be 
> advantageous to support this use case in a more natural way.
>
> -michael thornburgh
>
>


-- 
Cyril Concolato
Maître de Conférences/Associate Professor
Groupe Multimedia/Multimedia Group
Telecom ParisTech
46 rue Barrault
75 013 Paris, France
http://concolato.wp.mines-telecom.fr/
Received on Tuesday, 26 February 2013 08:58:11 UTC