Re: MSE: Ad-insertion and seeking from David LaPalomento on 2015-09-08 (public-html-media@w3.org from September 2015)

From: David LaPalomento <dlapalomento@brightcove.com>
Date: Tue, 8 Sep 2015 10:04:47 -0400
To: Stephan Hesse <stephan.hesse@soundcloud.com>
Cc: Matt Wolenetz <wolenetz@google.com>, public-html-media@w3.org
Message-ID: <CACh87oeT4Lur=7-UAV6a4Ejaxg-apUWOCbkHvgViNwqhryn9Ew@mail.gmail.com>
Hi Stephan,
Thanks for the tip. I'm still hoping there's a cleaner solution but I
appreciate the experience and workaround.

On Mon, Sep 7, 2015 at 8:07 AM, Stephan Hesse <stephan.hesse@soundcloud.com>
wrote:

> On Fri, Aug 28, 2015 at 4:10 PM, David LaPalomento <
> dlapalomento@brightcove.com> wrote:
>
>> Maybe I'm misunderstanding the intended usage of the API. Is it possible
>> to seek to a later point in the video when no buffered range ends at the
>> position where we want to append the media?
>>
>
> From my experience that is possible with a trick: by appending padding
> data to the buffer up to the point to which you eventually want to seek.
>
> Not sure if that solves your problem, but for us it was necessary in order
> to be able to start playing from an arbitrary position of the track.
>
>
>> That's my big assumption going into this and why I'm unable to provide
>> "x" in your example.
>>
>> I thought putting some visuals together might clarify the problem:
>> https://github.com/dmlap/seeking-mse-example/blob/master/seeking-across-discontinuities.md.
>> While I was working on that, it occurred to me we could work around this by
>> removing all buffered regions ahead of the current media position when the
>> user seeks. We could end up doing some not-strictly-necessary re-buffering
>> but we'd avoid the possibility of content overlap. Assuming the answer to
>> my question in the first paragraph is "yes", does that solution sound like
>> the right approach to you?
>>
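The workaround described above (dropping buffered data ahead of the playhead when the user seeks) might be sketched as follows. `BufferedRange` and `rangesAheadOf` are hypothetical names, and the sketch assumes that only ranges starting after the current position are dropped, leaving the range that contains the playhead untouched.

```typescript
// One buffered range in seconds, mirroring an entry of SourceBuffer.buffered.
interface BufferedRange {
  start: number;
  end: number;
}

// Return the buffered ranges that begin strictly after `currentTime`.
// Assumption: the range containing the playhead is kept as-is.
function rangesAheadOf(ranges: BufferedRange[], currentTime: number): BufferedRange[] {
  return ranges.filter((r) => r.start > currentTime);
}
```

In a player, each returned range would be passed to `sourceBuffer.remove(start, end)`, waiting for `updateend` between calls because removal is asynchronous.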
>> On Wed, Aug 26, 2015 at 8:42 PM, Matt Wolenetz <wolenetz@google.com>
>> wrote:
>>
>>> Hi David,
>>>
>>> #1 isn't a necessary prerequisite for #2 in your example, if I
>>> understand correctly. In fact, surprising behavior may occur if you
>>> overlap-append an existing buffered range right at, or near, currentTime in
>>> the timeline. Chrome, for example, attempts to play out the remainder of an
>>> overlapped GOP until the next keyframe in the newly appended media, but
>>> upon seeking back, may play more of the newer content in the
>>> overlapped-append region. This is an artifact of varying decoder pipeline
>>> depths.
>>>
>>> I am also confused why imprecise duration is the issue when calculating
>>> timestampOffset to close the gap. The app can reliably inspect
>>> SourceBuffer.buffered, and needs a reliably precise expected start
>>> timestamp="y" of the media it is about to append. Given those, it could set
>>> SourceBuffer.timestampOffset = ("x" = the time from
>>> SourceBuffer.buffered corresponding to the end time of the buffered range
>>> just prior to the point in the timeline where you want the new media
>>> actually to be appended) - ("y" = start timestamp of media that is
>>> about to be appended, from the bytestream or some other reliable source),
>>> and then append the media. This should cause all timestamps in the newly
>>> appended media to be adjusted. Assuming the newly appended media is a
>>> single continuously increasing-in-decode-timestamp sequence, this
>>> sequence's timestamps will be adjusted to move it to begin right at the
>>> desired time "x" in the timeline. If the sequence is discontinuous and the
>>> app wishes to collapse all gaps, it would need to append the media segments
>>> more granularly (each segment should be in DTS sequence and continuous),
>>> and adapt timestampOffset between each. Essentially, this is doing an
>>> approximated polyfill of "sequence" appendMode.
>>>
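The calculation described above, timestampOffset = x - y, could look roughly like this. The helper names (`BufferedRange`, `toRanges`, `precedingBufferedEnd`, `offsetToCloseGap`) are hypothetical, and the sketch assumes "x" is the latest buffered range end at or before the intended append position.

```typescript
// One buffered range in seconds, mirroring an entry of SourceBuffer.buffered.
interface BufferedRange {
  start: number;
  end: number;
}

// Copy a TimeRanges-like object (length, start(i), end(i)) into plain objects.
function toRanges(buffered: {
  length: number;
  start(i: number): number;
  end(i: number): number;
}): BufferedRange[] {
  const out: BufferedRange[] = [];
  for (let i = 0; i < buffered.length; i++) {
    out.push({ start: buffered.start(i), end: buffered.end(i) });
  }
  return out;
}

// "x": the latest range end that is at or before `target`,
// or null when nothing is buffered before that point.
function precedingBufferedEnd(ranges: BufferedRange[], target: number): number | null {
  let x: number | null = null;
  for (const r of ranges) {
    if (r.end <= target && (x === null || r.end > x)) {
      x = r.end;
    }
  }
  return x;
}

// timestampOffset = x - y, where "y" (`mediaStart`) is the first timestamp
// of the media about to be appended, taken from the byte stream or metadata.
function offsetToCloseGap(
  ranges: BufferedRange[],
  target: number,
  mediaStart: number
): number | null {
  const x = precedingBufferedEnd(ranges, target);
  return x === null ? null : x - mediaStart;
}
```

With a real SourceBuffer, the result would be assigned to `sourceBuffer.timestampOffset` while `sourceBuffer.updating` is false, immediately before the `appendBuffer()` call for that media.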
>>>
>>> On Wed, Aug 26, 2015 at 7:34 AM David LaPalomento <
>>> dlapalomento@brightcove.com> wrote:
>>>
>>>> Hi Matt,
>>>> Thanks for the response! "sequence" mode does sound like it could make
>>>> discontinuity handling less of a pain. Just to make sure I follow your
>>>> suggestion, tell me if this sounds right to you:
>>>>
>>>> 1) video.currentTime is set to a value that causes playback to cross
>>>> over a known timestamp discontinuity.
>>>> 2) The application selects a value for sourceBuffer.timestampOffset to
>>>> adjust the target media's timestamps to account for the discontinuity and
>>>> allow playback to begin.
>>>>
>>>> This works today in my testing so far. My problem happens a little bit
>>>> further on from this, though. Since I'm dealing with somewhat unreliable
>>>> third-party content, I can't be confident of their duration without
>>>> downloading and inspecting them. That leaves me with the tough choice of
>>>> taking a guess on the appropriate timestampOffset in step 2) and risk
>>>> overlapping content if the user seeks back and plays through the content
>>>> again; or downloading all intervening segments before the seek can complete
>>>> and forcing the user to sit through a painful amount of buffering.
>>>>
>>>> Basically, ad insertion requires dealing with third-party content and
>>>> (in my experience, at least) you can't rely on those parties for accurate
>>>> duration information to set timestampOffset. Does that make sense? Did I
>>>> miss something from your advice?
>>>>
>>>> On Tue, Aug 25, 2015 at 5:43 PM, Matt Wolenetz <wolenetz@google.com>
>>>> wrote:
>>>>
>>>>> Hi David,
>>>>>
>>>>> I am one of the current co-editors of the MSE spec. Thanks for your
>>>>> question.
>>>>>
>>>>> This appears to be a problem that "sequence" appendMode may alleviate:
>>>>> it collapses all discontinuities into one continuous buffered region, so
>>>>> long as there are no other intervening operations like explicitly changing
>>>>> timestampOffset or appendMode. This "sequence" appendMode is in
>>>>> experimental support currently in Chrome M46 (it's hidden behind an
>>>>> experimental flag). The caveat for using sequence appendMode (beyond, of
>>>>> course, having implementations in user agents) is that the appends must be
>>>>> done in the order they are desired on the timeline, not in a scattered
>>>>> fashion.
>>>>>
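A minimal sketch of the "sequence" approach, assuming a user agent that supports it. In the MSE spec the attribute is `SourceBuffer.mode`. `SourceBufferLike`, `appendOne`, and `appendInTimelineOrder` are hypothetical names, and the interface lists only the members the sketch touches. Error handling is left out.

```typescript
// The subset of SourceBuffer used here; a real SourceBuffer has these members.
interface SourceBufferLike {
  mode: string;
  updating: boolean;
  appendBuffer(data: ArrayBuffer): void;
  addEventListener(type: string, listener: () => void, options?: { once?: boolean }): void;
}

// Append one segment and resolve when the buffer reports "updateend".
function appendOne(sb: SourceBufferLike, data: ArrayBuffer): Promise<void> {
  return new Promise<void>((resolve) => {
    sb.addEventListener("updateend", () => resolve(), { once: true });
    sb.appendBuffer(data);
  });
}

// Switch to "sequence" mode, then append strictly in timeline order,
// one segment at a time, as the caveat above requires.
async function appendInTimelineOrder(
  sb: SourceBufferLike,
  segments: ArrayBuffer[]
): Promise<void> {
  sb.mode = "sequence";
  for (const segment of segments) {
    await appendOne(sb, segment);
  }
}
```

Waiting for `updateend` before each further append matters because `appendBuffer()` throws while a previous append is still in progress.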
>>>>> While implementations are pending "sequence" appendMode support,
>>>>> explicitly updating timestampOffset to collapse potential
>>>>> discontinuities would be feasible if you:
>>>>> 1) know the current SourceBuffer.buffered range(s) end time(s): this
>>>>> is available in MSE.
>>>>> 2) know the start timestamp of media about to be appended (by
>>>>> inspection offline, or even in a js parser)
>>>>>
>>>>> Combined, these are like timestamp rewriting, except the rewriting is
>>>>> done implicitly by timestampOffset, rather than updating the timecodes in
>>>>> the appended byte stream.
>>>>>
>>>>> Is the latter method (explicitly updating timestampOffset using data
>>>>> from the API and from the byte stream (or offline inspection/metadata/some
>>>>> other assumption)) sufficient for your use case?
>>>>>
>>>>> Matt
>>>>>
>>>>> On Tue, Aug 25, 2015 at 7:53 AM David LaPalomento <
>>>>> dlapalomento@brightcove.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>> I'm a contributor to video.js and am working to convert an existing
>>>>>> Flash-based HLS playback plugin to using Media Source Extensions. We
>>>>>> support a number of server-side ad insertion services, all of which seem to
>>>>>> compose existing media files without doing timestamp rewriting and signal
>>>>>> this to the player through metadata in the HLS manifest.
>>>>>>
>>>>>> One of the technical challenges we've faced in the existing
>>>>>> implementation is handling seeking across multiple timestamp
>>>>>> discontinuities before the entire video has been downloaded. HLS v3 rounds
>>>>>> segment durations to the nearest whole number which can introduce a
>>>>>> significant amount of timeline error in long-form content. Ignoring the
>>>>>> shortcomings of HLS though, the duration values provided by ad-insertion
>>>>>> services may lack precision and the wild-west of ad creatives doesn't help
>>>>>> the situation.
>>>>>>
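To illustrate how whole-second rounding accumulates, here is a small sketch. `roundingDrift` is a hypothetical helper, and the segment durations in the usage note are made-up values, not measurements.

```typescript
// Difference between the timeline implied by rounded manifest durations
// and the timeline implied by the actual segment durations, in seconds.
function roundingDrift(actualDurations: number[]): number {
  let advertised = 0;
  let actual = 0;
  for (const d of actualDurations) {
    advertised += Math.round(d); // what a whole-second manifest would report
    actual += d;                 // what the media really contains
  }
  return advertised - actual;
}
```

For example, four segments of 9.75 seconds each are advertised as 10 seconds apiece, so a seek target computed from the manifest is off by one second after only four segments. Each segment can contribute up to half a second of error.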
>>>>>> We handle this issue today by recalculating the media timeline
>>>>>> whenever a new segment is downloaded and processed. Since the buffer always
>>>>>> grows forward, media timeline adjustments occur ahead of the current
>>>>>> playback position and the player's media timeline converges on reality as
>>>>>> more content is buffered.
>>>>>>
>>>>>> Preamble out of the way, here's my question for this group: how would
>>>>>> one seek across discontinuities without frame-accurate durations using
>>>>>> Media Source Extensions? If we had perfectly accurate duration information,
>>>>>> I believe we could use timestamp offsets on the source buffer to place the
>>>>>> new content at the appropriate position. With inaccurate or low-precision
>>>>>> duration information, it seems like we run the risk of mis-placing the
>>>>>> media data and creating overlaps at discontinuities and misreporting the
>>>>>> total content duration. Is there a solution in the spec I'm missing?
>>>>>>
>>>>>
>>>>
>>
>
>
> --
>
> Stephan Hesse
>
> Playback & Delivery Engineer
>
>
> http://soundcloud.com/tchakabam
> http://twitter.com/tchakabam
> Blog/Website: http://www.dispar.at
> Skype: stephan.hesse.1985
Received on Tuesday, 8 September 2015 14:05:18 UTC