
Re: MSE: Ad-insertion and seeking

From: Stephan Hesse <stephan.hesse@soundcloud.com>
Date: Tue, 8 Sep 2015 18:04:52 +0200
Message-ID: <CAAh6mT3E89hO6yTnJTDfzH7__ea0uxLCKhq6U_mdVyqOJEGFTg@mail.gmail.com>
To: David LaPalomento <dlapalomento@brightcove.com>
Cc: Matt Wolenetz <wolenetz@google.com>, public-html-media@w3.org
So did that do the trick for you? I found myself having to write a whole
set of mechanics around the MediaSource API to be able to attach it
conveniently to our streaming client.

It's also because of the "updating" state and its events. Basically, as your
media segments are loaded asynchronously, your source buffer is not
necessarily ready to take them (and they might even come from a cache, so
with close to zero latency and maybe out of order). So one needs to queue
the data before inserting it into the buffer, and maybe even pre-order the
queue to make sure the data gets inserted in the right order, so you don't
hit "holes" while playing. That sounds familiar from other media frameworks;
it results in some kind of pipeline. The padding needed to start playing
with an offset makes it even more complex.
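A minimal sketch of such a queue, assuming only a SourceBuffer-like object
that exposes "updating", appendBuffer(), and fires 'updateend' (class and
method names here are mine, not from any particular player; real code would
also need to handle errors like QuotaExceededError):

```javascript
// Serializes appends against a SourceBuffer-like object: segments may
// arrive async and out of turn, but only one appendBuffer() is ever
// in flight, and queued segments go in strictly in push order.
class AppendQueue {
  constructor(sourceBuffer) {
    this.sourceBuffer = sourceBuffer;
    this.queue = [];
    // When the previous append settles, try to drain the next segment.
    sourceBuffer.addEventListener('updateend', () => this.flush());
  }

  // Enqueue a segment; it is appended as soon as the buffer is idle.
  push(segment) {
    this.queue.push(segment);
    this.flush();
  }

  flush() {
    if (!this.sourceBuffer.updating && this.queue.length > 0) {
      this.sourceBuffer.appendBuffer(this.queue.shift());
    }
  }
}
```

Pre-ordering for timeline order would then happen before push(), e.g. by
sorting arriving segments on their expected start time.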

Do you guys think it would make sense to write some higher-level
abstraction around the MediaSource API, or do you know of promising work
done here? I have seen there is a media-source contrib to video.js; isn't
it doing something similar? I would definitely be interested in sharing
efforts here, as there are a lot of common problems to be solved before
anybody can use this API robustly, I think.

The API exposes all the async-ness of what's going on underneath, but
eventually one just wants a pipeline architecture: push data in on one
side, maybe process it somehow, and play it out on the other.

On Tue, Sep 8, 2015 at 4:04 PM, David LaPalomento <
dlapalomento@brightcove.com> wrote:

> Hi Stephan,
> Thanks for the tip. I'm still hoping there's a cleaner solution but I
> appreciate the experience and workaround.
> On Mon, Sep 7, 2015 at 8:07 AM, Stephan Hesse <
> stephan.hesse@soundcloud.com> wrote:
>> On Fri, Aug 28, 2015 at 4:10 PM, David LaPalomento <
>> dlapalomento@brightcove.com> wrote:
>>> Maybe I'm misunderstanding the intended usage of the API. Is it possible
>>> to seek to a later point in the video where there is no buffered range end
>>> at the position we want to append the media?
>> From my experience that is possible with a trick: by appending padding
>> data to the buffer up to the point to which you eventually want to seek.
>> Not sure if that's solving your problem, but for us it was necessary to be
>> able to start playing from an arbitrary position of the track.
>>> That's my big assumption going into this and why I'm unable to provide
>>> "x" in your example.
>>> I thought putting some visuals together might clarify the problem:
>>> https://github.com/dmlap/seeking-mse-example/blob/master/seeking-across-discontinuities.md.
>>> While I was working on that, it occurred to me we could work around this by
>>> removing all buffered regions ahead of the current media position when the
>>> user seeks. We could end up doing some not-strictly-necessary re-buffering
>>> but we'd avoid the possibility of content overlap. Assuming the answer to
>>> my question in the first paragraph is "yes", does that solution sound like
>>> the right approach to you?
>>> On Wed, Aug 26, 2015 at 8:42 PM, Matt Wolenetz <wolenetz@google.com>
>>> wrote:
>>>> Hi David,
>>>> #1 isn't a necessary prerequisite for #2 in your example, if I
>>>> understand correctly. In fact, surprising behavior may occur if you
>>>> overlap-append an existing buffered range right at, or near, currentTime in
>>>> the timeline. Chrome, for example, attempts to play out the remainder of an
>>>> overlapped GOP until the next keyframe in the newly appended media, but
>>>> upon seeking back, may play more of the newer content in the
>>>> overlapped-append region. This is an artifact of varying decoder pipeline
>>>> depths.
>>>> I am also confused why imprecise duration is the issue when calculating
>>>> timestampOffset to close the gap. The app can reliably inspect
>>>> SourceBuffer.buffered, and needs a reliably precise expected start
>>>> timestamp="y" of the media it is about to append. Given those, it could set
>>>> SourceBuffer.timestampOffset = ("x" = the time from
>>>> SourceBuffer.buffered corresponding to the end time of the buffered range
>>>> just prior to the point in the timeline where you want the new media
>>>> actually to be appended) - ("y" = start timestamp of media that is
>>>> about to be appended, from the bytestream or some other reliable source),
>>>> and then append the media. This should cause all timestamps in the newly
>>>> appended media to be adjusted. Assuming the newly appended media is a
>>>> single continuously increasing-in-decode-timestamp sequence, this
>>>> sequence's timestamps will be adjusted to move it to begin right at the
>>>> desired time "x" in the timeline. If the sequence is discontinuous and the
>>>> app wishes to collapse all gaps, it would need to append the media segments
>>>> more granularly (each segment should be in DTS sequence and continuous),
>>>> and adapt timestampOffset between each. Essentially, this is doing an
>>>> approximated polyfill of "sequence" appendMode.
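That granular adapt-timestampOffset-between-appends approach can be
sketched as a pure helper (names are illustrative): given segments with
known start timestamps and durations, it yields the timestampOffset to set
before each append so the segments land back-to-back from a chosen point
on the timeline.

```javascript
// Sketch of the approximated "sequence" appendMode polyfill described
// above. Each segment is assumed to be internally continuous and in DTS
// order; startAt is where the first segment should begin on the timeline.
function sequenceOffsets(segments, startAt) {
  const offsets = [];
  let nextStart = startAt;
  for (const seg of segments) {
    // Shift this segment so it starts exactly where the previous one ended.
    offsets.push(nextStart - seg.start);
    nextStart += seg.duration;
  }
  return offsets;
}
```

Between appends one would set sourceBuffer.timestampOffset to the next
value in the list, wait for the pending update to finish, then call
appendBuffer() with the corresponding segment.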
>>>> On Wed, Aug 26, 2015 at 7:34 AM David LaPalomento <
>>>> dlapalomento@brightcove.com> wrote:
>>>>> Hi Matt,
>>>>> Thanks for the response! "sequence" mode does sound like it could make
>>>>> discontinuity handling less of a pain. Just to make sure I follow your
>>>>> suggestion, tell me if this sounds right to you:
>>>>> 1) video.currentTime is set to a value that causes playback to cross
>>>>> over a known timestamp discontinuity.
>>>>> 2) The application selects a value for sourceBuffer.timestampOffset to
>>>>> adjust the target media's timestamps to account for the discontinuity and
>>>>> allow playback to begin.
>>>>> This works today in my testing so far. My problem happens a little bit
>>>>> further on from this, though. Since I'm dealing with somewhat unreliable
>>>>> third-party content, I can't be confident of their duration without
>>>>> downloading and inspecting them. That leaves me with the tough choice of
>>>>> taking a guess on the appropriate timestampOffset in step 2) and risk
>>>>> overlapping content if the user seeks back and plays through the content
>>>>> again; or downloading all intervening segments before the seek can complete
>>>>> and forcing the user to sit through a painful amount of buffering.
>>>>> Basically, ad insertion requires dealing with third-party content and
>>>>> (in my experience, at least) you can't rely on those parties for accurate
>>>>> duration information to set timestampOffset. Does that make sense? Did I
>>>>> miss something from your advice?
>>>>> On Tue, Aug 25, 2015 at 5:43 PM, Matt Wolenetz <wolenetz@google.com>
>>>>> wrote:
>>>>>> Hi David,
>>>>>> I am one of the current co-editors of the MSE spec. Thanks for your
>>>>>> question.
>>>>>> This appears to be a problem that "sequence" appendMode may
>>>>>> alleviate: it collapses all discontinuities into one continuous buffered
>>>>>> region, so long as there are no other intervening operations like
>>>>>> explicitly changing timestampOffset or appendMode. This "sequence"
>>>>>> appendMode is currently supported experimentally in Chrome M46,
>>>>>> hidden behind a flag. The caveat for using sequence appendMode
>>>>>> (beyond, of course, having implementations in user agents) is that the
>>>>>> appends must be done in the order they are desired on the timeline, not in
>>>>>> a scattered fashion.
>>>>>> While implementations are pending "sequence" appendMode support,
>>>>>> explicitly updating timestampOffset to collapse potential
>>>>>> discontinuities would be feasible if you:
>>>>>> 1) know the current SourceBuffer.buffered range(s) end time(s):
>>>>>> this is available in MSE.
>>>>>> 2) know the start timestamp of media about to be appended (by
>>>>>> inspection offline, or even in a js parser)
>>>>>> Combined, these are like timestamp rewriting, except the rewriting is
>>>>>> done implicitly by timestampOffset, rather than updating the timecodes in
>>>>>> the appended byte stream.
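Those two inputs reduce to a single subtraction; a pair of pure helpers
(function names are mine, not from the spec) makes the arithmetic concrete:

```javascript
// "x" is the end time of the buffered range just prior to the point
// where the new media should land; scan SourceBuffer.buffered for it.
function bufferedEndBefore(buffered, targetTime) {
  let end = 0;
  for (let i = 0; i < buffered.length; i++) {
    if (buffered.end(i) <= targetTime) end = buffered.end(i);
  }
  return end;
}

// "y" is the start timestamp of the media about to be appended (from the
// byte stream or other reliable metadata). Setting timestampOffset to
// x - y shifts the appended timestamps so the media begins exactly at x.
function computeTimestampOffset(bufferedRangeEnd, segmentStartTimestamp) {
  return bufferedRangeEnd - segmentStartTimestamp;
}
```

Before appending, one would set sourceBuffer.timestampOffset =
computeTimestampOffset(x, y) and then call appendBuffer() with the segment.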
>>>>>> Is the latter method (explicitly updating timestampOffset using data
>>>>>> from the API and from the byte stream (or offline inspection/metadata/some
>>>>>> other assumption)) sufficient for your use case?
>>>>>> Matt
>>>>>> On Tue, Aug 25, 2015 at 7:53 AM David LaPalomento <
>>>>>> dlapalomento@brightcove.com> wrote:
>>>>>>> Hi all,
>>>>>>> I'm a contributor to video.js and am working to convert an existing
>>>>>>> Flash-based HLS playback plugin to using Media Source Extensions. We
>>>>>>> support a number of server-side ad insertion services, all of which seem to
>>>>>>> compose existing media files without doing timestamp rewriting and signal
>>>>>>> this to the player through metadata in the HLS manifest.
>>>>>>> One of the technical challenges we've faced in the existing
>>>>>>> implementation is handling seeking across multiple timestamp
>>>>>>> discontinuities before the entire video has been downloaded. HLS v3 rounds
>>>>>>> segment durations to the nearest whole number which can introduce a
>>>>>>> significant amount of timeline error in long-form content. Ignoring the
>>>>>>> shortcomings of HLS though, the duration values provided by ad-insertion
>>>>>>> services may lack precision and the wild-west of ad creatives doesn't help
>>>>>>> the situation.
>>>>>>> We handle this issue today by recalculating the media timeline
>>>>>>> whenever a new segment is downloaded and processed. Since the buffer always
>>>>>>> grows forward, media timeline adjustments occur ahead of the current
>>>>>>> playback position and the player's media timeline converges on reality as
>>>>>>> more content is buffered.
>>>>>>> Preamble out of the way, here's my question for this group: how
>>>>>>> would one seek across discontinuities without frame-accurate durations
>>>>>>> using Media Source Extensions? If we had perfectly accurate duration
>>>>>>> information, I believe we could use timestamp offsets on the source buffer
>>>>>>> to place the new content at the appropriate position. With inaccurate or
>>>>>>> low-precision duration information, it seems like we run the risk of
>>>>>>> mis-placing the media data and creating overlaps at discontinuities and
>>>>>>> misreporting the total content duration. Is there a solution in the spec
>>>>>>> I'm missing?


Stephan Hesse

Playback & Delivery Engineer

Blog/Website: http://www.dispar.at
Skype: stephan.hesse.1985

SoundCloud Ltd. | Rheinsberger Str. 76/77, 10115 Berlin, Germany | +49
(0)151 230 237 32

Managing Director: Alexander Ljung | Incorporated in England & Wales with
Company No. 6343600 | Local Branch Office | AG Charlottenburg | HRB 110657B

Capture and share your music & audio on SoundCloud
Received on Tuesday, 8 September 2015 16:05:27 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 15:49:05 UTC