- From: Nigel Megitt <nigel.megitt@bbc.co.uk>
- Date: Wed, 21 Oct 2015 14:30:41 +0000
- To: David Singer <singer@apple.com>
- CC: Philip Jägenstedt <philipj@opera.com>, Cyril Concolato <cyril.concolato@telecom-paristech.fr>, "public-texttracks@w3.org" <public-texttracks@w3.org>
On 21/10/2015 14:54, "singer@apple.com on behalf of David Singer"
<singer@apple.com> wrote:

>
>> On Oct 21, 2015, at 15:37 , Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:
>>
>> On 21/10/2015 14:00, "singer@apple.com on behalf of David Singer"
>> <singer@apple.com> wrote:
>>
>>>
>>> No, it’s not hypothetical. DASH/MP4/VTT relies on this, and it was (and
>>> is) seen as a core advantage of VTT over TTML.
>>
>> How curious. Live streaming with DASH/MP4/TTML works splendidly - there
>> were lots of implementations on show at IBC in September, of both coders
>> and presentation systems, based on EBU-TT-D, which is the profile of TTML
>> that is specified for HbbTV 2.0 and the DVB DASH profile. The dash.js
>> player is one. Samsung had a prototype television that was decoding and
>> presenting this format too - I'm pretty sure that there are others in the
>> works. The BBC has prototyped an implementation built on gstreamer that
>> works well also.
>>
>> What advantage was identified with VTT in this scenario?
>
>Flexible granularity is one. Live streaming of TTML means short TTML
>documents each of which describe a time interval. This means your segment
>size is basically that, or a multiple of it, and that is also then your
>minimum latency.

I don't think that segment size is equivalent to latency. If you mean that
segment duration sets a minimum latency, there are other things to
consider.

Typically in the DASH case there's an encoding and packaging layer that
accumulates the content to be streamed, segments it, encodes those
segments, packages them and then sends them to a distribution network. In
every case I've seen, the video encoding latency is much greater than the
subtitle/caption encoding latency, and the choice of segment duration is
based on what works well for video, without having any impact on the
latency for audio or subtitles/captions.

For example, if the encoding pipeline introduces a 16-26s delay to encode
10s long segments (i.e.
the delay is 26s for the earliest frame in the segment and 16s for the
latest frame), then you need your live subtitle encoding pipeline to be
able to accumulate and encode subtitle/caption segments in less time than
that. Typically in the UK, live broadcast subtitles are 6-10s later than
the broadcast video. So even if you were hypothetically (and illegally!)
doing an off-air receive and encode in this example, you'd still have to
insert a delay in the live subtitles to stop them appearing too early,
assuming you choose a 10s segment size for subtitles too - in practice you
could choose an even longer segment size if you wanted.

I've just looked at our live BBC News HD service subtitles, which are
updated on every word, and the typical time between subtitle updates is
around 0.2 seconds. This is so much less than the sort of segment duration
I'd expect that there's no significant interaction between the two.
Actually there's no interaction at all: the worst case is that a subtitle
which briefly appears at the end of one segment and the beginning of the
next is duplicated in each segment. The visible appearance and latency are
not impacted.

I've used made-up but vaguely realistic numbers here, but in every case I
know of it takes longer to encode video than subtitles/captions, so this
scales down to lower latencies too.

The other side of this is: what happens if you don't need to worry about
video encoding? I have a prototype live TTML streaming system, which I can
show anyone who is interested at TPAC, that transfers TTML documents using
WebSocket. The lesson from that work is that, as long as you have control
over the network paths so that TCP doesn't trade delay for reliability,
this works faster than you can think, without the document format imposing
any latency. The latency is all caused by the network, and the data rates
aren't so high that it really matters these days.
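As an aside, the "worst case" boundary behaviour above can be sketched in a few lines. This is an illustrative sketch of mine, not code from any of the implementations mentioned: word-timed cues are packaged into fixed-duration segments, and a cue that straddles a segment boundary is simply written into both segments, clipped to each segment's window. The function name, cue timings and 10s segment duration are made up for the example.

```python
def package_cues(cues, segment_duration):
    """Group (begin, end, text) cues into consecutive fixed-duration segments.

    A cue active across a segment boundary is duplicated into every segment
    it overlaps, clipped to that segment's time window, so appearance and
    latency are unaffected.
    """
    if not cues:
        return []
    last_end = max(end for _, end, _ in cues)
    num_segments = int(-(-last_end // segment_duration))  # ceiling division
    segments = [[] for _ in range(num_segments)]
    for begin, end, text in cues:
        first = int(begin // segment_duration)
        last = int((end - 1e-9) // segment_duration)  # segment holding the cue's end
        for i in range(first, last + 1):
            seg_begin = i * segment_duration
            seg_end = seg_begin + segment_duration
            segments[i].append((max(begin, seg_begin), min(end, seg_end), text))
    return segments

# Word-at-a-time cues roughly 0.2s apart; the cue "world" (9.8-10.4s)
# straddles the boundary between the first two 10s segments, so it is
# emitted at the end of segment 0 and again at the start of segment 1.
cues = [(9.4, 9.8, "hello"), (9.8, 10.4, "world"), (10.4, 10.6, "again")]
segments = package_cues(cues, segment_duration=10)
```

The point being: duplication at the boundary is the whole cost of segmenting a word-cadence subtitle stream, whatever the segment duration.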
I wouldn't distribute subtitles to thousands of subscribers over the
internet using that mechanism, but as a contribution mechanism, e.g. to a
DASH encoder/packager in a closed environment, it would work very well. If
you don't like TCP then I'm sure that e.g. RTP would work just fine too,
with different trade-offs.

Nigel

>
>
>David Singer
>Manager, Software Standards, Apple Inc.
>
Received on Wednesday, 21 October 2015 14:31:36 UTC