- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Sat, 28 Jul 2012 17:49:04 +1000
- To: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
- Cc: Glenn Maynard <glenn@zewt.org>, public-texttracks@w3.org
Hi Cyril,

On Fri, Jul 27, 2012 at 7:30 PM, Cyril Concolato
<cyril.concolato@telecom-paristech.fr> wrote:
>
> I don't *want* to discard the captions. The client has just not received
> them.

For the client not to receive them, the server has to not send them.
This means the server - which has all the cues in the WebVTT file - has
to create (i.e. encode) a new WebVTT file from the current one, one that
starts only with the cues from the random access point onwards (i.e. it
discards the previous cues).

> I agree to pass the signature, header, unmodified cues, but that is
> not sufficient to produce the same result as if the client had joined from
> the beginning. The client should know how long it has to wait until the
> result becomes as if it had joined from the beginning. That's called the
> roll distance in MP4 but that does not exist in other transport formats.
> That's why you should be able to create specific RAP, if needed.
>
> Take the following example:
> cue 1 with a start=10, dur=10, text on line 1:
> |---------------------------|
> cue 2 with a start=15, dur=10, text on line 2:
> |---------------------------|
> cue 3 with a start=21, dur=10, text on line 1:
> |---------------------------|
>
> If the client connects at T=12, it will receive the content of cue 2, and
> not having received the content of cue 1, some clients could display
> something between T=15 and T=20.

That's not what the server should do. It would have a list of all the
active cues at T=12, which includes cue 1, and thus it would send from
cue 1 onwards.

> This will be partial text and maybe
> incorrect. Some other clients could decide to wait until cue 3 is received.
> But these clients have no guarantee that another line of text has not been
> set by a previous cue.

Indeed: the clients can't do anything about what they receive. It's the
server that has to send the correct data. Only the server will know at
what time a client connects and which video frames, audio frames, and
cues are active at that time and need to be sent out.

> That's what the signaling of RAP or roll distance is
> for.
>
> It would be good, as it is the case in most codecs, to be able to prepare
> the content, such that from time to time there is a RAP. One could prepare
> the content in the following way (that might not be the only option):
> cue 1 with a start=10, dur= 5, text on line 1:
> |---------------------------| /* duration could be reduced in time*/
> cue 2 with a start=15, dur=10, text on line 1 & 2, RAP:
> |------------|
> cue 2 with a start=20, dur= 5, text on line 2:
> |------------|
> cue 3 with a start=21, dur=10, text on line 1:
> |---------------------------|

WebVTT is a text format. Random access into a text format is trivial.
If your problem is with files that have WebVTT encapsulated together
with audio and video packets, then indeed you may need to find a way to
multiplex the file such that e.g. WebVTT cues that last for "a long
time" get repeated (maybe in sync with the video's I-frames), or simply
a way to locate the currently still active WebVTT cues from the
encoding information. That's a problem that we had to solve for text
tracks in Ogg (see granulerate in
http://svn.annodex.net/standards/draft-pfeiffer-cmml-current.txt). It's
also something that had to be considered for WebVTT in WebM
(http://wiki.webmproject.org/webm-metadata/temporal-metadata/webvtt-in-webm
- IIUC cues are placed on the same cluster as the video frames).
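To make the server-side behaviour described above concrete, here is a
minimal, hypothetical sketch (the Cue shape, cuesToSend helper, and the
numbers are taken only from Cyril's example, not from any spec): the
server keeps the full cue list and, when a client joins at time T, it
selects every cue that is still active at T or starts after it.

```typescript
// Hypothetical server-side helper: pick the cues a late-joining client needs.
interface Cue {
  start: number;    // seconds
  duration: number; // seconds
  text: string;
}

// A cue must be sent if it is still active at joinTime or starts later.
function cuesToSend(allCues: Cue[], joinTime: number): Cue[] {
  return allCues.filter(cue => cue.start + cue.duration > joinTime);
}

// Cyril's example: a client joining at T=12 gets cue 1 (active until T=20)
// as well as cues 2 and 3, so nothing that is on screen at T=12 is lost.
const cues: Cue[] = [
  { start: 10, duration: 10, text: "text on line 1" }, // cue 1
  { start: 15, duration: 10, text: "text on line 2" }, // cue 2
  { start: 21, duration: 10, text: "text on line 1" }, // cue 3
];
console.log(cuesToSend(cues, 12).length); // 3 - cue 1 is included
```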
> In my first email, I was suggesting a possible way (allowing cue settings as
> CSS properties and using top-level spans) but this might be problematic, I
> don't know. I don't care about the solution, but I think the requirement is
> valid.

The requirement is valid. The solution is, however, trivial for a text
file. It's not trivial when multiplexed with media data, but that is a
different problem.

HTH.

Cheers,
Silvia.
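To illustrate why this is trivial for a standalone text file, a minimal
sketch (reusing the same hypothetical Cue shape as above; the formatTime
helper is an assumption, not part of the WebVTT spec or of this thread)
that rebuilds a complete WebVTT file containing only the cues from the
join point onwards:

```typescript
// Hypothetical sketch: rebuild a standalone WebVTT file for a client
// joining at joinTime, keeping the signature, header and remaining cues.
interface Cue {
  start: number;    // seconds
  duration: number; // seconds
  text: string;
}

// Format seconds as a WebVTT timestamp (HH:MM:SS.mmm).
function formatTime(seconds: number): string {
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  const s = (seconds % 60).toFixed(3).padStart(6, "0");
  return `${String(h).padStart(2, "0")}:${String(m).padStart(2, "0")}:${s}`;
}

function webvttFrom(cues: Cue[], joinTime: number, header = ""): string {
  // Keep only cues still active at joinTime or starting after it.
  const remaining = cues.filter(c => c.start + c.duration > joinTime);
  const blocks = remaining.map(
    c => `${formatTime(c.start)} --> ${formatTime(c.start + c.duration)}\n${c.text}`
  );
  // Signature line first, then a blank line before each cue block.
  return ["WEBVTT" + (header ? " " + header : ""), ...blocks].join("\n\n") + "\n";
}
```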
Received on Saturday, 28 July 2012 07:49:52 UTC