- From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Date: Fri, 23 Oct 2015 21:07:20 +1100
- To: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
- Cc: public-texttracks@w3.org, Nigel Megitt <nigel.megitt@bbc.co.uk>
- Message-ID: <CAHp8n2=z_rOXWs3f4XyX2H=JJjA+azhsP7wWgAsH=cB9jqemVA@mail.gmail.com>
On 23 Oct 2015 7:50 pm, "Cyril Concolato" <cyril.concolato@telecom-paristech.fr> wrote:
>
> On 22/10/2015 23:33, Silvia Pfeiffer wrote:
>>
>> Just so we are clear: this already exists and works well for WebVTT.
>>
> I don't know what you mean by 'exists'. There is a pull request ongoing,
> but it is not (to my knowledge) implemented in any browser (is it?) or
> authoring tool, which means it can be removed if never implemented.

With "this" I was referring to the creation of a sequence of independent
documents, as requested by Nigel. I think you agree that Apple's use of
WebVTT with HLS works that way. That's all I was referring to.

>> It has limitations though, such as having to repeat styles across the
>> segments. Which is why we are discussing alternative approaches. HTH.
>>
> And an alternative approach might in the future replace or complement the
> current approach if the group feels like the limitations are too strong, no?

Yes. I was merely replying to Nigel to explain why this discussion is
happening.

Cheers,
Silvia.

> Cyril
>>
>> Cheers,
>> Silvia.
>>
>> Best Regards,
>> Silvia.
>>
>> On 23 Oct 2015 12:52 am, "Nigel Megitt" <nigel.megitt@bbc.co.uk> wrote:
>>
>> It would also be possible to take the same approach with VTT as we have
>> taken with TTML, which is that you have a sequence of independent
>> documents, each of which contains the styling etc. needed to display
>> itself, for whatever time period applies. Then you have something
>> deliverable that will work, and you can separate out the problem of
>> creating a single long document that contains "all the previous
>> documents' content" into a different processing task. If you go down the
>> route of timed styles then you're almost at that point anyway.
>>
>> Nigel
>>
>>
>> On 22/10/2015 14:40, "Cyril Concolato"
>> <cyril.concolato@telecom-paristech.fr> wrote:
>>
>> >On 22/10/2015 13:36, Philip Jägenstedt wrote:
>> >> On Thu, Oct 22, 2015 at 10:47 AM, Cyril Concolato
>> >> <cyril.concolato@telecom-paristech.fr> wrote:
>> >>> On 21/10/2015 15:39, Philip Jägenstedt wrote:
>> >>>> In the DASH/MP4/VTT software stack, is WebVTT the input or the
>> >>>> output, and is it a file or a stream? AFAICT, the only issue would
>> >>>> be with a WebVTT input stream (using the syntax in the spec, not
>> >>>> any other framing) with STYLE blocks at the end, but since
>> >>>> streaming standalone WebVTT doesn't exist yet I'm uncertain if
>> >>>> that's really what you mean.
>> >>> These are the good questions. It is currently possible to have a
>> >>> never-ending WebVTT file being produced live, delivered over HTTP
>> >>> (e.g. using chunked transfer encoding). Such a WebVTT 'stream'
>> >>> cannot easily be consumed by a browser today because the Streams
>> >>> API is not there yet, but it will be available in the future. Other
>> >>> (non-browser) WebVTT implementations can already use that today.
>> >>> This might require careful creation of cues to ensure that each
>> >>> point is a random access point, but that's possible today. Several
>> >>> services could be built on that: think of a web radio with
>> >>> subtitling. Regarding MP4 packaging, an implementation could
>> >>> consume such a stream and produce MP4 segments on the fly, if
>> >>> needed.
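For concreteness, a minimal TypeScript sketch of the kind of script-side
consumption described above: fetch the live WebVTT resource as a stream and
turn complete blocks into cues on a text track. Only standard browser APIs
are used; parseVttBlocks is a hypothetical incremental parser, not an
existing library call.

interface ParsedCue { start: number; end: number; text: string }

// Hypothetical incremental parser: returns cues for the complete blocks in
// the buffer and the unparsed tail. A real one would follow the WebVTT
// parsing rules; it is only declared here so the sketch type-checks.
declare function parseVttBlocks(buffer: string): { cues: ParsedCue[]; rest: string };

// Consume a never-ending WebVTT resource delivered with chunked transfer
// encoding, adding cues as they become available.
async function playLiveVtt(video: HTMLVideoElement, url: string): Promise<void> {
  const track = video.addTextTrack("subtitles", "Live captions", "en");
  track.mode = "showing";

  const response = await fetch(url);
  const reader = response.body!.getReader();
  const decoder = new TextDecoder("utf-8");
  let buffer = "";

  for (;;) {
    const result = await reader.read();
    if (result.done) break;                    // a live stream normally never ends
    buffer += decoder.decode(result.value, { stream: true });

    const { cues, rest } = parseVttBlocks(buffer);
    buffer = rest;                             // keep the trailing partial block
    for (const c of cues) {
      track.addCue(new VTTCue(c.start, c.end, c.text));
    }
  }
}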
>> >>> For those implementations, if a new untimed style header would
>> >>> arrive in the input WebVTT stream, and if such a style were defined
>> >>> to have effects on the whole 'file', i.e. including cues earlier in
>> >>> the 'file', then playing the live stream versus recording the stream
>> >>> and then playing the file would not give the same result. That would
>> >>> be problematic. That's why I think that styles should either be in
>> >>> the header (with the semantics that they are valid for the whole
>> >>> file and without the ability to appear in between cues), or a timed
>> >>> block with a well-defined time validity (like cues), or settings of
>> >>> a cue. For the last two options, it really looks like WebVTT would
>> >>> become a multiplex of two types of timed data (cues and styles); I'm
>> >>> not sure we should go in this direction, or whether a separate style
>> >>> file/stream wouldn't be better.
>> >> Do you have a pointer to such a never-ending WebVTT file deployed on
>> >> the public web?
>> >No I don't, but that does not mean that it does not exist, nor that we
>> >should break such a scenario.
>> >> I honestly didn't think they would exist yet.
>> >>
>> >> To be pedantic, the reason that never-ending WebVTT files don't work
>> >> in browsers isn't because of the Streams API, but because the media
>> >> element's readyState cannot reach HAVE_FUTURE_DATA until the text
>> >> tracks are ready:
>> >>
>> >> https://html.spec.whatwg.org/multipage/embedded-content.html#the-text-tracks-are-ready
>> >>
>> >> This is what the spec bug is about, some mechanism to unblock
>> >> readyState before text track parsing has finished:
>> >> https://www.w3.org/Bugs/Public/show_bug.cgi?id=18029
>> >Sorry, I wasn't clear. I know about that bug. I was already assuming
>> >that a web app would fetch the WebVTT (using XHR or fetch, retrieving
>> >the text content as a stream), parse it and produce the cues in JS, not
>> >using the native browser support at all, because of that exact bug.
>> >
>> >> Anyway, letting the parser discard style blocks after any cues until
>> >> we've figured out the live streaming issues is OK with me. However,
>> >> let's spell out the implications of keeping this restriction for live
>> >> streams:
>> >I agree that it's the right approach. We should be aware of the
>> >limitations of such an approach.
>> >> If you don't know all of the style up front,
>> >I agree that "if you don't know all of the style up front" you have a
>> >problem to solve. Nigel already pointed that out as being useful in
>> >broadcast, where you don't necessarily know all your styles in advance.
>> >To me, there are 2 main approaches: using timed styles or refreshing
>> >untimed styles.
>> >
>> >By timed styles, I imagine something like:
>> >
>> >00:01:00.000 --> 00:02:00.000 type:style
>> >.myRedClass {
>> >  color: red;
>> >}
>> >.myGreenClass {
>> >  color: green;
>> >}
>> >
>> >00:01:00.000 --> 00:01:30.000
>> ><v.myGreenClass>Some green text
>> >
>> >00:01:20.000 --> 00:02:00.000
>> ><v.myRedClass>Some red text
>> >
>> >A cue with a 'type' setting whose value is 'style' carries style
>> >content, not text content. This has the advantage of giving precise
>> >timing for the styles, and we can force styles to appear in start-time
>> >order (like cues) and before a cue that has the same start time. There
>> >are probably problems with the syntax (blank lines in CSS; I did not
>> >follow that part of the discussion). Also, if you want to have seekable
>> >streams, you probably would have to split cues to remove overlap
>> >(nothing different from normal cues).
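To make the timed-style idea concrete, here is a rough TypeScript sketch of
how a page could emulate such a block in script today, assuming a custom
parser has already pulled out the CSS payload and its timing (a browser would
not recognise a 'type:style' setting); the function and track names are
illustrative only.

// Emulate a timed style block: carry the CSS in a metadata cue and inject it
// only while the cue is active, then remove it again. For natively rendered
// cues the rules would need ::cue() selectors; a script-based cue renderer
// can use the classes directly.
function addTimedStyle(styleTrack: TextTrack, start: number, end: number, css: string): void {
  const cue = new VTTCue(start, end, css);    // the cue payload carries the CSS text
  let styleEl: HTMLStyleElement | null = null;

  cue.onenter = () => {
    styleEl = document.createElement("style");
    styleEl.textContent = cue.text;           // e.g. ".myRedClass { color: red; }"
    document.head.appendChild(styleEl);
  };
  cue.onexit = () => {
    styleEl?.remove();
    styleEl = null;
  };
  styleTrack.addCue(cue);
}

// Usage: a hidden metadata track holds the style cues alongside the subtitles.
// const styleTrack = video.addTextTrack("metadata", "timed styles", "en");
// styleTrack.mode = "hidden";  // enter/exit events fire only for hidden or showing tracks
// addTimedStyle(styleTrack, 60, 120, ".myRedClass { color: red; } .myGreenClass { color: green; }");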
>> >
>> >Alternatively, I could also imagine something simpler, like:
>> >00:01:00.000 --> 00:01:30.000 style:color:green;
>> >Some green text
>> >
>> >00:01:20.000 --> 00:02:00.000 style:color:red;
>> >Some red text
>> >
>> >Maybe this could be modified to import styles instead of inlining them;
>> >I didn't think about that. Also, as I pointed out in my previous email,
>> >such a VTT file starts to become a multiplex of styles and content. It
>> >may be more appropriate to define a style stream (maybe using the WebVTT
>> >syntax) and to link the style stream with the content stream, either
>> >from the WebVTT content file or from an additional <track> element.
>> >
>> >> your only
>> >> recourse is to add a new text track at the point where new style is
>> >> needed.
>> >Without defining timed styles (as above), adding a new text track is an
>> >option, but not the only one: you can use one text track and fill it
>> >with cues coming from different WebVTT files. In the HLS approach, every
>> >WebVTT segment would (re-)define its styles. That does not mean you have
>> >to maintain multiple tracks.
>> >
>> >> This will involve scripts, at which point handling multiple
>> >> WebVTT tracks will compare unfavorably with just using a WebSocket
>> >> connection to deliver cues and style using a custom syntax.
>> >Maybe in some cases the WebSocket approach can be useful, but there are
>> >other issues as well, like caching.
>> >
>> >--
>> >Cyril Concolato
>> >Multimedia Group / Telecom ParisTech
>> >http://concolato.wp.mines-telecom.fr/
>> >@cconcolato
>
>
> --
> Cyril Concolato
> Multimedia Group / Telecom ParisTech
> http://concolato.wp.mines-telecom.fr/
> @cconcolato
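As a point of comparison for the WebSocket route mentioned above, a short
TypeScript sketch of delivering cues and styles over one connection in a
custom syntax; the JSON message shape and the endpoint are invented for
illustration, and styles applied this way are untimed, taking effect from the
moment they arrive.

// Cues and style rules arrive over one WebSocket in an invented JSON format
// and are applied in script: cues go onto a text track, CSS is appended to a
// <style> element (rules aimed at natively rendered cue text would need
// ::cue() selectors).
type LiveMessage =
  | { kind: "cue"; start: number; end: number; text: string }
  | { kind: "style"; css: string };

function connectLiveCaptions(video: HTMLVideoElement, wsUrl: string): void {
  const track = video.addTextTrack("subtitles", "Live captions", "en");
  track.mode = "showing";

  const sheet = document.createElement("style");
  document.head.appendChild(sheet);

  const socket = new WebSocket(wsUrl);
  socket.onmessage = (event: MessageEvent<string>) => {
    const msg: LiveMessage = JSON.parse(event.data);
    if (msg.kind === "cue") {
      track.addCue(new VTTCue(msg.start, msg.end, msg.text));
    } else {
      sheet.textContent += msg.css;           // untimed: applies from arrival onwards
    }
  };
}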
Received on Friday, 23 October 2015 10:07:50 UTC