W3C home > Mailing lists > Public > whatwg@whatwg.org > July 2012

Re: [whatwg] Comments about the track element

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Fri, 27 Jul 2012 08:35:01 +1000
Message-ID: <CAHp8n2kv-n7zALwZZg2M+E3ZHEtzu4r466+uQ+SAw+qih36CFA@mail.gmail.com>
To: Cyril Concolato <cyril.concolato@telecom-paristech.fr>
Cc: whatwg@lists.whatwg.org

Hi Cyril,

On Thu, Jul 26, 2012 at 10:03 PM, Cyril Concolato
<cyril.concolato@telecom-paristech.fr> wrote:
>> What do you mean here by "positioning issues"? SVG handles the positioning
>> within its viewbox and what I propose is to define the size and position of
>> this viewbox in the parent coordinate system, i.e. with respect to the
>> video. I don't see what else is needed? or do you mean when SVG is
>> transported in cue, how do you use the cue settings?

There is the SVG viewbox and there is the video viewport. It is not
immediately clear how they relate to each other. What I meant was: how
do you position the SVG viewbox within the boundaries of the video
viewport? It could fully cover the video, but it may not need to. In
your clock example, for instance, it could be positioned by
coordinates relative to the video, e.g. left: 70%, top: 30% or
something like it. The SVG can then be much smaller, and it becomes
possible to overlay other elements, too.
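As a rough sketch of what I mean (the file name, sizes and percentages are purely illustrative, not a proposal), the positioning could look like this in plain HTML/CSS:

```html
<!-- Sketch only: a small SVG viewbox positioned over a video by
     coordinates in the video's box, using the left:70%/top:30%
     values from above. Nothing here is normative. -->
<div style="position: relative; display: inline-block;">
  <video src="clock-demo.webm" controls></video>
  <svg viewBox="0 0 100 100"
       style="position: absolute; left: 70%; top: 30%; width: 20%;">
    <circle cx="50" cy="50" r="45" fill="none" stroke="blue"/>
  </svg>
</div>
```

The SVG then only has to be as large as the overlay itself, and further absolutely positioned elements can share the same video box.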

> Do you mean that you would like to have some signaling in the WebVTT file
> (for instance in the header) to indicate the type of the cue payload? I
> think that'll be interesting.

Yes, we have a proposal for a metadata field in the WebVTT header to
signify the kind.
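For illustration only — the exact header field name in our proposal may differ; "Kind:" here is a placeholder:

```
WEBVTT
Kind: metadata

00:00:10.000 --> 00:00:15.000
payload interpreted according to the declared kind
```

A validator (or a type selector in one, as you suggest) could then pick the right payload parser from that header line.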

> Otherwise, it'll be interesting to have a type
> selector in the validator.

That can work, too, of course.


>> TTML in WebVTT probably doesn't make sense. But SVG's timing model can
>> be applied within the timeframe of a cue, so that does make sense.
>
> Maybe, yes. It might make sense if your cue has a long duration, otherwise
> the overhead of loading an SVG document for each cue might be too big. But
> in general, since you can structure an SVG document with a frame-based
> structure (see this cartoon for instance:
> http://perso.telecom-paristech.fr/~concolat/SVG/flash10.svg), I don't see
> the added value of WebVTT to carry SVG.

Indeed, for this kind of use case, putting SVG in WebVTT makes no sense.

You could, however, put SVG in WebVTT e.g. to provide overlay graphics
that are non-moving or are in a loop for a certain duration of the
video. E.g. an animated character (like your Rhino) could be rendered
in a loop on top of a video for the first 3 minutes of the video.
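A sketch of what such an overlay cue could look like — a single long cue whose payload is an SVG fragment with a looping animation (the markup and timings are my own illustration, not a proposal):

```
WEBVTT

00:00:00.000 --> 00:03:00.000
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <circle cx="50" cy="50" r="10" fill="green">
    <animate attributeName="cx" values="20;80;20" dur="4s"
             repeatCount="indefinite"/>
  </circle>
</svg>
```

The SVG document is loaded once for the cue's duration, so the per-cue loading overhead you mention stays small.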

>> How would you specify this with TTML? It would run into the same
>> problems, wouldn't it?
>
> I think so, the problems would be similar. But again, TTML can also express
> frame-based animations, why should you add the WebVTT layer?

I don't want to take this discussion off track, but it is news to me
that TTML can express frame-based animations.
I indeed wouldn't mix WebVTT and TTML layering, since they satisfy
the same use cases.


>> What would your preferred markup for
>> http://perso.telecom-paristech.fr/~concolat/html5_tests/svg.vtt be ?
>> How would you avoid the duplication?
>
> For instance, you would want to be able to construct the SVG document
> progressively, to have only one document that you modify by adding more
> data. One way to do it would be to have the first cue contain the beginning
> of the document and the following cues contain more data, but since
> modifying the document after its load is tricky, this would require
> concatenating all previous cue texts and then parsing that as a new document
> (ugly!). I'd like to have the parsing step done under the hood by the
> browser, as it usually does.

How does the browser support constructing SVG progressively right now?
If there is an SVG-internal solution, that should be used. In this
case, @mediagroup synchronization would again make the most sense. Or
you just do everything in SVG.


> If you try my example here
> (http://perso.telecom-paristech.fr/~concolat/html5_tests/getcueasSVG.html),
> you'll see that changing the playback speed (even to 0.1) does not guarantee
> synchronization either. By the time the JS has processed the content, it's
> already too late. It might be an implementation issue but it's symptomatic
> of the stacking, that's why I think we should leverage the native parsing,
> synchronization and support for SVG rendering (not through JS). The clock
> might be a (not so) extreme case, but I don't think I'm trying to do very
> fancy things here, just trying to reproduce existing technologies
> (proprietary or not) with existing web standards.

Sure.

>> I'm not sure. Having to repeatedly parse WebVTT cues and draw the SVG
>> image makes this particularly slow. Have you tried to paint the SVG
>> just once on the video and using TextTrackCues just to change the
>> transform value using JavaScript? Upon a cuechange event, you re-draw
>> the SVG.
>
> I could give it a try if I have some time but I'm not really sure I
> understand what you're suggesting. Do you mean using addCue? Could you give
> an example? Are you suggesting something similar to the example in the spec
> with
>
> var sounds = sfx.addTextTrack('metadata');

No, not really. What I meant was to draw the blue handle on top of the
video not through cues, but directly in the browser. Then the WebVTT
file only delivers the position changes for each particular time, and
all you need to do in JavaScript is update the handle's position in
the SVG. That makes the WebVTT file slimmer: it contains no SVG at
all.
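As a sketch of that approach (the "angle:&lt;degrees&gt;" cue-text format and the function names are my own invention for illustration): the handle is drawn once, metadata cues carry only an angle, and a cuechange handler rewrites the transform.

```javascript
// Sketch only: the "angle:<degrees>" cue-text format and the assumed
// rotation center (50, 50) are illustrative, not part of any spec.

// Pure helper: turn a cue payload like "angle:90" into an SVG
// transform string.
function cueToTransform(cueText) {
  const match = /^angle:(-?\d+(?:\.\d+)?)$/.exec(cueText.trim());
  if (!match) return null;
  return "rotate(" + Number(match[1]) + " 50 50)";
}

// DOM wiring (not invoked here): on each cuechange, update the
// already-drawn handle element instead of re-parsing any SVG.
function attachHandleSync(track, handleEl) {
  track.oncuechange = function () {
    const cue = track.activeCues[0];
    if (!cue) return;
    const transform = cueToTransform(cue.text);
    if (transform) handleEl.setAttribute("transform", transform);
  };
}
```

The expensive SVG parse happens once at page load; each cue only costs a small attribute update.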


>> Sorry for the confusion here. I didn't mean to replicate the SVG APIs here
>> but I just meant that the TextTrack API is very specific to 'pure' text
>> tracks (and even to WebVTT text tracks). You might want to expose the SVG
>> API when SVG content is used for the overlay to control it.
>
> Can you make an example? How do you think that should look?
>
> I was thinking of having something like the following. Pardon my IDL
> mistakes. Also note that it is not really a proposal, I haven't thought
> thoroughly of all the consequences, but it is just to give an idea.
>
> enum TextTrackMode { "disabled", "hidden", "showing" };
> interface Track : EventTarget {
>   readonly attribute DOMString kind;
>   readonly attribute DOMString label;
>   readonly attribute DOMString language;
>   readonly attribute DOMString inBandMetadataTrackDispatchType;
>
>            attribute TextTrackMode mode;
>
> };
>
> interface TextTrack : Track {
>   readonly attribute TextTrackCueList? cues;
>   readonly attribute TextTrackCueList? activeCues;
>
>   void addCue(TextTrackCue cue);
>   void removeCue(TextTrackCue cue);
>
>            attribute EventHandler oncuechange;
>
> };
>
> interface GraphicsDocumentTrack : Track {
>            attribute Document trackDocument;
> };
>
> The basic Track interface would be almost the same as the VideoTrack or
> AudioTrack. The GraphicsDocumentTrack interface would be used for tracks
> which have an underlying document (TTML, SVG, SMIL?, HTML?...) with a visual
> representation and not necessarily based on cues. For these documents, you
> are not interested in cues or cue changes (and it might not make sense to
> define cues). For these, you're only interested in:
> - the dispatch of the track content to the parser being done automatically
> by the browser (no need to use a JS DOMParser);
> - the rendering of the underlying document being synchronized (natively) by
> the browser, i.e. the timeline of the underlying document should be locked
> with the timeline of the video (or audio). No need to monitor cue changes to
> render the right SVG.
> You could discriminate between a TextTrack and a GraphicsDocumentTrack by a
> mime type or the inBandMetadataTrackDispatchType (not sure...). When the
> track carries SVG, the trackDocument object could be an SVGDocument. This
> would allow for controlling the SVG as if it was embedded in the HTML but
> for the synchronization done by the browser. What do you think?

Why does it have to be a track at all? Video and audio can be
synchronized to each other without one needing to be a track of the
other. To use @mediagroup, you might need to consider what an SVG
graphic has to provide for the MediaController [1]. There is no
need to consider cues and tracks - we seem to agree on that. :-)
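For reference, this is what @mediagroup synchronization looks like today for two media elements (the file names are placeholders; an SVG element has no such attribute, which is exactly the gap that would need filling):

```html
<!-- Two media elements slaved to one MediaController via @mediagroup. -->
<video src="main.webm" mediagroup="show" controls></video>
<audio src="commentary.ogg" mediagroup="show"></audio>
```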

Cheers,
Silvia.


[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#mediacontroller
Received on Thursday, 26 July 2012 22:35:52 GMT
