Re: Resolving TextTrackCue issues

On Thu, Sep 5, 2013 at 1:03 AM, Cyril Concolato
<cyril.concolato@telecom-paristech.fr> wrote:
> Hi Silvia,
>
> It is a bit hard to follow this long discussion spread across this list, the
> blink-dev list, the bug tracker, ... I'll give my understanding in the hope
> that it helps and doesn't add more confusion.

Thanks. It's nice to see the requirements summarised by somebody else, too.


> My understanding is that we should distinguish the process which generates
> cues from the process that consumes the cues and draft the interface(s) with
> both processes in mind.
>
> There are 2 ways to generate cue objects:
>
> A. created by some JS code
> The content of the cue may be generated client-side or received from XHR.
> The format of the cue content may be anything: plain text, xml, binary data,
> base64 encoded or not. The data has at least a start time (possibly an end
> time) and should have an associated MIME type. Then you have 2 sub-cases:
>
>   A.1 The browser is capable of creating specific objects from the cue
> content following the MIME type (e.g. WebVTT Node objects, TTML objects,
> ...). In that case, there should be a way (for instance a dedicated
> interface) for a JS app to have the cue content parsed and have the objects
> created by the browser: i.e. if the content type of the cue I want to
> generate is text/CueFormatX, I will check if the browser supports the
> parsing of the CueFormatX, and call the parsing (via a constructor or
> another method) to get a specialized object and then access
> CueFormatX.propertyY if needed.

VTTCue satisfies this.

>   A.2 The browser is not capable of creating specific objects from the cue
> content (e.g. proprietary binary data) or the MIME type is unknown, the JS
> can use a generic constructor or method to store the timed cue content for
> later use.

VTTCue with @kind=metadata would satisfy this, and so would the new
GenericCue interface, for any @kind.
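
Cases A.1 and A.2 can be sketched roughly as follows. This is only an
illustration under assumptions: CueBase, GenericCueSketch,
ParsedCueSketch, the parsers registry and createCue are hypothetical
stand-ins written for this example, not browser APIs, and the
"parser" just splits the payload into lines.

```typescript
// Stand-in for the abstract TextTrackCue discussed in this thread (timing only).
// Hypothetical model, not a browser implementation.
abstract class CueBase {
  id = "";
  pauseOnExit = false;
  constructor(public startTime: number, public endTime: number) {}
}

// Stand-in for the proposed GenericCue: timing plus an unparsed text payload.
class GenericCueSketch extends CueBase {
  constructor(startTime: number, endTime: number, public text: string) {
    super(startTime, endTime);
  }
}

// Hypothetical specialized cue for a format that has a native parser.
class ParsedCueSketch extends CueBase {
  constructor(startTime: number, endTime: number, public lines: string[]) {
    super(startTime, endTime);
  }
}

type CueParser = (start: number, end: number, payload: string) => CueBase;

// Hypothetical registry of MIME types with a parser available (case A.1).
const parsers = new Map<string, CueParser>([
  ["text/vtt", (s, e, p) => new ParsedCueSketch(s, e, p.split("\n"))],
]);

function createCue(mime: string, start: number, end: number, payload: string): CueBase {
  const parse = parsers.get(mime);
  // A.1: a parser exists, so return the specialized object.
  // A.2: unknown format, so store the payload untouched in a generic cue.
  return parse ? parse(start, end, payload) : new GenericCueSketch(start, end, payload);
}

const known = createCue("text/vtt", 0, 2, "Hello\nworld");
const opaque = createCue("application/x-proprietary", 2, 4, "AAEC");
```

Here `known` ends up as the specialized type and `opaque` as the
generic one; the calling script decides what to do with each.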


> B. created by the browser
>   The content of the cues is generated and received, outside of a JS
> processing, from resources in a format that is understood by the browser
> (e.g. plain WebVTT files, TTML files, MP4 files, MPEG-2 TS, WebM, ...). Same
> as above, the browser will generate cue objects, ideally as specialized
> as possible: i.e. if the resource is of type text/vtt, it should create a
> VTTCue; and similarly for text/CueFormatX.
>
> Then, there are 2 ways to consume the cue objects:

Recent discussion has exposed a third way to consume the cue objects:

E. The browser is able to convert the cue content to a format for
which it is able to produce a renderable representation. It basically
pretends to the JS developer that the parsed data is a WebVTT cue.


> C. The browser is capable of producing a renderable representation of the
> cue content (e.g. ideally there is a method (or equivalent)
> isRenderableTextTrack(mime) which returns true), then:
>   C.1 If the rendering is left to the browser natively, the track kind is
> set to subtitles or captions.

VTTCue provides for this. No other rendering algorithm for TextTrack
cues has been specified.

>   C.2 If the rendering needs to be altered by the JS, the track kind is set
> to metadata, the JS code calls getCueAsHTML when needed, the result is
> modified and displayed.

JS is able to get an HTML representation of VTTCue text content, but
why would there need to be a change of @kind?


> D. The browser is not capable of producing a renderable representation of
> the cue content
>    The JS code should handle the rendering of the cue content from the given
> cue objects (specialized or not)

It's this use case D that is at the core of our discussion (assuming
you include parsing as part of rendering). The W3C spec proposal for
the GenericCue interface provides for cue content to be exposed by the
browser and rendered by JS, satisfying your use case D. However, there
is a position that if browsers are not capable of parsing and
rendering cue content, they should not expose it to JS at all, in
particular for captions and subtitles. If browsers follow that
position, we can simply pretend everything is a WebVTT cue, and any
cue that is not rendered is of @kind=metadata (even if it's actually
caption content).
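
A minimal sketch of what use case D could look like on the JS side,
assuming the browser hands over cues with timing and an unparsed text
payload. The GenericCueLike shape, the "speaker|line" payload format
and both function names are invented for this example.

```typescript
// Hypothetical shape of a generic cue: timing plus an unparsed payload.
interface GenericCueLike {
  startTime: number;
  endTime: number;
  text: string; // format known only to the page script
}

// Cues active at time t: startTime <= t < endTime.
function activeAt(cues: GenericCueLike[], t: number): GenericCueLike[] {
  return cues.filter((c) => c.startTime <= t && t < c.endTime);
}

// Page-supplied parser and renderer for the invented "speaker|line" format.
// Payloads without a separator are passed through unchanged.
function renderToText(cue: GenericCueLike): string {
  const sep = cue.text.indexOf("|");
  if (sep < 0) return cue.text;
  return `${cue.text.slice(0, sep)}: ${cue.text.slice(sep + 1)}`;
}

const cues: GenericCueLike[] = [
  { startTime: 0, endTime: 2, text: "Alice|Hi" },
  { startTime: 2, endTime: 5, text: "Bob|Hello" },
];

// At t = 3 only the second cue is active.
const shown = activeAt(cues, 3).map(renderToText);
```

In a real page the strings in `shown` would be written into an
overlay element; that part is left out to keep the sketch independent
of the DOM.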


> Of course, you could mix how the cues are received with how they are
> rendered and have:
> - B+C (e.g. the browser supports parsing of WebVTT into cue nodes and the
> rendering)
> - or B+D (receiving an unknown track from an MP4 file (e.g. 3GPP Timed Text)
> and have JS conversion to WebVTT cues),
> - or A.1+C
> - or A.1+D
> - or A.2+D
> I don't see use cases for A.2+C: if a browser is not capable of creating
> specialized objects for a format it is probably not capable of rendering the
> cue.
>
> I don't have a clear opinion on which design is the best (new cue interfaces
> with/without constructor, methods on the TextTrack interface, ...), but I
> would like all use cases to be possible. Is that the case with the W3C
> approach?

Yes.

> And with the WHATWG approach?

Case D is not supported in the WHATWG approach.

> Could we compare example code?

I can give you an example: suppose you have TTML in-band in MP4. It's
caption content; the browser has no parser or renderer for it, but can
in theory extract the cues from the MP4 encapsulation:

- the WHATWG spec would either not expose them to JS at all, or expect
them to be exposed as VTTCue objects with @kind=metadata

- the W3C spec as proposed on this thread would expose them to JS as
GenericCue objects with @kind=captions
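
The difference can be modelled roughly like this. It is a sketch
under assumptions, not spec text: ExposedTrack, Sample, the two
expose functions and isCaptionLike are hypothetical names, and the
TTML payload is made up.

```typescript
type Kind = "captions" | "subtitles" | "metadata";

// Raw timed sample extracted from the container; payload left unparsed.
interface Sample {
  startTime: number;
  endTime: number;
  text: string;
}

// What a page script would see on the text track (hypothetical model).
interface ExposedTrack {
  kind: Kind;
  cueInterface: "VTTCue" | "GenericCue";
  cues: Sample[];
}

// WHATWG position as described above: expose the cues as VTTCue objects
// with kind=metadata (the other option, not exposing them, is omitted here).
function exposeWhatwgStyle(samples: Sample[]): ExposedTrack {
  return { kind: "metadata", cueInterface: "VTTCue", cues: samples };
}

// W3C proposal from this thread: GenericCue objects that keep kind=captions.
function exposeW3cStyle(samples: Sample[]): ExposedTrack {
  return { kind: "captions", cueInterface: "GenericCue", cues: samples };
}

// A page script deciding whether to offer the track in a captions menu.
function isCaptionLike(track: ExposedTrack): boolean {
  return track.kind === "captions" || track.kind === "subtitles";
}

const ttml: Sample[] = [{ startTime: 1, endTime: 3, text: "<p>Hello</p>" }];
const whatwgTrack = exposeWhatwgStyle(ttml);
const w3cTrack = exposeW3cStyle(ttml);
```

In this model a script can tell from `w3cTrack.kind` that the track
carries captions, while `whatwgTrack` looks like any other metadata
track even though the payload is the same.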


HTH,
Silvia.


> HTH,
> Cyril
>
>
> Le 31/08/2013 09:26, Silvia Pfeiffer a écrit :
>
>> Hi all,
>>
>> Recent changes to the TextTrackCue interface have led to a fork with
>> the WHATWG spec [1] when resolving bug 21851 [2].
>>
>> This caused extensive discussion on blink-dev [3] when an intent to
>> implement was proposed.
>>
>> In the W3C WG we recognize the need for a generic cue interface type
>> with a constructor and a text attribute. It allows browsers to expose
>> cues in text tracks of video or audio files for which browsers don't
>> intend to implement parsers. It also allows JavaScript developers to
>> create time-synchronized data for media elements in any format they
>> require.
>>
>> The discussion on blink-dev exposed that the currently specified
>> solution of bug 21851 [2] in the HTML5 spec is flawed in several ways:
>>
>> (1) TextTrackCue objects that are not fully abstract create
>> hard-to-debug backwards-compatibility issues, because existing code
>> assumes "new TextTrackCue()" constructs a cue with VTT semantics;
>> (2) in order to transition old TextTrackCue interface usage to "new
>> VTTCue()", it is better to remove the existing TextTrackCue
>> constructor causing hard failure (easily recognizable) instead of soft
>> failure (more difficult to recognize);
>> (3) the abstract TextTrackCue interface of the WHATWG is desirable for
>> extensibility to non-text-based cue interfaces of the future;
>> (4) the interface fork between the WHATWG and W3C spec should be removed.
>>
>> An alternative resolution to bug 21851 [2] has previously been
>> proposed and discussed: create a new interface that has the text
>> attribute and the constructor and inherits from the abstract
>> interface.
>>
>> This will result in the following interfaces:
>>
>> interface TextTrackCue : EventTarget {
>>    readonly attribute TextTrack? track;
>>
>>             attribute DOMString id;
>>             attribute double startTime;
>>             attribute double endTime;
>>             attribute boolean pauseOnExit;
>>
>>             attribute EventHandler onenter;
>>             attribute EventHandler onexit;
>> };
>>
>> [Constructor(double startTime, double endTime, DOMString text)]
>> interface GenericCue : TextTrackCue {
>>             attribute DOMString text;
>> };
>>
>> Whether VTTCue will inherit from GenericCue or from TextTrackCue will
>> be resolved in the TextTrack CG once this change has been applied to
>> the HTML5 spec.
>>
>> It is my understanding that this proposed change resolves all the
>> above listed issues. I will therefore apply these changes next week
>> unless there are any further concerns.
>>
>> Regards,
>> Silvia (as HTML spec editor).
>>
>> [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=22903
>> [2] https://www.w3.org/Bugs/Public/show_bug.cgi?id=21851
>> [3]
>> https://groups.google.com/a/chromium.org/d/msg/blink-dev/-VHGnuNNUxM/Yibbv2TgDoYJ
>>
>
>
> --
> Cyril Concolato
> Maître de Conférences/Associate Professor
> Groupe Multimedia/Multimedia Group
> Telecom ParisTech
> 46 rue Barrault
> 75 013 Paris, France
> http://concolato.wp.mines-telecom.fr/
>
>

Received on Thursday, 5 September 2013 14:20:22 UTC