Re: Resolving TextTrackCue issues

Hi Silvia,

On 05/09/2013 16:19, Silvia Pfeiffer wrote:
> On Thu, Sep 5, 2013 at 1:03 AM, Cyril Concolato
> <> wrote:
>>    A.2 The browser is not capable of creating specific objects from the cue
>> content (e.g. proprietary binary data) or the MIME type is unknown, the JS
>> can use a generic constructor or method to store the timed cue content for
>> later use.
> VTTCue with @kind=metadata would satisfy this, but also the new
> GenericCue interface for any @kind
There shouldn't be two ways to do the same thing.

>> B. created by the browser
>>    The content of the cues is generated and received, outside of a JS
>> processing, from resources in a format that is understood by the browser
>> (e.g. plain WebVTT files, TTML files, MP4 files, MPEG-2 TS, WebM, ...). Same
>> as above, the browser will generate cue objects, ideally as specialized
>> as possible: i.e. if the resource is of type text/vtt, it should create
>> VTTCue; or similar for text/CueFormatX.
>> Then, there are 2 ways to consume the cue objects:
> Recent discussion has exposed a third way to consume the cue objects:
> E. The browser is able to convert the cue content to a format for
> which it is able to produce a renderable representation. It basically
> pretends to the JS developer that the parsed data is a WebVTT cue.
I assume you meant to *create* the cue objects. If this is the case I 
>> C. The browser is capable of producing a renderable representation of the
>> cue content (e.g. ideally there is a method (or equivalent)
>> isRenderableTextTrack(mime) which returns true), then:
>>    C.1 If the rendering is left to the browser natively, the track kind is
>> set to subtitles or captions.
> VTTCue provides for this. No other rendering algorithm for TextTrack
> cues has been specified.
>>    C.2 If the rendering needs to be altered by the JS, the track kind is set
>> to metadata, the JS code calls getCueAsHTML when needed, the result is
>> modified and displayed.
> JS is able to get a HTML representation of VTTCue text content, but
> why would there need to be a change of @kind ?
Tracks whose kind is not 'metadata' are exposed automatically by the 
browser in its built-in GUI, and you don't want that if the script needs 
to tweak the track content before it is rendered.
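
To make the C.2 flow concrete, here is a minimal sketch in TypeScript. It
is only an illustration under assumptions: SketchTrack, SketchCue,
isListedInBuiltInGui and renderTweaked are hypothetical names, not the
real DOM types, and getCueAsHTML is modelled as returning a string so the
sketch can run outside a browser.

```typescript
// Hypothetical stand-ins for the browser objects discussed above. They are
// NOT the real DOM interfaces, only enough structure to show the flow.
type TrackKind = "subtitles" | "captions" | "metadata";

interface SketchCue {
  startTime: number;
  endTime: number;
  text: string;
  // Stand-in for getCueAsHTML(); returns markup as a string here (assumption).
  getCueAsHTML(): string;
}

interface SketchTrack {
  kind: TrackKind;
  cues: SketchCue[];
}

// Models the point made above: only non-metadata tracks reach the built-in GUI.
function isListedInBuiltInGui(track: SketchTrack): boolean {
  return track.kind !== "metadata";
}

// Script-side rendering: take the browser-produced HTML of each active cue,
// alter it (here: wrap it in a custom container), and return it for display.
function renderTweaked(track: SketchTrack, time: number): string[] {
  return track.cues
    .filter((cue) => cue.startTime <= time && time < cue.endTime)
    .map((cue) => `<div class="custom-caption">${cue.getCueAsHTML()}</div>`);
}

function makeCue(startTime: number, endTime: number, text: string): SketchCue {
  return { startTime, endTime, text, getCueAsHTML: () => `<span>${text}</span>` };
}

const metadataTrack: SketchTrack = {
  kind: "metadata",
  cues: [makeCue(0, 2, "Hello"), makeCue(2, 4, "World")],
};
```

With kind set to metadata the track stays out of the built-in GUI in this
model, and the script decides when and how the altered markup is shown.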

>> Could we compare example code?
> I can give you an example: if you have TTML in-band in MP4, it's
> caption content, a browser has no parser and renderer for it, but can
> in theory extract the cues from the MP4 encapsulation -
> - the WHATWG spec would either not expose them to JS at all, or expect
> them to be exposed as VTTCue objects with @kind=metadata
> - the W3C spec as proposed on this thread would expose them to JS as
> GenericCue objects with @kind=captions
Reading the rest of the thread, Glenn's comments make it clear that you 
don't want TTML documents to be handed to a VTT parser as VTT metadata 
cue content (encoding issues, CR/LF issues, confusion, loss of the 
@kind value), but that is not a reason to have two separate interfaces. 
If @cueHintFormat/@inBandMetadataTrackDispatchType indicates that the 
content of the text attribute is TTML, you don't have to give it to a 
VTT parser. I think one interface (probably not named VTT, to avoid 
confusion) would suffice, with a text attribute of type DOMString (or 
Document, Blob, ArrayBuffer, ... like in XHR).
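
As a rough sketch of what I mean, in TypeScript: one cue type whose text
can hold different payload types, with the format hint (not the cue
class) deciding which handler sees the content. Everything here is an
assumption for illustration: SketchGenericCue, dispatchCue, payloadSize
and the handler table are hypothetical names, and the handlers only
label the payload instead of parsing it.

```typescript
// Hypothetical payload type for a single, format-neutral cue interface.
type CuePayload = string | ArrayBuffer;

// Hypothetical single cue class; not an interface from any spec.
class SketchGenericCue {
  constructor(
    public startTime: number,
    public endTime: number,
    public text: CuePayload,
  ) {}
}

type CueHandler = (payload: CuePayload) => string;

// Reports the payload size without interpreting its content.
function payloadSize(payload: CuePayload): string {
  return typeof payload === "string"
    ? `${payload.length} chars`
    : `${payload.byteLength} bytes`;
}

// Handlers keyed by the format hint carried by the track (assumed here to be
// a MIME type). Real handlers would parse; these only tag the payload.
const handlers = new Map<string, CueHandler>([
  ["text/vtt", (p: CuePayload) => `vtt:${payloadSize(p)}`],
  ["application/ttml+xml", (p: CuePayload) => `ttml:${payloadSize(p)}`],
]);

// The hint selects the handler, so TTML content never reaches the VTT path.
// Unknown formats are kept as opaque data for later use by the script.
function dispatchCue(formatHint: string, cue: SketchGenericCue): string {
  const handler = handlers.get(formatHint);
  return handler ? handler(cue.text) : `opaque:${payloadSize(cue.text)}`;
}
```

For example, dispatchCue("application/ttml+xml", new SketchGenericCue(0,
1, "<tt/>")) goes to the TTML handler, while an unknown hint with an
ArrayBuffer payload falls through to the opaque case.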


> HTH,
> Silvia.
>> HTH,
>> Cyril
>> On 31/08/2013 09:26, Silvia Pfeiffer wrote:
>>> Hi all,
>>> Recent changes to the TextTrackCue interface have led to a fork with
>>> the WHATWG spec [1] when resolving bug 21851 [2].
>>> This caused extensive discussion on blink-dev [3] when an intent to
>>> implement was proposed.
>>> In the W3C WG we recognize the need for a generic cue interface type
>>> with a constructor and a text attribute. It allows browsers to expose
>>> cues in text tracks of video or audio files for which browsers don't
>>> intend to implement parsers. It also allows JavaScript developers to
>>> create time-synchronized data for media elements in any format they
>>> require.
>>> The discussion on blink-dev exposed that the currently specified
>>> solution of bug 21851 [2] in the HTML5 spec is flawed in several ways:
>>> (1) TextTrackCue objects that are not fully abstract create hard to
>>> debug issues of backwards compatibility due to existing code that
>>> assumes "new TextTrackCue()" constructs a cue with VTT semantics;
>>> (2) in order to transition old TextTrackCue interface usage to "new
>>> VTTCue()", it is better to remove the existing TextTrackCue
>>> constructor causing hard failure (easily recognizable) instead of soft
>>> failure (more difficult to recognize);
>>> (3) the abstract TextTrackCue interface of the WHATWG is desirable for
>>> extensibility to non-text-based cue interfaces of the future;
>>> (4) the interface fork between the WHATWG and W3C spec should be removed.
>>> An alternative resolution to bug 21851 [2] has previously been
>>> proposed and discussed: create a new interface that has the text
>>> attribute and the constructor and inherits from the abstract
>>> interface.
>>> This will result in the following interfaces:
>>> interface TextTrackCue : EventTarget {
>>>     readonly attribute TextTrack? track;
>>>              attribute DOMString id;
>>>              attribute double startTime;
>>>              attribute double endTime;
>>>              attribute boolean pauseOnExit;
>>>              attribute EventHandler onenter;
>>>              attribute EventHandler onexit;
>>> };
>>> [Constructor(double startTime, double endTime, DOMString text)]
>>> interface GenericCue : TextTrackCue {
>>>              attribute DOMString text;
>>> };
>>> Whether VTTCue will inherit from GenericCue or from TextTrackCue will
>>> be resolved in the TextTrack CG once this change has been applied to
>>> the HTML5 spec.
>>> It is my understanding that this proposed change resolves all the
>>> above listed issues. I will therefore apply these changes next week
>>> unless there are any further concerns.
>>> Regards,
>>> Silvia (as HTML spec editor).
>>> [1]
>>> [2]
>>> [3]

Cyril Concolato
Maître de Conférences/Associate Professor
Groupe Multimedia/Multimedia Group
Telecom ParisTech
46 rue Barrault
75 013 Paris, France

Received on Monday, 9 September 2013 08:23:09 UTC