Re: Resolving TextTrackCue issues from Eric Carlson on 2013-09-03 (public-html@w3.org from September 2013)

From: Eric Carlson <eric.carlson@apple.com>
Date: Tue, 03 Sep 2013 09:10:11 -0700
To: Philip Jägenstedt <philipj@opera.com>
Cc: public-html <public-html@w3.org>, Jer Noble <jer.noble@apple.com>
Message-id: <DA62300D-E62C-44DB-8DF0-8A1658809DA3@apple.com>

On Sep 2, 2013, at 1:20 AM, Philip Jägenstedt <philipj@opera.com> wrote:

> On Sat, Aug 31, 2013 at 9:26 AM, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>> Hi all,
>> 
>> Recent changes to the TextTrackCue interface have led to a fork with
>> the WHATWG spec [1] when resolving bug 21851 [2].
>> 
>> This caused extensive discussion on blink-dev [3] when an intent to
>> implement was proposed.
>> 
>> In the W3C WG we recognize the need for a generic cue interface type
>> with a constructor and a text attribute. It allows browsers to expose
>> cues in text tracks of video or audio files for which browsers don't
>> intend to implement parsers. It also allows JavaScript developers to
>> create time-synchronized data for media elements in any format they
>> require.
>> 
>> The discussion on blink-dev exposed that the currently specified
>> solution of bug 21851 [2] in the HTML5 spec is flawed in several ways:
>> 
>> (1) TextTrackCue objects that are not fully abstract create hard to
>> debug issues of backwards compatibility due to existing code that
>> assumes "new TextTrackCue()" constructs a cue with VTT semantics;
>> (2) in order to transition old TextTrackCue interface usage to "new
>> VTTCue()", it is better to remove the existing TextTrackCue
>> constructor causing hard failure (easily recognizable) instead of soft
>> failure (more difficult to recognize);
>> (3) the abstract TextTrackCue interface of the WHATWG is desirable for
>> extensibility to non-text-based cue interfaces of the future;
>> (4) the interface fork between the WHATWG and W3C spec should be removed.
>> 
>> An alternative resolution to bug 21851 [2] has previously been
>> proposed and discussed: create a new interface that has the text
>> attribute and the constructor and inherits from the abstract
>> interface.
>> 
>> This will result in the following interfaces:
>> 
>> interface TextTrackCue : EventTarget {
>>  readonly attribute TextTrack? track;
>> 
>>           attribute DOMString id;
>>           attribute double startTime;
>>           attribute double endTime;
>>           attribute boolean pauseOnExit;
>> 
>>           attribute EventHandler onenter;
>>           attribute EventHandler onexit;
>> };
>> 
>> [Constructor(double startTime, double endTime, DOMString text)]
>> interface GenericCue : TextTrackCue {
>>           attribute DOMString text;
>> };
>> 
>> Whether VTTCue will inherit from GenericCue or from TextTrackCue will
>> be resolved in the TextTrack CG once this change has been applied to
>> the HTML5 spec.
>> 
>> It is my understanding that this proposed change resolves all the
>> above listed issues. I will therefore apply these changes next week
>> unless there are any further concerns.
>> 
>> Regards,
>> Silvia (as HTML spec editor).
>> 
>> [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=22903
>> [2] https://www.w3.org/Bugs/Public/show_bug.cgi?id=21851
>> [3] https://groups.google.com/a/chromium.org/d/msg/blink-dev/-VHGnuNNUxM/Yibbv2TgDoYJ
>> 
> 
> Am I correct to assume that GenericCue will never be rendered? If so,
> a format initially exposed using GenericCue can likely never be
> "upgraded" to a fully functional interface, since scripts will have
> come to assume that those formats aren't rendered by the browser. That
> seems pretty bad to me, and a good reason for browsers to not stop
> half way in supporting a format.
> 
> In the blink-dev thread Silvia made reference to TextTrackCueGeneric,
> [1] so it would be nice to hear from Apple (CC:ed) about whether or
> not GenericCue would be a suitable replacement.

  Not if the assumption is that GenericCue will never be rendered by the browser (see below).

> AFAICT they don't seem
> very similar, as TextTrackCueGeneric has rendering information
> attached (e.g. setFontSize) and presumably can be rendered.
> 
  I didn't know about the blink-dev thread until now, but I just read through it. I am not sure where the misinformation came from, but WebKit's TextTrackCueGeneric represents a cue that most definitely *is* rendered by WebKit.

  Apple's AVFoundation media framework supports in-band cues in several formats, so WebKit uses an interface that delivers all cues in a common ("generic") format with position, alignment, some style, etc. Everything except the style is converted to its WebVTT equivalent.

  TextTrackCueGeneric derives from TextTrackCue, so from JavaScript a "generic" cue is the same as a WebVTT cue. This is similar to how WebKit's GTK port has GStreamer dynamically transcode Kate tracks in Ogg and SRT tracks in MKV into WebVTT [1]. I assume this is what IE does internally to support cues from TTML files.
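
  For illustration only, here is a rough TypeScript sketch of the inheritance idea: a script written against the base cue type does not need to know which concrete cue type it was handed. The ModelTextTrackCue and ModelGenericCue classes and the isActiveAt helper are hypothetical stand-ins modeled on the IDL quoted above; they are not WebKit's classes or the browser's built-in interfaces, and the "start <= t < end" check is my own simplification.

```typescript
// Hypothetical stand-ins for the interfaces in the quoted IDL. These are
// plain classes, not the browser's built-in TextTrackCue or WebKit's types.
abstract class ModelTextTrackCue {
  id = "";
  pauseOnExit = false;
  startTime: number;
  endTime: number;

  constructor(startTime: number, endTime: number) {
    this.startTime = startTime;
    this.endTime = endTime;
  }
}

// Mirrors the proposed GenericCue: the abstract base plus a constructor
// and a text attribute.
class ModelGenericCue extends ModelTextTrackCue {
  text: string;

  constructor(startTime: number, endTime: number, text: string) {
    super(startTime, endTime);
    this.text = text;
  }
}

// Written only against the base type, so it accepts any derived cue.
// Assumption: a cue counts as active when startTime <= time < endTime.
function isActiveAt(cue: ModelTextTrackCue, time: number): boolean {
  return cue.startTime <= time && time < cue.endTime;
}

const cue = new ModelGenericCue(1.0, 4.0, "Hello");
const activeMidway = isActiveAt(cue, 2.0);
```

  Because the base class is abstract, the TypeScript compiler rejects "new ModelTextTrackCue(0, 1)", which is loosely analogous to the hard failure described in point (2) of the quoted proposal.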

eric


[1] http://trac.webkit.org/browser/trunk/Source/WebCore/platform/graphics/gstreamer/TextCombinerGStreamer.cpp
Received on Tuesday, 3 September 2013 16:10:31 UTC