- From: Glenn Adams <glenn@skynav.com>
- Date: Thu, 5 Sep 2013 10:43:24 -0600
- To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
- Cc: Cyril Concolato <cyril.concolato@telecom-paristech.fr>, public-html <public-html@w3.org>
- Message-ID: <CACQ=j+ddUXe0Y_suRtfUzFKti1bJpBZPS6KEz0v9tZ9_EmigkA@mail.gmail.com>
On Thu, Sep 5, 2013 at 8:19 AM, Silvia Pfeiffer <silviapfeiffer1@gmail.com> wrote:

> On Thu, Sep 5, 2013 at 1:03 AM, Cyril Concolato
> <cyril.concolato@telecom-paristech.fr> wrote:
> > Hi Silvia,
> >
> > It is a bit hard to follow this long discussion spread on this list, the
> > blink-dev list, the bug tracker, ... I'll give my understanding in the hope
> > that it helps and that it won't add more confusion.
>
> Thanks. It's nice to see the requirements summarised by somebody else, too.
>
> > My understanding is that we should distinguish the process which generates
> > cues from the process that consumes the cues and draft the interface(s) with
> > both processes in mind.
> >
> > There are 2 ways to generate cue objects:
> >
> > A. created by some JS code
> > The content of the cue may be generated client-side or received from XHR.
> > The format of the cue content may be anything: plain text, xml, binary data,
> > base64 encoded or not. The data has at least a start time (possibly an end
> > time) and should have an associated MIME type. Then you have 2 sub-cases:
> >
> > A.1 The browser is capable of creating specific objects from the cue
> > content following the MIME type (e.g. WebVTT Node objects, TTML objects,
> > ...). In that case, there should be a way (for instance a dedicated
> > interface) for a JS app to have the cue content parsed and have the objects
> > created by the browser: i.e. if the content type of the cue I want to
> > generate is text/CueFormatX, I will check if the browser supports the
> > parsing of the CueFormatX, and call the parsing (via a constructor or
> > another method) to get a specialized object and then access
> > CueFormatX.propertyY if needed.
>
> VTTCue satisfies this.
>
> > A.2 The browser is not capable of creating specific objects from the cue
> > content (e.g.
> > proprietary binary data) or the MIME type is unknown, the JS
> > can use a generic constructor or method to store the timed cue content for
> > later use.
>
> VTTCue with @kind=metadata would satisfy this, but also the new
> GenericCue interface for any @kind
>
> > B. created by the browser
> > The content of the cues is generated and received, outside of a JS
> > processing, from resources in a format that is understood by the browser
> > (e.g. plain WebVTT files, TTML files, MP4 files, MPEG-2 TS, WebM, ...). Same
> > as above, the browser will generate cue objects, ideally as specialized
> > as possible: i.e. if the resource is of type text/vtt, it should create
> > VTTCue; or similar for text/CueFormatX.
> >
> > Then, there are 2 ways to consume the cue objects:
>
> Recent discussion has exposed a third way to consume the cue objects:
>
> E. The browser is able to convert the cue content to a format for
> which it is able to produce a renderable representation. It basically
> pretends to the JS developer that the parsed data is a WebVTT cue.
>
> > C. The browser is capable of producing a renderable representation of the
> > cue content (e.g. ideally there is a method (or equivalent)
> > isRenderableTextTrack(mime) which returns true), then:
> > C.1 If the rendering is left to the browser natively, the track kind is
> > set to subtitles or captions.
>
> VTTCue provides for this. No other rendering algorithm for TextTrack
> cues has been specified.

FYI, TTML2 will specify a TTMLCue object and a rendering algorithm. It is
not expected to make use of VTTCue.

> > C.2 If the rendering needs to be altered by the JS, the track kind is set
> > to metadata, the JS code calls getCueAsHTML when needed, the result is
> > modified and displayed.
>
> JS is able to get an HTML representation of VTTCue text content, but
> why would there need to be a change of @kind?
>
> > D.
> > The browser is not capable of producing a renderable representation of
> > the cue content
> > The JS code should handle the rendering of the cue content from the given
> > cue objects (specialized or not)
>
> It's this use case D which is at the core of our discussion (assuming
> you include parsing as part of rendering). The W3C spec proposal for
> the GenericCue interface provides for cue content to be exposed by the
> browser and rendered by JS, satisfying your use case D. However, there
> is a position that if browsers are not capable of parsing and
> rendering cue content, they should not expose it to JS at all - in
> particular for captions and subtitles.

I think this position is better described as "parsing and rendering
renderable cue content", namely cues associated with UA renderable track
@kind, i.e., specifically, @kind != "metadata". I do not get the sense there
is opposition to exposing @kind="metadata" cue content to script.

> If they won't, then we can
> simply pretend everything is a WebVTT cue and when not rendered, it's
> of @kind=metadata (even if it's actually caption content).

We should treat this as an implementation strategy (on the part of
particular UAs), and not something we codify in the spec, though it wouldn't
hurt to mention it as a possible implementation strategy along with text
that warns that this may result in dropping semantics of the source format.
We definitely should not presume this is a strategy that will be universally
followed.

> > Of course, you could mix how the cues are received with how they are
> > rendered and have:
> > - B+C (e.g. the browser supports parsing of WebVTT into cue nodes and the
> > rendering)
> > - or B+D (receiving an unknown track from an MP4 file (e.g.
> > 3GPP Timed Text)
> > and have JS conversion to WebVTT cues),
> > - or A.1+C
> > - or A.1+D
> > - or A.2+D
> > I don't see use cases for A.2+C: if a browser is not capable of creating
> > specialized objects for a format it is probably not capable of rendering the
> > cue.
> >
> > I don't have a clear opinion on which design is the best (new cue interfaces
> > with/without constructor, methods on the texttrack interface, ...), but I
> > would like to have all use cases possible. Is it the case with the W3C
> > approach?
>
> Yes.
>
> > with the WhatWG approach?
>
> Case D is not supported in the WHATWG approach.
>
> > Could we compare example code?
>
> I can give you an example: if you have TTML in-band in MP4, it's
> caption content, a browser has no parser and renderer for it, but can
> in theory extract the cues from the MP4 encapsulation -
>
> - the WHATWG spec would either not expose them to JS at all, or expect
> them to be exposed as VTTCue objects with @kind=metadata

This would not work, since VTTCue interprets cues of kind metadata as
*WebVTT metadata text* [1], which is most definitely incompatible with TTML
that has been serialized into intermediate synchronic document instances,
each of which is effectively an XML document.

[1] https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml2/spec/ttml2.html#extension-designations

> - the W3C spec as proposed on this thread would expose them to JS as
> GenericCue objects with @kind=captions
>
> HTH,
> Silvia.
>
> > HTH,
> > Cyril
> >
> > On 31/08/2013 09:26, Silvia Pfeiffer wrote:
> >
> >> Hi all,
> >>
> >> Recent changes to the TextTrackCue interface had led to a fork with
> >> the WHATWG spec [1] when resolving bug 21851 [2].
> >>
> >> This caused extensive discussion on blink-dev [3] when an intent to
> >> implement was proposed.
> >>
> >> In the W3C WG we recognize the need for a generic cue interface type
> >> with a constructor and a text attribute.
> >> It allows browsers to expose
> >> cues in text tracks of video or audio files for which browsers don't
> >> intend to implement parsers. It also allows JavaScript developers to
> >> create time-synchronized data for media elements in any format they
> >> require.
> >>
> >> The discussion on blink-dev exposed that the currently specified
> >> solution of bug 21851 [2] in the HTML5 spec is flawed in several ways:
> >>
> >> (1) TextTrackCue objects that are not fully abstract create hard to
> >> debug issues of backwards compatibility due to existing code that
> >> assumes "new TextTrackCue()" constructs a cue with VTT semantics;
> >> (2) in order to transition old TextTrackCue interface usage to "new
> >> VTTCue()", it is better to remove the existing TextTrackCue
> >> constructor causing hard failure (easily recognizable) instead of soft
> >> failure (more difficult to recognize);
> >> (3) the abstract TextTrackCue interface of the WHATWG is desirable for
> >> extensibility to non-text-based cue interfaces of the future;
> >> (4) the interface fork between the WHATWG and W3C spec should be removed.
> >>
> >> An alternative resolution to bug 21851 [2] has previously been
> >> proposed and discussed: create a new interface that has the text
> >> attribute and the constructor and inherits from the abstract
> >> interface.
> >>
> >> This will result in the following interfaces:
> >>
> >> interface TextTrackCue : EventTarget {
> >>   readonly attribute TextTrack? track;
> >>
> >>   attribute DOMString id;
> >>   attribute double startTime;
> >>   attribute double endTime;
> >>   attribute boolean pauseOnExit;
> >>
> >>   attribute EventHandler onenter;
> >>   attribute EventHandler onexit;
> >> };
> >>
> >> [Constructor(double startTime, double endTime, DOMString text)]
> >> interface GenericCue : TextTrackCue {
> >>   attribute DOMString text;
> >> };
> >>
> >> Whether VTTCue will inherit from GenericCue or from TextTrackCue will
> >> be resolved in the TextTrack CG once this change has been applied to
> >> the HTML5 spec.
> >>
> >> It is my understanding that this proposed change resolves all the
> >> above listed issues. I will therefore apply these changes next week
> >> unless there are any further concerns.
> >>
> >> Regards,
> >> Silvia (as HTML spec editor).
> >>
> >> [1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=22903
> >> [2] https://www.w3.org/Bugs/Public/show_bug.cgi?id=21851
> >> [3] https://groups.google.com/a/chromium.org/d/msg/blink-dev/-VHGnuNNUxM/Yibbv2TgDoYJ
> >
> > --
> > Cyril Concolato
> > Maître de Conférences/Associate Professor
> > Groupe Multimedia/Multimedia Group
> > Telecom ParisTech
> > 46 rue Barrault
> > 75 013 Paris, France
> > http://concolato.wp.mines-telecom.fr/
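
As a rough illustration of the interfaces quoted in this thread, the proposed
hierarchy and the "TTML in-band in MP4" example could be modelled in
TypeScript along the following lines. This is only a sketch under stated
assumptions: AbstractCue, GenericCueModel, CueTrackKind and whoRenders are
hypothetical names invented here (renamed so they do not clash with real
browser globals), the canRenderNatively flag is an assumption rather than an
existing API, and the TTML payload is made up for the example.

```typescript
// Hypothetical stand-ins for the interfaces in the quoted IDL. These are
// plain classes for illustration, not the browser's own objects.
type CueTrackKind =
  | "subtitles" | "captions" | "descriptions" | "chapters" | "metadata";

// Models the abstract TextTrackCue: timing and identity only, no payload,
// and no constructor that script can call directly.
abstract class AbstractCue {
  id = "";
  pauseOnExit = false;
  protected constructor(public startTime: number, public endTime: number) {}
}

// Models the proposed GenericCue: adds a public constructor and a text
// attribute that carries the cue payload in whatever format it arrived in.
class GenericCueModel extends AbstractCue {
  constructor(startTime: number, endTime: number, public text: string) {
    super(startTime, endTime);
  }
}

// Decides who draws a cue. `canRenderNatively` is an assumed capability
// flag: true when the browser has a parser and renderer for the format.
function whoRenders(
  kind: CueTrackKind,
  canRenderNatively: boolean,
): "browser" | "script" {
  if (kind === "metadata") {
    return "script"; // metadata cues are only ever consumed by script
  }
  // Use case C when the browser can render, use case D when it cannot.
  return canRenderNatively ? "browser" : "script";
}

// The example from the thread: a TTML cue extracted from MP4 by a browser
// that has no TTML parser, exposed with its payload intact.
const ttmlCue = new GenericCueModel(
  0,
  2.5,
  '<tt xmlns="http://www.w3.org/ns/ttml"><body><div><p>Hello</p></div></body></tt>',
);
const renderer = whoRenders("captions", false);
console.log(`${ttmlCue.startTime}-${ttmlCue.endTime}s rendered by ${renderer}`);
```

In this model the cue keeps its captions kind and its original XML payload,
which is the behaviour described for the W3C proposal; mapping it onto a
WebVTT cue of kind metadata would lose both.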
Received on Thursday, 5 September 2013 16:44:13 UTC