[Draft] Transition request for VTT to Candidate rec.

Hi guys

In advance of this afternoon’s discussion, here is a draft of a Transition Request for VTT. As you know, this has been hanging over us like a Damoclean sword, and I’d like to move it along. We have a lot of implementations, a test suite in web platform tests, two rounds of wide review, and so on. In general terms I think we are past cooked, but there may be some details that need addressing. Have a look and let’s get this moving this afternoon. Thanks

(Based on a recent TTML advancement request).

* * * *

Dear Director, Philippe,

This is a Transition Request from the Timed Text WG for publication of "WebVTT: The Web Video Text Tracks Format" as a Candidate Recommendation.

Transition details below.


1. Boilerplate ...

* Document title: WebVTT: The Web Video Text Tracks Format

* Document URI:

https://w3c.github.io/webvtt/ (currently formatted as a CG community report, but this is an editorial issue easily handled)


* Latest published version:

https://w3c.github.io/webvtt/


* Estimated publication date: TBD

* Abstract:
This specification defines WebVTT, the Web Video Text Tracks format. Its main use is for marking up external text track resources in connection with the HTML <track> element. WebVTT files provide captions or subtitles for video content, and also text video descriptions [MAUR], chapters for content navigation, and more generally any form of metadata that is time-aligned with audio or video content.

* SotD:

Work on this specification is being undertaken both in the Web Media Text Tracks Community Group as well as in the W3C Timed Text Working Group. The latter group works towards a W3C Recommendation for reference purposes with interoperability requirements, while the former is a Draft Community Group Report that continues to evolve.

This document was published by the W3C Timed Text Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-tt@w3.org (subscribe, archives) with [webvtt] at the start of your email’s subject. All comments are welcome.

The Timed Text Working Group intends to recommend transition of this document to Candidate Recommendation and is offering this Working Draft for a public review period ending on 22 September 2017. A cumulative summary of all changes applied to this version since WebVTT First Public Working Draft was published is available at Changes from WebVTT (FPWD).

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 March 2017 W3C Process Document.

For this specification to exit the CR stage, at least 2 independent implementations of every feature defined in this specification need to be documented in the implementation report or the WebPlatformTests results at https://wpt.fyi/webvtt. The WPT suite assumes a browser context and many implementations are not browsers, so input from those other implementations will be gathered manually; the implementation report may also be based on implementer-provided test results for the exit criteria test suite. The Working Group does not require that implementations are publicly available but encourages them to be so.

The Working Group has not identified features "at risk" for this specification; there are some features not widely implemented yet but the group considers them important and not droppable.

Substantive changes since FPWD

* The addition of pre-defined color class names for text and backgrounds
* The definition of conformance classes, notably the non-CSS ‘standalone’ conformance class


2. Record of the WG's Decision to request the CR Transition:


TBD


4. Evidence that the document satisfies Group's Requirements:

The requirements for this specification are set out in the Timed Text Working Group's charter, which points to the Media Accessibility User Requirements:

https://www.w3.org/2016/05/timed-text-charter.html

"Should address the Media Accessibility User Requirements." https://www.w3.org/TR/media-accessibility-reqs/


[CC-1] Render text in a time-synchronized manner, using the media resource as the timebase master. 
- satisfied

[CC-2] Allow the author to specify erasures, i.e., times when no text is displayed on the screen (no text cues are active). 
- satisfied

[CC-3] Allow the author to assign timestamps so that one caption/subtitle follows another, with no perceivable gap in between. 
- satisfied

[CC-4] Be available in a text encoding. 
- satisfied

[CC-5] Support positioning in all parts of the screen - either inside the media viewport but also possibly in a determined space next to the media viewport. This is particularly important when multiple captions are on screen at the same time and relate to different speakers, or when in-picture text is avoided. 
- satisfied, but the captioning area has to be part of a media viewport. It’s not legal to paint outside one’s viewport.

[CC-6] Support the display of multiple regions of text simultaneously. 
- satisfied

[CC-7] Display multiple rows of text when rendered as text in a right-to-left or left-to-right language. 
- satisfied

[CC-8] Allow the author to specify line breaks. 
- satisfied

[CC-9] Permit a range of font faces and sizes. 
- satisfied

[CC-10] Render a background in a range of colors, supporting a full range of opacity levels. 
- satisfied

[CC-11] Render text in a range of colors. The user should have final control over rendering styles like color and fonts; e.g., through user preferences. 
- satisfied

[CC-12] Enable rendering of text with a thicker outline or a drop shadow to allow for better contrast with the background. 
- satisfied (possibly only in CSS UAs?)

[CC-13] Where a background is used, it should be possible to keep the caption background visible even in times where no text is displayed, such that it minimizes distraction. However, where captions are infrequent the background should be allowed to disappear to enable the user to see as much of the underlying video as possible. 
- satisfied, under author control

[CC-14] Allow the use of mixed display styles— e.g., mixing paint-on captions with pop-on captions— within a single caption cue or in the caption stream as a whole. Pop-on captions are usually one or two lines of captions that appear on screen and remain visible for one to several seconds before they disappear. Paint-on captions are individual characters that are "painted on" from left to right, not popped onto the screen all at once, and usually are verbatim. Another often-used caption style in live captioning is roll-up - here, cue text follows double chevrons ("greater than" symbols), and is used to identify different speakers. Each sentence "rolls up" to about three lines. The top line of the three disappears as a new bottom line is added, allowing the continuous rolling up of new lines of captions. When displaying captions using the paint-on style, it is important to ensure that the final words that are displayed are visible for enough time for them to be read.
- paint-on is an artefact of old caption-creation and delivery systems. VTT and modern systems can emulate paint-on, but cues are delivered as a unit, not character-by-character

[CC-15] Support positioning such that the edge of the captions is a sufficient distance from the nearest screen edge to permit readability (e.g., at least 1/12 of the total screen height above the bottom of the screen, when rendered as text in a right-to-left or left-to-right language). 
- satisfied

[CC-16] Use conventions that include inserting left-to-right and right-to-left segments within a vertical run (e.g. Tate-chu-yoko in Japanese), when rendered as text in a top-to-bottom oriented language. 
- satisfied

[CC-17] Represent content of different natural languages. In some cases the inclusion of a few foreign words forms part of the original soundtrack, and thus needs to be so indicated in the caption. Also allow for separate caption files for different languages and on-the-fly switching between them. This is also a requirement for subtitles. See also [CC-20]
- satisfied

[CC-18] Represent content of at least those specific natural languages that may be represented with [Unicode 3.2], including common typographical conventions of that language (e.g., through the use of furigana and other forms of ruby text). 
- satisfied

[CC-19] Present the full range of typographical glyphs, layout and punctuation marks normally associated with the natural language's print-writing system. 
- satisfied to the extent Unicode does this.

[CC-20] Permit in-line mark-up for foreign words or phrases. 
- satisfied

[CC-21] Permit the distinction between different speakers. 
- satisfied

[CC-22] Support captions that are provided inside media resources as tracks, or in external files. 
- satisfied; WebVTT can be delivered as a text file, or as a track in an MP4 file

[CC-23] Ascertain that captions are displayed in sync with the media resource. 
- satisfied

[CC-24] Support user activation/deactivation of caption tracks. 
- this is a feature of the system that displays the captions

[CC-25] Support both edited and verbatim captions when available. 
- this is a question of labelling caption streams in the encapsulating environment

[CC-26] Support multiple tracks of foreign-language subtitles including multiple subtitle tracks in the same foreign language.
- this is a feature of the environment

[CC-27] Support live-captioning functionality.
- satisfied; VTT files can be delivered a line at a time, if needed, as there are no ‘bracketing’ constructs

[CC-28] Enable the bounding box of the background area to be extended by a preset distance relative to the foreground text contained with that background area.
- satisfied


[ECC-1] Support metadata markup for (sections of) timed text cues. 
- satisfied

[ECC-2] Support hyperlinks and other activation mechanisms for supplementary data for (sections of) caption text. 
- satisfied

[ECC-3] Support text cues that may be longer than the time available until the next text cue, thus providing overlapping text cues. In such instances, users should be enabled to decide whether they prefer to see overlapping text, or automatically shorten display time, or to have the media resource paused while the caption is displayed. Timing should be provided by the author, but the user should always be able to override the author's timings.
- satisfied, but there is no practical way for users to override timings in any known caption system

[ECC-4] Support timed text cues that are allowed to overlap with each other in time and be present on screen at the same time (e.g., those that come from the speech of different individuals). Also support timed text cues that are not allowed to overlap, so that playback will be paused in order to allow users to catch up with their reading.
- satisfied

[ECC-5] Allow users to define the reading speed and thus define how long each text cue requires, and whether media playback needs to pause sometimes to let them catch up on their reading. 
- this is a feature of the player rather than the caption expression language



5. Evidence that Dependencies With Other Groups Have Been Met:

This specification has been sent for review to a wide range of external groups, most notably MPEG and FOMS, and none of them has raised any comment on dependencies.


6. Evidence that the document has received Wide Review:

Two extensive rounds of wide review were conducted, as documented in https://www.w3.org/wiki/TimedText/WebVTT_Wide_Review



7. Evidence that Issues Have Been Formally Addressed:

The tables in the wide review page, and the linked GitHub issues and Bugzilla bugs, show the dispositions.

There are [[at least]] two issues where the commenter does not agree with the disposition:

https://github.com/w3c/webvtt/issues/370 : the commenter would like timestamps not to insist on three digits after the decimal point
https://github.com/w3c/webvtt/issues/372 : the commenter wishes the default background to be 100% black, not 80%
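
For readers who have not looked at issue 370 recently, here is a rough Python sketch of the timestamp rule in question. It is only an illustration, under my reading of the file-format syntax (optional hours of two or more digits, two-digit minutes and seconds, and exactly three digits after the decimal point); the function name parse_timestamp is made up for this note and is not taken from the specification or from any implementation.

```python
import re

# Assumed reading of the WebVTT timestamp syntax: [hh+:]mm:ss.ttt
# Hours are optional (two or more digits); the fraction is exactly three digits.
TIMESTAMP = re.compile(r"(?:([0-9]{2,}):)?([0-5][0-9]):([0-5][0-9])\.([0-9]{3})")


def parse_timestamp(text):
    """Return the timestamp in milliseconds, or None if it is not well formed.

    Hypothetical helper for illustration only.
    """
    match = TIMESTAMP.fullmatch(text)
    if match is None:
        return None
    hours, minutes, seconds, millis = match.groups()
    total_seconds = int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds)
    return total_seconds * 1000 + int(millis)
```

Under this sketch "00:01.000" is accepted and "00:01.5" is rejected, which is the behaviour the commenter would like relaxed.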


8. Objections:

No Formal Objections were raised by TTWG Members or other parties during the preparation of this specification.


9. Features marked as "at risk":

The Working Group has not identified features "at risk" for this specification.


10. Implementation Information:


Please see the Working Group's implementation report at <https://www.w3.org/wiki/TimedText/EffortsAndSpecifications#WebVTT-based_efforts_and_specifications>

Only some implementations are browser-hosted; some are polyfills and some are independent, standalone implementations in other players.


The test suite is hosted at https://github.com/w3c/web-platform-tests/tree/master/webvtt

Reporting of feature coverage by the non-browser implementations will be done manually during the CR period.


CanIUse information is at <https://caniuse.com/#feat=webvtt>



11. Patent Disclosures: none

https://www.w3.org/2004/01/pp-impl/34314/status ("No patent disclosures have been made for any specifications of this group.")


This document is governed by the 1 March 2017 W3C Process Document.


Regards,

David Singer, Chair of the Timed Text Working Group.
Thierry Michel, Team Contact for the Timed Text Working Group.

Received on Wednesday, 10 January 2018 16:43:17 UTC