Re: [Draft 4] Transition request for VTT to Candidate rec. from David Singer on 2018-01-16 (public-tt@w3.org from January 2018)

From: David Singer <singer@apple.com>
Date: Tue, 16 Jan 2018 10:30:46 -0800
To: Nigel Megitt <nigel.megitt@bbc.co.uk>
Cc: TTWG <public-tt@w3.org>, Philippe Le Hegaret <plh@w3.org>, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Message-id: <E6897E97-553A-4248-912E-EDDA7C127495@apple.com>
> On Jan 12, 2018, at 11:04 , Nigel Megitt <nigel.megitt@bbc.co.uk> wrote:
> 
> In the section that describes the CG/WG working model, it would probably
> be helpful for the transition to explain the current delta between the CG
> and the WG versions of the document (which may be none).

I am pretty certain it’s none at the moment. Once the WG version freezes, we will have to handle this (but I think we go back to revised CG report in the CG and WDs in the WG.)

> Feels to me that
> this would be good practice in this CG/WG model, and we're forging the way
> there. Happy to hear other views.
> 
> The SOTD section still mentions WD - there's standard boilerplate for CR
> SOTD that just needs to be used - it can be left to the Editor I think to
> do that.

Yes, I hope so.

> 
> The minutes don't include a resolution about the exit criteria so we will
> need to make sure that is covered when we do resolve to transition to CR.
> What we did say though was that we'd put the WPT results in a single
> Implementation Report for simplicity of review.

OK

> 
> On the MAUR requirements:
> 
> 
> [CC-6] Is it worth pointing out that WebVTT does not support overlapping
> regions? (not that such a requirement is explicitly called out in the
> MAUR, or even exists, just that it is a constraint if I understand
> correctly)

I think it makes most sense to respond to the requirements as written.

> 
> [CC-13] not sure how caption background colours can be kept visible when
> there's no caption text visible? This may be a lack of understanding on my
> part, but I'd be interested to know.

I’ll let Silvia suggest best practices here.

> 
> [CC-14] (paint on) and [CC-27] cannot both be delivered together if I've
> read the responses correctly - in other words there is no model for
> appending words to cues in a live environment.

The only way to simulate paint on is to send increasing text strings, I think.
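
To make "increasing text strings" concrete, here is a minimal sketch in Python of how a paint-on effect could be emulated: each cue repeats the text so far plus one more word, and each cue ends where the next begins. The function names, the word-level granularity, and the `step`/`hold` parameters are illustrative assumptions, not anything defined by the WebVTT specification; the sketch also assumes the words contain no characters that need escaping in cue text (such as `&` or `<`).

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as hh:mm:ss.ttt (always three fractional digits)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def paint_on_cues(words, start, step, hold):
    """Build a WebVTT document that emulates paint-on captions.

    Hypothetical helper: every cue carries the whole text so far
    (an "increasing text string"), so each cue is still delivered
    as a unit rather than character by character.
    """
    lines = ["WEBVTT", ""]
    for i in range(1, len(words) + 1):
        cue_start = start + (i - 1) * step
        # Intermediate cues last one step; the final cue is held longer
        # so the last words stay on screen long enough to be read.
        cue_end = cue_start + (hold if i == len(words) else step)
        lines.append(f"{format_timestamp(cue_start)} --> {format_timestamp(cue_end)}")
        lines.append(" ".join(words[:i]))
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)


if __name__ == "__main__":
    print(paint_on_cues(["Hello", "there", "world"], start=1.0, step=0.5, hold=2.0))
```

In a live environment the same idea would apply cue by cue: each newly sent cue supersedes the previous one rather than appending to it.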

> 
> Kind regards,
> 
> Nigel
> 
> 
> 
> On 12/01/2018, 16:57, "singer@apple.com on behalf of David Singer"
> <singer@apple.com> wrote:
> 
>> Hi guys
>> 
>> Updated based on reviewing the minutes.
>> 
>> To proceed further:
>> "to proceed with the CR transition request (a) response from the
>> commenter, or feb 15th, whichever is sooner, (b) the revised transition
>> request and (c) a document prepared as a rec-track document (not CG
>> report)" <https://www.w3.org/2018/01/10-tt-minutes.html#item50>
>> 
>> * we need to be sure we've resolved WR comments and reached Feb 15th or
>> consensus with the commenter;
>> * we need a Rec track document prepared with an updated SOTD
>> * and we need the final details in this transition request
>> * and the formal resolution of the WG
>> 
>> 
>> (Based on a recent TTML advancement request).
>> 
>> * * * *
>> 
>> Dear Director, Philippe,
>> 
>> This is a Transition Request from the Timed Text WG for publication of a
>> Candidate Recommendation of the "WebVTT: The Web Video Text Tracks
>> Format".
>> 
>> Transition details below.
>> 
>> 
>> 1. Boilerplate ...
>> 
>> * Document title: WebVTT: The Web Video Text Tracks Format
>> 
>> * Document URI:
>> 
>> https://w3c.github.io/webvtt/ (currently formatted as a CG community
>> report, but this is an editorial issue easily handled)
>> 
>> 
>> Latest published version:
>> 
>> https://w3c.github.io/webvtt/
>> 
>> 
>> * Estimated publication date: TBD
>> 
>> * Abstract:
>> This specification defines WebVTT, the Web Video Text Tracks format. Its
>> main use is for marking up external text track resources in connection
>> with the HTML <track> element. WebVTT files provide captions or subtitles
>> for video content, and also text video descriptions [MAUR], chapters for
>> content navigation, and more generally any form of metadata that is
>> time-aligned with audio or video content.
>> 
>> * SotD:
>> 
>> Work on this specification is being undertaken both in the Web Media Text
>> Tracks Community Group as well as in the W3C Timed Text Working Group.
>> The latter group works towards a W3C Recommendation for reference
>> purposes with interoperability requirements, while the former is a Draft
>> Community Group Report that continues to evolve.
>> 
>> This document was published by the W3C Timed Text Working Group as a
>> Working Draft. This document is intended to become a W3C Recommendation.
>> If you wish to make comments regarding this document, please send them to
>> public-tt@w3.org (subscribe, archives) with [webvtt] at the start of your
>> email's subject. All comments are welcome.
>> 
>> Publication as a Working Draft does not imply endorsement by the W3C
>> Membership. This is a draft document and may be updated, replaced or
>> obsoleted by other documents at any time. It is inappropriate to cite
>> this document as other than work in progress.
>> 
>> This document was produced by a group operating under the 5 February 2004
>> W3C Patent Policy. W3C maintains a public list of any patent disclosures
>> made in connection with the deliverables of the group; that page also
>> includes instructions for disclosing a patent. An individual who has
>> actual knowledge of a patent which the individual believes contains
>> Essential Claim(s) must disclose the information in accordance with
>> section 6 of the W3C Patent Policy.
>> 
>> This document is governed by the 1 March 2017 W3C Process Document.
>> 
>> For this specification to exit the CR stage, at least 2 independent
>> implementations of every feature defined in this specification need to be
>> documented in the implementation report which will include the
>> WebPlatformTests results at https://wpt.fyi/webvtt. The WPT assumes a
>> browser context and many implementations are not such, so input will be
>> garnered from these other implementations manually; the implementation
>> report may also be based on implementer-provided test results for the
>> exit criteria test suite. The Working Group does not require that
>> implementations are publicly available but encourages them to be so.
>> 
>> The Working Group has not identified features "at risk" for this
>> specification; there are some features not widely implemented yet but the
>> group considers them important and not droppable.
>> 
>> Substantive changes since FPWD
>> 
>> see 
>> <https://www.w3.org/wiki/TimedText/WebVTT_Wide_Review#Substantive_changes_since_.28Second.29_Wide_Review_Review>
>> 
>> 2. Record of the WG's Decision to request the CR Transition:
>> 
>> 
>> TBD
>> 
>> 
>> 3. Evidence that the document satisfies Group's Requirements:
>> 
>> The media accessibility user requirements were defined for this
>> specification in the Timed Text Working Group's charter at
>> 
>> https://www.w3.org/2016/05/timed-text-charter.html
>> 
>> "• Should address the Media Accessibility User Requirements."
>> https://www.w3.org/TR/media-accessibility-reqs/
>> 
>> 
>> [CC-1] Render text in a time-synchronized manner, using the media
>> resource as the timebase master.
>> - satisfied
>> 
>> [CC-2] Allow the author to specify erasures, i.e., times when no text is
>> displayed on the screen (no text cues are active).
>> - satisfied
>> 
>> [CC-3] Allow the author to assign timestamps so that one caption/subtitle
>> follows another, with no perceivable gap in between.
>> - satisfied
>> 
>> [CC-4] Be available in a text encoding.
>> - satisfied
>> 
>> [CC-5] Support positioning in all parts of the screen - either inside the
>> media viewport but also possibly in a determined space next to the media
>> viewport. This is particularly important when multiple captions are on
>> screen at the same time and relate to different speakers, or when
>> in-picture text is avoided.
>> - satisfied, but the captioning area has to be part of a media viewport.
>> It's not legal to paint outside one's viewport.
>> 
>> [CC-6] Support the display of multiple regions of text simultaneously.
>> - satisfied
>> 
>> [CC-7] Display multiple rows of text when rendered as text in a
>> right-to-left or left-to-right language.
>> - satisfied
>> 
>> [CC-8] Allow the author to specify line breaks.
>> - satisfied
>> 
>> [CC-9] Permit a range of font faces and sizes.
>> - satisfied
>> 
>> [CC-10] Render a background in a range of colors, supporting a full range
>> of opacity levels.
>> - satisfied
>> 
>> [CC-11] Render text in a range of colors. The user should have final
>> control over rendering styles like color and fonts; e.g., through user
>> preferences. 
>> - satisfied
>> 
>> [CC-12] Enable rendering of text with a thicker outline or a drop shadow
>> to allow for better contrast with the background.
>> - satisfied (possibly only in CSS UAs?)
>> 
>> [CC-13] Where a background is used, it should be possible to keep the
>> caption background visible even in times where no text is displayed, such
>> that it minimizes distraction. However, where captions are infrequent the
>> background should be allowed to disappear to enable the user to see as
>> much of the underlying video as possible.
>> - satisfied, under author control
>> 
>> [CC-14] Allow the use of mixed display styles - e.g., mixing paint-on
>> captions with pop-on captions - within a single caption cue or in the
>> caption stream as a whole. Pop-on captions are usually one or two lines
>> of captions that appear on screen and remain visible for one to several
>> seconds before they disappear. Paint-on captions are individual
>> characters that are "painted on" from left to right, not popped onto the
>> screen all at once, and usually are verbatim. Another often-used caption
>> style in live captioning is roll-up - here, cue text follows double
>> chevrons ("greater than" symbols), and is used to identify different
>> speakers. Each sentence "rolls up" to about three lines. The top line of
>> the three disappears as a new bottom line is added, allowing the
>> continuous rolling up of new lines of captions. When displaying captions
>> using the paint-on style, it is important to ensure that the final words
>> that are displayed are visible for enough time for them to be read.
>> - paint-on is an artefact of old caption-creation and delivery systems.
>> VTT and modern systems can emulate paint-on, but cues are delivered as a
>> unit, not character-by-character
>> 
>> [CC-15] Support positioning such that the edge of the captions is a
>> sufficient distance from the nearest screen edge to permit readability
>> (e.g., at least 1/12 of the total screen height above the bottom of the
>> screen, when rendered as text in a right-to-left or left-to-right
>> language). 
>> - satisfied
>> 
>> [CC-16] Use conventions that include inserting left-to-right and
>> right-to-left segments within a vertical run (e.g. Tate-chu-yoko in
>> Japanese), when rendered as text in a top-to-bottom oriented language.
>> - satisfied
>> 
>> [CC-17] Represent content of different natural languages. In some cases
>> the inclusion of a few foreign words forms part of the original
>> soundtrack, and thus needs to be so indicated in the caption. Also allow
>> for separate caption files for different languages and on-the-fly
>> switching between them. This is also a requirement for subtitles. See
>> also [CC-20]
>> - satisfied
>> 
>> [CC-18] Represent content of at least those specific natural languages
>> that may be represented with [Unicode 3.2], including common
>> typographical conventions of that language (e.g., through the use of
>> furigana and other forms of ruby text).
>> - satisfied
>> 
>> [CC-19] Present the full range of typographical glyphs, layout and
>> punctuation marks normally associated with the natural language's
>> print-writing system.
>> - satisfied to the extent Unicode does this.
>> 
>> [CC-20] Permit in-line mark-up for foreign words or phrases.
>> - satisfied
>> 
>> [CC-21] Permit the distinction between different speakers.
>> - satisfied
>> 
>> [CC-22] Support captions that are provided inside media resources as
>> tracks, or in external files.
>> - satisfied; WebVTT can be delivered as a text file, or as a track in an
>> MP4 file
>> 
>> [CC-23] Ascertain that captions are displayed in sync with the media
>> resource. 
>> - satisfied
>> 
>> [CC-24] Support user activation/deactivation of caption tracks.
>> - this is a feature of the system that displays
>> 
>> [CC-25] Support both edited and verbatim captions when available.
>> - this is a question of labelling caption streams in the encapsulating
>> environment
>> 
>> [CC-26] Support multiple tracks of foreign-language subtitles including
>> multiple subtitle tracks in the same foreign language.
>> - this is a feature of the environment
>> 
>> [CC-27] Support live-captioning functionality.
>> - satisfied; VTT files can be delivered a line at a time, if needed, as
>> there are no 'bracketing' constructs
>> 
>> [CC-28] Enable the bounding box of the background area to be extended by
>> a preset distance relative to the foreground text contained with that
>> background area.
>> - satisfied
>> 
>> 
>> [ECC-1] Support metadata markup for (sections of) timed text cues.
>> - satisfied
>> 
>> [ECC-2] Support hyperlinks and other activation mechanisms for
>> supplementary data for (sections of) caption text.
>> - satisfied
>> 
>> [ECC-3] Support text cues that may be longer than the time available until
>> the next text cue, thus providing overlapping text cues. In such
>> instances, users should be enabled to decide whether they prefer to see
>> overlapping text, or automatically shorten display time, or to have the
>> media resource paused while the caption is displayed. Timing should be
>> provided by the author, but the user should always be able to override
>> the author's timings.
>> - satisfied, but there is no practical way for users to override timings
>> in any caption system known
>> 
>> [ECC-4] Support timed text cues that are allowed to overlap with each
>> other in time and be present on screen at the same time (e.g., those that
>> come from the speech of different individuals). Also support timed text
>> cues that are not allowed to overlap, so that playback will be paused in
>> order to allow users to catch up with their reading.
>> - satisfied
>> 
>> [ECC-5] Allow users to define the reading speed and thus define how long
>> each text cue requires, and whether media playback needs to pause
>> sometimes to let them catch up on their reading.
>> - this is a feature of the player rather than the caption expression
>> language
>> 
>> 
>> 
>> 4. Evidence Dependencies With Other Groups Met:
>> 
>> This specification has been extensively sent for review to external
>> groups, most notably MPEG and FOMS, and they have not expressed any
>> comment on dependencies. There are no changes in dependencies. The FOMS
>> group and 3GPP SA4 are not listed in the charter, but have been kept
>> informed.
>> 
>> 
>> 5. Evidence that the document has received Wide Review:
>> 
>> Two extensive rounds of wide review were conducted, as documented in
>> https://www.w3.org/wiki/TimedText/WebVTT_Wide_Review
>> 
>> The implementations page (9, below) also gives evidence of review (by the
>> implementers at least). The FOMS community
>> <http://www.foms-workshop.org/foms2017/> also has discussed and reviewed
>> VTT though they are not a formal organization or in liaison.
>> 
>> 6. Evidence that Issues Have Been Formally Addressed:
>> 
>> The tables in the wide review page, and the linked GitHub issues and
>> BugZilla bugs, show the dispositions. As this is an active specification,
>> questions and issues continue to be filed, but we believe all wide review
>> and important issues have been considered.
>> 
>> 
>> 7. Objections:
>> 
>> There have been no Formal Objections from TTWG Members or other parties
>> during the preparation of this specification.
>> 
>> There are [[at least]] two issues raised where the commenter does not
>> agree:
>> 
>> https://github.com/w3c/webvtt/issues/370 - the commenter would like
>> timestamps not to insist on three digits after the decimal point
>> https://github.com/w3c/webvtt/issues/372 - the commenter wishes the
>> default background to be 100% black, not 80%
>> 
>> 
>> 8. Features marked as "at risk":
>> 
>> The Working Group has not identified features "at risk" for this
>> specification.
>> 
>> 
>> 9. Implementation Information:
>> 
>> There is existing information on implementation (here) which will be
>> updated in the CR period.
>> 
>> Please see the Working Group's implementation report at
>> <https://www.w3.org/wiki/TimedText/EffortsAndSpecifications#WebVTT-based_efforts_and_specifications>
>> 
>> which includes the references to the web platform tests
>> https://github.com/w3c/web-platform-tests/tree/master/webvtt and the
>> results, and the canIUse information at <https://caniuse.com/#feat=webvtt>
>> 
>> Only some implementations are browser-hosted; some are polyfills and some
>> are independent, standalone, implementations in other players. Reporting
>> of feature coverage by the non-browser implementations will be done
>> manually during the CR period.
>> 
>> 
>> 10. Patent Disclosures: none
>> 
>> https://www.w3.org/2004/01/pp-impl/34314/status ("No patent disclosures
>> have been made for any specifications of this group.")
>> 
>> 
>> This document is governed by the 1 March 2017 W3C Process Document.
>> 
>> 
>> Regards,
>> 
>> David Singer, Chair of the Timed Text Working Group.
>> Thierry Michel, Team Contact for the Timed Text Working Group.
>> 
>> 
> 

David Singer
Manager, Software Standards, Apple Inc.
Received on Tuesday, 16 January 2018 18:31:11 UTC