Re: A new proposal for how to deal with text track cues from David Ronca on 2013-06-16 (public-tt@w3.org from June 2013)

From: David Ronca <dronca@netflix.com>
Date: Sat, 15 Jun 2013 17:03:32 -0700
To: public-tt@w3.org
Message-ID: <CAMjV-Fj218ep7DEiLB91k7NOBuBaFZd4+pQO6qtydYLYxB531Q@mail.gmail.com>
One last important point that I wanted to clarify. At this time, we
believe that SMPTE-TT is the best option for a caption source format,
and we will likely settle on some variant of SMPTE-TT as the required
caption format for ingest.  We do not feel that SMPTE-TT (as currently
defined) is suitable for devices, and our client spec is not a
derivative of SMPTE-TT.

If we had a compelling business case to deliver WebVTT to a client,
then we would add the necessary support to our streaming service.
Since our preferred ingest format will be SMPTE-TT, we definitely need
a good model for converting SMPTE-TT to WebVTT.  If this problem is
solved through the WG, all the better for us.

D

On Sat, Jun 15, 2013 at 2:59 PM, David Ronca <dronca@netflix.com> wrote:
>> Indeed - there was some strong lobbying going on at the FCC to make this
>> happen so this is your problem now.
>
> I fail to see a problem with FCC providing a WebVTT safe harbor.
> Perhaps it might give WebVTT a boost.  And if WebVTT gains substantial
> traction, we would certainly adjust.  I just don't see it at the
> moment.
>
>> WebVTT is created for everyone on the Web.
>
> Sure.  As long as the statement "as they publish to TV & Film - not
> the Web - they don't need WebVTT" is not assumed to be inversely true
> (if they publish to the web they need WebVTT).
>
>
>> If however your statement implies that this group should ignore WebVTT
>> because it's not relevant to this group, then that's a fair statement and
>> I'll go away and stop bothering this group.
>
> I'd not suggest any such thing.  Hard to see the value of segregating
> the WGs out by format.  Indeed with two major companies pushing
> WebVTT, it seems that co-existence is an absolute must.  At the least,
> we will all benefit from tools that can do good WebVTT->TTML and
> TTML->WebVTT format conversion.
>
> D
>
>
>
> On Sat, Jun 15, 2013 at 1:55 PM, Silvia Pfeiffer
> <silviapfeiffer1@gmail.com> wrote:
>>
>> On 16 Jun 2013 06:29, "David Ronca" <dronca@netflix.com> wrote:
>>>
>>> > They buy the formats that they need and as long as they publish to TV
>>> > & Film - not the Web - they don't need WebVTT.
>>>
>>> Our client spec mandates TTML support (a Netflix profile), and is
>>> unlikely to change.  For ingest, our preferred format is SMPTE-TT
>>> (profile restrictions in development).  I believe that the momentum is
>>> behind TTML (due in part to the FCC safe-harbor clause for SMPTE-TT).
>>
>> Indeed - there was some strong lobbying going on at the FCC to make this
>> happen so this is your problem now.
>>
>>> I guess tool vendors will support WebVTT, but the large use case will
>>> be converting TTML to WebVTT in order to deliver to one of the few
>>> companies that will require WebVTT.  I don't expect much authoring to
>>> be done in WebVTT.  Just my two cents.
>>
>> WebVTT is created for everyone on the Web. If you don't use it, that's not
>> WebVTT's problem.
>>
>> If however your statement implies that this group should ignore WebVTT
>> because it's not relevant to this group, then that's a fair statement and
>> I'll go away and stop bothering this group.
>>
>> Regards,
>> Silvia.
>>
>>>
>>> David
>>>
>>> >
>>> > On Sat, Jun 15, 2013 at 3:17 AM, Silvia Pfeiffer
>>> > <silviapfeiffer1@gmail.com> wrote:
>>> >> On Fri, Jun 14, 2013 at 9:37 PM, John Birch
>>> >> <John.Birch@screensystems.tv> wrote:
>>> >>> Hi Silvia,
>>> >>>
>>> >>> Thanks for your email... I've commented in-line below. (>>)
>>> >>>
>>> >>> As I state below, please do not misunderstand, I am not against the
>>> >>> implementation of another subtitle / caption *output* format. I am concerned
>>> >>> however about an output format that seeks to 'gloss over' the potential
>>> >>> inadequacies of the caption / subtitle authoring. Captions and subtitles
>>> >>> should never be considered 'second rate' ancillary content that can be fixed
>>> >>> up by a 'clever' browser. For accessibility there is a clear ethical desire
>>> >>> to have the best authored content. For translation, (where as much as 90% of
>>> >>> the audience may need a quality translation experience) the commercial
>>> >>> driver for high quality subtitles is even more important. Garbage in,
>>> >>> garbage out. My primary concern with WebVTT is that far too much attention
>>> >>> is being paid to supporting a 'garbage in' mentality.
>>> >>
>>> >>
>>> >> I think you're still misunderstanding what WebVTT does. If a file is
>>> >> of high quality and captions/subtitles are authored to a high
>>> >> standard (as I would expect from commercial entities), and the video
>>> >> is being displayed at a sufficient size to display the authored
>>> >> content as intended, the rendering algorithm will not do any, as you
>>> >> call it, 'fix up'.
>>> >>
>>> >> However, the browser has to do something when a line of text has to be
>>> >> wrapped because the video's width is too small to render the text.
>>> >> Also, the current spec will - in the unlikely event that several cues
>>> >> are rendered at the same time and have been poorly authored to overlap
>>> >> each other - try to move the cues slightly to make the text not
>>> >> overlap. These are the only two situations in which the WebVTT
>>> >> rendering algorithm will make any changes to the positioning of the
>>> >> text.
>>> >>
>>> >> Also, you might want to talk with the people at YouTube that have to
>>> >> deal with a lot of garbage captions that they are getting as input,
>>> >> but they still manage to extract a lot of good quality captions out of
>>> >> them, so your "garbage in - garbage out" argument wouldn't hold for
>>> >> YouTube. Note, however, that YouTube does a lot more than what we have
>>> >> codified into the WebVTT rendering algorithm.
>>> >>
>>> >>
>>> >>>>> TTML is a **markup** language. It is intended to contain the
>>> >>>>> necessary structure to convey the intention of an author as to how text
>>> >>>>> should appear timed against external content. It does NOT define a specific
>>> >>>>> rendering implementation, the referenced rendering aspect is illustrative of
>>> >>>>> the specification, and any rendering implementation is permitted.
>>> >>
>>> >> When defining a markup language, but not defining the means of
>>> >> rendering, you allow rendering devices the freedom to interpret the
>>> >> markup differently, thus leading to different visual experiences.
>>> >> Surely that is not a good thing.
>>> >>
>>> >>
>>> >>>>> This has been the case since inception (over 10 years). It has been
>>> >>>>> unequivocal how TTML should be interpreted, (barring a few corner cases that
>>> >>>>> are well documented and will be resolved in the next edition).
>>> >>
>>> >> WebVTT is in the same position - we're also sorting out some corner
>>> >> cases.
>>> >>
>>> >>
>>> >>>>> BTW. SMPTE-TT has more to say about practical rendering
>>> >>>>> implementations in the captioning sense than TTML. For many of the use cases
>>> >>>>> that TTML was intended, it is much further along than WebVTT.
>>> >>
>>> >> Can you point out for which use cases TTML is ahead of WebVTT? I'd like to
>>> >> understand what shortcomings there are so we can make sure to cover
>>> >> all use cases, or clarify any misunderstandings.
>>> >>
>>> >>
>>> >>>>> I stand by my ("half-finished strawman") statement. I have followed
>>> >>>>> the public **incremental** development of the WebVTT standard.
>>> >>
>>> >> That's how all standards are written.
>>> >>
>>> >>
>>> >>>>> I have had no inclination to attempt implementation against a moving
>>> >>>>> target. All formats do not evolve to support more features. The better the
>>> >>>>> requirements analysis and scoping phase is, the less radical evolution is
>>> >>>>> required in the specification. Writing the spec should be the easy part -
>>> >>>>> working out what to put in it is the difficult trick. By comparison to
>>> >>>>> WebVTT, TTML had a long gestation, but the published standard was IMHO
>>> >>>>> clearer and has certainly not evolved so much since publication.
>>> >>
>>> >> You might want to check back with the beginnings of TTML to an email
>>> >> about "Iterating toward a solution":
>>> >> http://lists.w3.org/Archives/Public/public-tt/2003Feb/0039.html . That
>>> >> was in February 2003 - and TTML is still fixing bugs. That's
>>> >> continuous incremental improvement and it's the norm with all
>>> >> specifications that continue to be in active use and adapt to reality,
>>> >> which is a good thing.
>>> >>
>>> >>
>>> >>
>>> >>>>> I don't disagree. But my comment was more about why it seemed
>>> >>>>> necessary to develop a standard that effectively contests some of the same
>>> >>>>> space as TTML? Especially when TTML was already well formed and published at
>>> >>>>> the time that WebVTT was conceived? If WebVTT had been positioned and
>>> >>>>> defined as a rendering environment for TTML (which is now being discussed)
>>> >>>>> we would not be having this discussion.
>>> >>
>>> >> I'm not going there - that was a decision the browsers made
>>> >> after looking at TTML. It's history and we can't change it any more.
>>> >>
>>> >>
>>> >>> You may have missed that there is an actual spec for this:
>>> >>>
>>> >>> https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/608toVTT.html
>>> >>> Other conversions are planned, but have not been required yet.
>>> >>>
>>> >>>>> I must have missed the announcement last week! ;-) BTW, from an
>>> >>>>> admittedly cursory look I have reservations about mapping 608 row positions
>>> >>>>> to (recurring) fractional percentages. The potential problems this can
>>> >>>>> create are one of the reasons why the TTML standard includes a cell
>>> >>>>> positioning concept.
>>> >>
>>> >> It has been around for at least a year and I've been pointing people
>>> >> toward it. The fractional percentage is simply the outcome of
>>> >> converting the CEA608 columns to exact percentages on the video.
>>> >>
>>> >>
>>> >>>> There does not seem to be a huge awareness of the role of a
>>> >>>> professional captioner or subtitler. Or of the role of commercial subtitling
>>> >>>> and captioning organisations, or of the existence of (internal) quality
>>> >>>> standards for caption / subtitling services that are adopted (insisted upon)
>>> >>>> by those organisations.
>>> >>>> The professional captioning and subtitling profession is largely
>>> >>>> ignorant of WebVTT.
>>> >>>
>>> >>> If this statement implies that professional captioning and subtitling
>>> >>> organisations are ignoring WebVTT, then you may have overlooked that some
>>> >>> are already supporting it and others are keeping a close eye.
>>> >>> They don't seem to be making a big fuss about it though. For example:
>>> >>>
>>> >>> http://www.cpcweb.com/webcasts/webcast_samples.htm#WebVTT
>>> >>>
>>> >>> http://www.automaticsync.com/captionsync/captionsync-delivers-webvtt-output/
>>> >>> http://www.synchrimedia.com/
>>> >>>
>>> >>> http://www.longtailvideo.com/support/jw-player/29360/basic-vtt-captions/
>>> >>>
>>> >>> http://www.wowza.com/forums/content.php?498-How-to-stream-WebVTT-subtitles-to-iOS-for-closed-captioning
>>> >>>
>>> >>>>> Most of the organisations you mention are not captioning or
>>> >>>>> subtitling companies operating in the TV / Film / Content creation
>>> >>>>> marketplace. They are mostly organisations involved in the
>>> >>>>> **redistribution** of media (excluding CPC). Captioning and subtitling (as
>>> >>>>> creative activities) takes place at (or on behalf of) content owners /
>>> >>>>> creators as well as at re-distributors. It is this former (professional
>>> >>>>> level) authoring community that I do not believe WebVTT is connected with.
>>> >>
>>> >> CPC is captioning for the TV market AFAIK. TV & Film companies don't
>>> >> create captions themselves but get them made by captioning companies.
>>> >> They buy the formats that they need and as long as they publish to TV
>>> >> & Film - not the Web - they don't need WebVTT.
>>> >>
>>> >>
>>> >>
>>> >>>>> Captions should be positioned, styled and timed using a concise,
>>> >>>>> structured and partitioned framework. It should not be necessary to have an
>>> >>>>> in depth knowledge of an arcane set of rules in order to achieve these
>>> >>>>> requirements.
>>> >>
>>> >> Right. WebVTT has a very clear approach to how to position, style and
>>> >> time captions - I don't see the problem.
>>> >>
>>> >>
>>> >>>>> My biggest reservations about WebVTT are that it appears that it is
>>> >>>>> being promoted as a container for subtitling and caption content at the
>>> >>>>> **authoring and archive** level.
>>> >>
>>> >> WebVTT is a captioning format for the Web - that's all. Nobody is
>>> >> promoting it for anything else. If companies see a need to archive
>>> >> content in this format, I wouldn't have any problem with that. Why
>>> >> would that be a problem for you?
>>> >>
>>> >>
>>> >>>>>> In truth I have no problem with WebVTT as a delivery format to be
>>> >>>>>> interpreted by a browser or agent, although clearly I would prefer that
>>> >>>>>> there was only one such format. However, WebVTT is late in the game, and it
>>> >>>>>> does not IMHO address the requirements of authoring and archive. This may be
>>> >>>>>> due to a lack of appreciation of the number of phases that subtitle and
>>> >>>>>> caption content goes through in a 'professional' broadcast environment. Like
>>> >>>>>> video, subtitles and captions exist in different 'silos' and are transformed
>>> >>>>>> (often repeatedly) depending on the final target application. The US
>>> >>>>>> captioning model (that of captioning near output or creating caption master
>>> >>>>>> tapes) is NOT representative of captioning globally, nor is it at all
>>> >>>>>> representative of subtitling (multiple language translation) workflows.
>>> >>
>>> >>
>>> >> WebVTT was built for an international market and has taken such
>>> >> requirements on board - it even supported <ruby> before TTML
>>> >> introduced it. It had to do so because the Web is a global phenomenon.
>>> >> Some features were driven later by US law, but the initial target was
>>> >> always international use.
>>> >>
>>> >>
>>> >>>>> I hope that clarifies my reservations about WebVTT.
>>> >>
>>> >> Yes, thanks. I don't believe I will be able to change your mind about
>>> >> WebVTT, so I'll just have to focus on improving it to meet your bar.
>>> >> :-)
>>> >>
>>> >> Cheers,
>>> >> Silvia.
>>> >>
>>> >
>>>
Received on Sunday, 16 June 2013 00:04:00 UTC