Re: A new proposal for how to deal with text track cues from Silvia Pfeiffer on 2013-06-15 (public-tt@w3.org from June 2013)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Sat, 15 Jun 2013 20:17:54 +1000
To: John Birch <John.Birch@screensystems.tv>
Cc: Glenn Adams <glenn@skynav.com>, public-tt <public-tt@w3.org>
Message-ID: <CAHp8n2=z5Qtxtokko38MQay7OY4gL2avEK7eYnOY0MbUsK2Mqg@mail.gmail.com>

On Fri, Jun 14, 2013 at 9:37 PM, John Birch <John.Birch@screensystems.tv> wrote:
> Hi Silvia,
>
> Thanks for your email... I've commented in-line below. (>>)
>
> As I state below, please do not misunderstand, I am not against the implementation of another subtitle / caption *output* format. I am concerned however about an output format that seeks to 'gloss over' the potential inadequacies of the caption / subtitle authoring. Captions and subtitles should never be considered 'second rate' ancillary content that can be fixed up by a 'clever' browser. For accessibility there is a clear ethical desire to have the best authored content. For translation, (where as much as 90% of the audience may need a quality translation experience) the commercial driver for high quality subtitles is even more important. Garbage in , garbage out. My primary concern with WebVTT is that far too much attention is being paid to supporting a 'garbage in' mentality.


I think you're still misunderstanding what WebVTT does. If a file is
of high quality and captions/subtitles are authored to a high
standard (as I would expect from commercial entities), and the video
is being displayed at a sufficient size to display the authored
content as intended, the rendering algorithm will not do any, as you
call it, 'fix up'.

However, the browser has to do something when a line of text has to be
wrapped because the video's width is too small to render the text.
Also, the current spec will - in the unlikely event that several cues
are rendered at the same time and have been poorly authored to overlap
each other - try to move the cues slightly to make the text not
overlap. These are the only two situations in which the WebVTT
rendering algorithm will make any changes to the positioning of the
text.
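
As a rough sketch of the second case (overlapping cues), the idea can
be illustrated as below. This is a simplified illustration only, not
the algorithm from the WebVTT spec, which is more involved. The CueBox
type, the place_cues function and the 1% step size are hypothetical
names and values chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CueBox:
    top: float     # top edge, in percent of the video height
    height: float  # box height, in percent of the video height

    @property
    def bottom(self) -> float:
        return self.top + self.height


def overlaps(a: CueBox, b: CueBox) -> bool:
    """True if the two boxes share any vertical space."""
    return a.top < b.bottom and b.top < a.bottom


def place_cues(boxes: List[CueBox], step: float = 1.0) -> List[CueBox]:
    """Place boxes in order, nudging a box upward only if it overlaps
    a box that has already been placed (hypothetical simplification)."""
    placed: List[CueBox] = []
    for box in boxes:
        candidate = CueBox(box.top, box.height)
        while candidate.top > 0 and any(overlaps(candidate, p) for p in placed):
            candidate.top = max(0.0, candidate.top - step)
        placed.append(candidate)
    return placed
```

Note that boxes which do not overlap come out of place_cues with
their authored positions unchanged, which is the point being made
above: well-authored cues are left alone.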

Also, you might want to talk with the people at YouTube, who have to
deal with a lot of garbage captions as input but still manage to
extract a lot of good quality captions out of them, so your "garbage
in - garbage out" argument wouldn't hold for YouTube. Note, however,
that YouTube does a lot more than what we have codified into the
WebVTT rendering algorithm.


>>>TTML is a **markup** language. It is intended to contain the necessary structure to convey the intention of an author as to how text should appear timed against external content. It does NOT define a specific rendering implementation, the referenced rendering aspect is illustrative of the specification, and any rendering implementation is permitted.

When defining a markup language, but not defining the means of
rendering, you allow rendering devices the freedom to interpret the
markup differently, thus leading to different visual experiences.
Surely that is not a good thing.


>>>This has been the case since inception (over 10 years). It has been unequivocal how TTML should be interpreted, (barring a few corner cases that are well documented and will be resolved in the next edition).

WebVTT is in the same position - we're also sorting out some corner cases.


>>> BTW. SMPTE-TT has more to say about practical rendering implementations in the captioning sense than TTML. For many of the use cases that TTML was intended, it is much further along than WebVTT.

Can you point out for which use cases TTML is ahead of WebVTT? I'd like to
understand what shortcomings there are so we can make sure to cover
all use cases, or clarify any misunderstandings.


>>> I stand by my ("half-finished strawman") statement. I have followed the public **incremental** development of the WebVTT standard.

That's how all standards are written.


>>> I have had no inclination to attempt implementation against a moving target. All formats do not evolve to support more features. The better the requirements analysis and scoping phase is, the less radical evolution is required in the specification. Writing the spec should be the easy part - working out what to put in it is the difficult trick. By comparison to WebVTT, TTML had a long gestation, but the published standard was IMHO clearer and has certainly not evolved so much since publication.

You might want to check back with the beginnings of TTML to an email
about "Iterating toward a solution":
http://lists.w3.org/Archives/Public/public-tt/2003Feb/0039.html . That
was in February 2003 - and TTML is still fixing bugs. That's
continuous incremental improvement and it's the norm with all
specifications that continue to be in active use and adapt to reality,
which is a good thing.



>>> I don't disagree. But my comment was more about why it seemed necessary to develop a standard that effectively contests some of the same space as TTML? Especially when TTML was already well formed and published at the time that WebVTT was conceived? If WebVTT had been positioned and defined as a rendering environment for TTML (which is now being discussed) we would not be having this discussion.

I'm not going there - that was a decision the browsers made after
looking at TTML. It's history and we can't change it any more.


> You may have missed that there is an actual spec for this:
> https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/608toVTT.html
> Other conversions are planned, but have not been required yet.
>
>>> I must have missed the announcement last week! ;-) BTW, from an admittedly cursory look I have reservations about mapping 608 row positions to (recurring) fractional percentages. The potential problems this can create is one of the reasons why the TTML standard includes a cell positioning concept.

It has been around for at least a year and I've been pointing people
toward it. The fractional percentage is simply the outcome of
converting the CEA608 columns to exact percentages on the video.
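
To illustrate where the recurring fractions come from, here is a
minimal sketch of such a conversion. CEA-608 uses a grid of 15 rows by
32 columns. The assumption that the grid spans the central 80% of the
video, and the function names, are chosen for this example and are not
taken from the 608toVTT document.

```python
from fractions import Fraction

ROWS, COLUMNS = 15, 32          # CEA-608 caption grid
SAFE_AREA = Fraction(80)        # assumption: grid covers the central 80%
MARGIN = (100 - SAFE_AREA) / 2  # 10% margin on each side


def row_to_percent(row: int) -> Fraction:
    """Top edge of a 0-based row as an exact percentage of video height."""
    if not 0 <= row < ROWS:
        raise ValueError("row out of range")
    return MARGIN + row * SAFE_AREA / ROWS


def column_to_percent(column: int) -> Fraction:
    """Left edge of a 0-based column as an exact percentage of video width."""
    if not 0 <= column < COLUMNS:
        raise ValueError("column out of range")
    return MARGIN + column * SAFE_AREA / COLUMNS
```

Under these assumptions each column is exactly 2.5% wide, while each
row is 80/15 = 5.333...% high, so row positions are where the
recurring decimals show up.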


>> There does not seem to be a huge awareness of the role of a professional captioner or subtitler. Or of the role of commercial subtitling and captioning organisations, or of the existence of (internal) quality standards for caption / subtitling services that are adopted (insisted upon) by those organisations.
>> The professional captioning and subtitling profession is largely ignorant of WebVTT.
>
> If this statement implies that professional captioning and subtitling organisations are ignoring WebVTT, then you may have overlooked that some are already supporting it and others are keeping a close eye.
> They don't seem to be making a big fuss about it though. For example:
>
> http://www.cpcweb.com/webcasts/webcast_samples.htm#WebVTT
> http://www.automaticsync.com/captionsync/captionsync-delivers-webvtt-output/
> http://www.synchrimedia.com/
> http://www.longtailvideo.com/support/jw-player/29360/basic-vtt-captions/
> http://www.wowza.com/forums/content.php?498-How-to-stream-WebVTT-subtitles-to-iOS-for-closed-captioning
>
>>> Most of the organisations you mention are not captioning or subtitling companies operating in the TV / Film / Content creation marketplace. They are mostly organisations involved in the **redistribution** of media (excluding CPC). Captioning and subtitling (as creative activities) takes place at (or on behalf of) content owners / creators as well as at re-distributors. It is this former (professional level) authoring community that I do not believe WebVTT is connected with.

CPC is captioning for the TV market AFAIK. TV & Film companies don't
create captions themselves but get them made by captioning companies.
They buy the formats that they need and as long as they publish to TV
& Film - not the Web - they don't need WebVTT.



>>>Captions should be positioned, styled and timed using a concise, structured and partitioned framework. It should not be necessary to have an in depth knowledge of an arcane set of rules in order to achieve these requirements.

Right. WebVTT has a very clear approach to how to position, style and
time captions - I don't see the problem.
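
As a small sketch of what that looks like in practice: a cue is a
timing line with optional settings such as line, position and align,
followed by the cue text. The helper names below (format_timestamp,
format_cue) are hypothetical and only build the text of a single cue;
they are not part of any WebVTT library.

```python
from typing import Optional


def format_timestamp(seconds: float) -> str:
    """Render seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def format_cue(start: float, end: float, text: str,
               line: Optional[int] = None,
               position: Optional[int] = None,
               align: Optional[str] = None) -> str:
    """Build one cue: timing line plus optional settings, then the text."""
    settings = []
    if line is not None:
        settings.append(f"line:{line}%")          # vertical placement
    if position is not None:
        settings.append(f"position:{position}%")  # horizontal placement
    if align is not None:
        settings.append(f"align:{align}")         # text alignment
    timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
    return " ".join([timing] + settings) + "\n" + text + "\n"
```

For example, format_cue(1.5, 4, "Hello", line=85, position=50,
align="start") gives a cue timed from 1.5s to 4s, placed at 85% down
and 50% across the video.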


>>> My biggest reservations about WebVTT are that it appears that it is being promoted as a container for subtitling and caption content at the **authoring and archive** level.

WebVTT is a captioning format for the Web - that's all. Nobody is
promoting it for anything else. If companies see a need to archive
content in this format, I wouldn't have any problem with that. Why
would that be a problem for you?


>>>> In truth I have no problem with WebVTT as a delivery format to be interpreted by a browser or agent, although clearly I would prefer that there was only one such format. However, WebVTT is late in the game, and it does not IMHO address the requirements of authoring and archive. This may be due to a lack of appreciation of the number of phases that subtitle and caption content goes through in a 'professional' broadcast environment. Like video, subtitles and captions exist in different 'silos' and are transformed (often repeatedly) depending on the final target application. The US captioning model (that of captioning near output or creating caption master tapes) is NOT representative of captioning globally, nor is it at all representative of subtitling (multiple language translation) workflows.


WebVTT was built for an international market and has taken such
requirements on board - it even supported <ruby> before TTML
introduced it. It had to do so because the Web is a global phenomenon.
Some features were driven later by US law, but the initial target was
always international use.


>>> I hope that clarifies my reservations about WebVTT.

Yes, thanks. I don't believe I will be able to change your mind about
WebVTT, so I'll just have to focus on improving it to meet your bar.
:-)

Cheers,
Silvia.
Received on Saturday, 15 June 2013 10:18:41 UTC