potential liaison to Unicode re: addition of subtitle/captioning character sets to CLDR from Pierre-Anthony Lemieux on 2015-06-25 (public-tt@w3.org from June 2015)

From: Pierre-Anthony Lemieux <pal@sandflow.com>
Date: Wed, 24 Jun 2015 20:33:59 -0700
To: "public-tt@w3.org" <public-tt@w3.org>
Message-ID: <CAF_7JxBtJhF+oq-b4JuXdEE7LrjAwNQpyER=sFjgMCAGZs86KQ@mail.gmail.com>

Hi all,

In preparation for our call and as discussed previously, below is a
draft for a potential liaison to Unicode suggesting the addition of
subtitle/captioning character sets to CLDR.

Looking forward to the discussion.

Best,

-- Pierre

"""
The W3C Timed Text Working Group (TTWG) [1] develops specifications
for subtitle and caption delivery worldwide, including dialog language
translation, content description, captions for the deaf and hard of
hearing, etc. It has, in the process, collected sets of characters
(for selected locales) that have proven useful in practice for
subtitling and captioning. These sets, documented at [2], are derived
in part from the analysis of home video content.

[1] http://www.w3.org/AudioVideo/TT/
[2] https://dvcs.w3.org/hg/ttml/raw-file/tip/ttml-ww-profiles/ttml-ww-profiles.html#recommended-unicode-code-points-per-language

The CLDR Core Data specifies sets of commonly used letters and
punctuation (main, punctuation, numbering system...) for individual
locales. The TTWG notes that these sets do not include all characters
used in practice for subtitling/captioning, e.g., the QUARTER NOTE
(U+2669) character.

TTWG suggests that Unicode consider adding to CLDR sets of characters
that are useful for subtitling and captioning applications. These sets
would evolve as new locales are added and existing locales are refined,
and could be referenced by TTWG and other organizations, increasing the
likelihood that subtitles/captions are presented correctly across
systems.

The page at [3] details the suggested subtitle/captioning character
sets for a number of selected locales. Each set is a superset of the
CLDR main, punctuation and numbers sets for the given locale. For
reference, blue-shaded cells indicate characters that are already
included in those CLDR sets. While it is possible to produce sets that
exclude the CLDR main, punctuation and numbers sets, such sets would
probably be more difficult to review.

[3] http://sandflow.com/public/cldr/imsc-codepoint-table.htm

TTWG is available to provide additional information and looks forward
to hearing from, and working with, the Unicode Consortium.
"""

Received on Thursday, 25 June 2015 03:34:49 UTC