Re: liaison for a Unicode ticket from Pierre-Anthony Lemieux on 2016-12-05 (public-tt@w3.org from December 2016)

From: Pierre-Anthony Lemieux <pal@sandflow.com>
Date: Sun, 4 Dec 2016 22:28:15 -0800
To: Shervin Afshar <safshar@netflix.com>
Cc: "public-tt@w3.org" <public-tt@w3.org>, Thierry Michel <tmichel@w3.org>, Nigel Megitt <nigel.megitt@bbc.co.uk>, Richard Ishida <ishida@w3.org>, Mark Davis <mark@macchiato.com>, Steven R Loomis <srloomis@us.ibm.com>
Message-ID: <CAF_7JxCd1r2MF-oC-bDAJi8Ns5zL7hS_S4E_85akg=n3hmgnCA@mail.gmail.com>

Hi Shervin,

Thanks for getting back to us. My input below.

> – Clarification on the intended usage of this data with regards to section 7.2 and Appendix B of TTML-IMSC1

The recommended sets specified in IMSC1 are intended to:
- encourage authors targeting a specific language to use only those
characters included in the set associated with the language
- encourage device suppliers to support all the characters listed for
each language their device claims to support
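The first intended use (authors keeping to the recommended characters for a given language) can be illustrated with a minimal Python sketch. The character sets below are hypothetical, abbreviated stand-ins and do not reproduce the actual contents of Table 1 or Table 2 of IMSC1. The function name `characters_outside_set` is likewise illustrative only.

```python
# Hypothetical, abbreviated character sets for illustration only; the
# actual IMSC1 Common Character Set (Table 1) and per-language sets
# (Table 2) are larger and are not reproduced here.
COMMON_SET = set(" !\"'(),-.0123456789:;?")
LANGUAGE_SETS = {
    "fr": set("abcdefghijklmnopqrstuvwxyz"
              "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "àâçéèêëîïôûùüÿœÀÂÇÉÈÊËÎÏÔÛÙÜŸŒ«»"),
}


def characters_outside_set(text: str, language: str) -> set:
    """Return the characters in `text` that are in neither the common set
    nor the recommended set for `language` (KeyError if unknown)."""
    allowed = COMMON_SET | LANGUAGE_SETS[language]
    return {ch for ch in text if ch not in allowed}


if __name__ == "__main__":
    # An author-side check: flag characters a device may not support.
    print(characters_outside_set("Señor, ça va?", "fr"))
```

A device-side check would run the other way: confirm that the device's glyph coverage includes every character in the sets for each language it claims to support.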

>  e.g. inclusion/exclusion rationale

The sets were derived from an analysis of subtitle content from home
video titles released worldwide (i.e., Blu-ray and DVD) and from
CEA-608/708 captioning systems.

> rationale for selection of "base" set;

The characters listed in the "Common Character Set" (Table 1) were
found to be generally useful in subtitles across all languages.

> Comparison between sets in proposed draft data and existing CLDR exemplar types (main, aux, punctuation) in various locales;

The file at [1] lists the characters that are included in Table 2 of
IMSC1 but not in the union of (i) the main, auxiliary, and
punctuation exemplarCharacters and (ii) the symbols and
defaultNumberingSystem characters. [ed.: the most significant differences
can be traced to the inclusion of the entire Latin Extended-A block
and large portions of the Cyrillic block in selected European
sets.]

[1] http://www.sandflow.com/public/CLDR-report-20161204.txt
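The comparison described above is a set difference: the Table 2 characters minus the union of the CLDR-derived sets. A minimal Python sketch follows, assuming each CLDR set has already been expanded from UnicodeSet syntax into plain characters (that parsing step is not shown). The function name and toy data are hypothetical and are not taken from the report at [1].

```python
def imsc1_chars_missing_from_cldr(imsc1_set, main, auxiliary, punctuation,
                                  symbols, numbering_digits):
    """Return, sorted by code point, the characters of an IMSC1 Table 2 set
    that appear in none of the (pre-expanded) CLDR sets for the locale."""
    # (i) main, auxiliary and punctuation exemplarCharacters, plus
    # (ii) symbols and the defaultNumberingSystem digits.
    cldr_union = (set(main) | set(auxiliary) | set(punctuation)
                  | set(symbols) | set(numbering_digits))
    return sorted(set(imsc1_set) - cldr_union)


if __name__ == "__main__":
    # Hypothetical toy data, not taken from IMSC1 or CLDR.
    missing = imsc1_chars_missing_from_cldr(
        imsc1_set="abc\u0103",        # includes U+0103 from Latin Extended-A
        main="abc",
        auxiliary="",
        punctuation="",
        symbols="",
        numbering_digits="0123456789",
    )
    print(missing)
```

Sorting by code point keeps the output grouped by Unicode block, which makes block-level differences such as Latin Extended-A or Cyrillic easy to spot.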

> – Plans for providing data for other locales.

I would think data for other locales (and updates to existing locales)
would come from multiple sources:

- directly from Unicode participants
- from the W3C TTWG and other groups in the course of developing
subtitling/captioning specifications

In particular, the next major revision of IMSC1 is expected to add
significant CJK text capabilities (beyond image-based subtitles), which
may result in data being provided for these locales.

I have updated Ticket #8915 with similar information, and am available
to participate in future meetings as needed.

Best,

-- Pierre

On Sun, Dec 4, 2016 at 4:41 PM, Shervin Afshar <safshar@netflix.com> wrote:
> Hello,
>
> CLDR ticket #8915 was discussed in last technical committee meeting. We
> think that this use-case falls within the scope of CLDR project, but to
> effectively add this data to benefit implementers and users, there are a few
> issues that need to be addressed. Most of these questions are reflected in
> the comment that Mark provided on the ticket (direct link). To summarize,
> the following items should be addressed and discussed:
>
> – Clarification on the intended usage of this data with regards to section
> 7.2 and Appendix B of TTML-IMSC1; e.g. inclusion/exclusion rationale,
> rationale for selection of "base" set;
> – Comparison between sets in proposed draft data and existing CLDR exemplar
> types (main, aux, punctuation) in various locales;
> – Plans for providing data for other locales.
>
> Best regards,
> Shervin
>>
>> ----- Original message -----
>> From: r12a <ishida@w3.org>
>> To: Mark Davis <mark@macchiato.com>, Shervin Afshar
>> <shervinafshar@gmail.com>, Steven R Loomis/Cupertino/IBM@IBMUS
>> Cc: Thierry MICHEL <tmichel@w3.org>, W3C Public TTWG <public-tt@w3.org>
>> Subject: Re: liaison for a Unicode ticket
>> Date: Tue, Nov 8, 2016 3:43 AM
>>
>> hi Mark, Shervin, Steve,
>>
>> It has been thirteen months since there was movement on this query.
>> Could one of you please contact Thierry and advise him on how/whether
>> it's possible to move forward the request of the Timed Text WG?
>>
>> thanks,
>> ri
>>
>>
>>
>> On 03/11/2016 17:38, Thierry MICHEL wrote:
>> > Richard,
>> >
>> >
>> > The TTWG has a Unicode ticket for adding the following "CLDR supplemental
>> > data for subtitle and caption characters".
>> >
>> > The Unicode ticket is available at
>> > http://unicode.org/cldr/trac/ticket/8915
>> >
>> > There have been no further notes on this in the 7 months since
>> > IMSC1 was published as a Recommendation
>> > (https://www.w3.org/TR/ttml-imsc1/).
>> >
>> >
>> > Could you please help the TTWG liaise with Unicode so that the ticket
>> > can move forward?
>> >
>> > I guess Mark Davis is the liaison contact for Unicode.
>> >
>> > Thierry.
>> >
>>

Received on Monday, 5 December 2016 06:29:16 UTC