Re: liaison for a Unicode ticket from Pierre-Anthony Lemieux on 2016-12-22 (public-tt@w3.org from December 2016)

From: Pierre-Anthony Lemieux <pal@sandflow.com>
Date: Wed, 21 Dec 2016 23:22:08 -0800
To: Shervin Afshar <safshar@netflix.com>
Cc: Thierry MICHEL <tmichel@w3.org>, Nigel Megitt <nigel.megitt@bbc.co.uk>, Richard Ishida <ishida@w3.org>, Mark Davis <mark@macchiato.com>, Steven R Loomis <srloomis@us.ibm.com>, "public-tt@w3.org" <public-tt@w3.org>
Message-ID: <CAF_7JxAkrW_ir4VCRpHqgPt=wSQS_pT=NdCGM7wLUhMo1L7mJw@mail.gmail.com>
Hi Shervin,

Thanks for the update, and to the CLDR TC for considering the input.

> In some other cases it's not very clear if the inclusion of specific characters is justified or simply due to
> bad data (e.g. U+017F, LATIN SMALL LETTER LONG S, which is included in the set latnExtA provided in [2]).

I believe that the recommended sets erred on the side of caution and
were deliberately created to cast a wider, rather than narrower, net
whenever possible. For instance, the recommended set for each of the
"lv,lt,et,hr,cs,pl,sl,sk,tr" locales includes all of the Latin
Extended-A block, instead of attempting to optimize each set at the
risk of missing important characters. The general assumption is
that the incremental complexity of supporting all versus part of the
Latin Extended-A block would be marginal, i.e. implementations would
support all or none of the Latin Extended-A block.
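
As a rough illustration of that trade-off, here is a minimal Python
sketch. It assumes only the Unicode definition of the Latin Extended-A
block (U+0100..U+017F). The per-locale sample set and the helper name
are hypothetical, chosen for illustration, and are not taken from CLDR
or from the proposed supplemental data.

```python
import unicodedata

# Latin Extended-A block as defined by Unicode: U+0100..U+017F.
LATIN_EXTENDED_A = {chr(cp) for cp in range(0x0100, 0x0180)}

# Hypothetical "optimized" per-locale set (illustration only, not CLDR data):
# the Polish letters that fall inside Latin Extended-A.
POLISH_SAMPLE = set("ĄąĆćĘęŁłŃńŚśŹźŻż")


def extra_characters(block, sample):
    """Return the characters in `block` that the narrower `sample` omits."""
    return sorted(block - sample)


extras = extra_characters(LATIN_EXTENDED_A, POLISH_SAMPLE)

# The whole-block approach pulls in characters such as U+017F along the way.
print("\u017f" in extras)
print(unicodedata.name("\u017f"))  # LATIN SMALL LETTER LONG S
```

Taking the whole block is a superset of any such narrower set, which is
why a character like U+017F can appear without being specifically
required by any one locale.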

> I will update the thread and the ticket with the next steps when I get to check for anomalies of that sort.

Looking forward to your feedback.

Best,

-- Pierre

On Mon, Dec 12, 2016 at 12:50 PM, Shervin Afshar <safshar@netflix.com> wrote:
> Thanks for the new comparison report[1]. CLDR TC discussed this again last
> week and looking at the report, it seems that in some cases the issue can be
> addressed by adding characters to one of CLDR exemplar categories for the
> respective locale; e.g. for Arabic, U+060D (Arabic date separator) or for
> Hebrew, U+05C3 (Sof Pasuq). In some other cases it's not very clear if the
> inclusion of specific characters is justified or simply due to bad data
> (e.g. U+017F, LATIN SMALL LETTER LONG S, which is included in the set
> latnExtA provided in [2]).
>
> Therefore, a closer inspection of each set seems necessary. I will update
> the thread and the ticket with the next steps when I get to check for
> anomalies of that sort.
>
> [1]: http://www.sandflow.com/public/CLDR-report-20161204.txt
> [2]:
> https://dvcs.w3.org/hg/ttml/raw-file/bc0f3b1a9104/ttml-ww-profiles/cldr-supplemental-data/cldr-sub-cap-supplemental-data.xml
>
> Best regards,
> Shervin
>
> On Fri, Dec 9, 2016 at 1:14 AM, Thierry MICHEL <tmichel@w3.org> wrote:
>>
>> Hello,
>>
>> The TTWG provided feedback on the
>> CLDR ticket #8915 <http://unicode.org/cldr/trac/ticket/8915>
>>
>> Looking forward to your review,
>>
>> Best regards,
>> Thierry Michel
>>
>>
>> Le 05/12/2016 à 01:41, Shervin Afshar a écrit :
>>>
>>> Hello,
>>>
>>> CLDR ticket #8915 <http://unicode.org/cldr/trac/ticket/8915> was
>>> discussed in the last technical committee meeting. We think that this
>>> use case falls within the scope of the CLDR project, but to effectively
>>> add this data to benefit implementers and users, there are a few issues
>>> which need to be addressed. Most of these questions are reflected in the
>>> comment that Mark provided on the ticket (direct link
>>> <http://unicode.org/cldr/trac/ticket/8915#comment:8>). To summarize, the
>>> following items should be addressed and discussed:
>>>
>>> – Clarification on the intended usage of this data with regard to
>>> section 7.2 and Appendix B of TTML-IMSC1; e.g. inclusion/exclusion
>>> rationale, rationale for selection of "base" set;
>>> – Comparison between sets in proposed draft data
>>> <https://dvcs.w3.org/hg/ttml/raw-file/bc0f3b1a9104/ttml-ww-profiles/cldr-supplemental-data/cldr-sub-cap-supplemental-data.xml>
>>> and existing CLDR exemplar types (main, aux, punctuation) in various
>>> locales;
>>> – Plans for providing data for other locales.
>>>
>>> Best regards,
>>> Shervin
>>>
>>>         ----- Original message -----
>>>         From: r12a <ishida@w3.org <mailto:ishida@w3.org>>
>>>         To: Mark Davis <mark@macchiato.com
>>>         <mailto:mark@macchiato.com>>, Shervin Afshar
>>>         <shervinafshar@gmail.com <mailto:shervinafshar@gmail.com>>,
>>>         Steven R Loomis/Cupertino/IBM@IBMUS
>>>         Cc: Thierry MICHEL <tmichel@w3.org <mailto:tmichel@w3.org>>, W3C
>>>         Public TTWG <public-tt@w3.org <mailto:public-tt@w3.org>>
>>>         Subject: Re: liaison for a Unicode ticket
>>>         Date: Tue, Nov 8, 2016 3:43 AM
>>>
>>>         hi Mark, Shervin, Steve,
>>>
>>>         It has been thirteen months since there was movement on this
>>>         query. Could one of you please contact Thierry and advise him
>>>         on how/whether it's possible to move forward the request of
>>>         the Timed Text WG?
>>>
>>>         thanks,
>>>         ri
>>>
>>>
>>>
>>>         On 03/11/2016 17:38, Thierry MICHEL wrote:
>>>         > Richard,
>>>         >
>>>         >
>>>         > The TTWG has a Unicode ticket for adding the following "CLDR
>>>         > supplemental data for subtitle and caption characters"
>>>         >
>>>         > The Unicode ticket is available at
>>>         > http://unicode.org/cldr/trac/ticket/8915
>>>         <http://unicode.org/cldr/trac/ticket/8915>
>>>         >
>>>         > There have been no further notes on this in the 7 months since
>>>         > IMSC1 was published as a Recommendation
>>>         > (https://www.w3.org/TR/ttml-imsc1/
>>>         <https://www.w3.org/TR/ttml-imsc1/>)
>>>         >
>>>         >
>>>         > Could you please help the TTWG liaise with Unicode so that
>>>         > this can move forward?
>>>         >
>>>         > I guess Mark Davis is the liaison contact for Unicode.
>>>         >
>>>         > Thierry.
>>>         >
>>>
>>>
>
Received on Thursday, 22 December 2016 07:22:56 UTC