[minutes] IIP telecon 2018-08-10 from r12a on 2018-08-10 (public-i18n-indic@w3.org from July to September 2018)

From: r12a <ishida@w3.org>
Date: Fri, 10 Aug 2018 07:55:55 +0100
To: indic <public-i18n-indic@w3.org>
Message-ID: <3433e04f-a2ad-b8ee-5c5b-e6c7ecf24082@w3.org>
https://www.w3.org/2018/08/10-ilreq-minutes.html





text extract follows:


- DRAFT -

                India International Program Teleconference

10 Aug 2018

Attendees

    Present
           muthu, Akshat, Neha, alolita, r12a, vivek

    Regrets

    Chair
           Alolita

    Scribe
           r12a

Contents

      * [2]Topics
          1. [3]Agenda and Minutes
          2. [4]Review of any pending action items
          3. [5]Discussion of comments in issues posted on GitHub
             for Devanagari, Bengali and Tamil.
      * [6]Summary of Action Items
      * [7]Summary of Resolutions
      __________________________________________________________

Agenda and Minutes

    alolita: review new issues, look at action items, start on
    comments

Review of any pending action items

    <alolita>
    [8]https://www.w3.org/International/groups/indic-layout/track/actions/open

       [8] 
https://www.w3.org/International/groups/indic-layout/track/actions/open

    <scribe> A6: Complete devanagari and bengali sections 2.6
    PENDING

    Vivek to provide information about Bengali

    A8: Add issue about zwj/zwnj stuff to begin fleshing out the
    problem

    [9]https://github.com/w3c/iip/issues/14

       [9] https://github.com/w3c/iip/issues/14

    close action-8

    <trackbot> Closed action-8.

    A9: Add issue about devanagari numerals to help provide use
    case examples

    [10]https://github.com/w3c/iip/issues/15

      [10] https://github.com/w3c/iip/issues/15

    vivek: native numerals are sometimes used, but i'm unable to
    see this as a gap - there is a straightforward mapping
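
    Illustration (not part of the call): the "straightforward
    mapping" vivek mentions can be sketched in Python, assuming it
    means a one-to-one correspondence between Devanagari decimal
    digits and ASCII digits. The helper name to_ascii_digits is
    hypothetical.

```python
import unicodedata


def to_ascii_digits(text: str) -> str:
    """Replace every Unicode decimal digit with its ASCII equivalent.

    Non-digit characters are passed through unchanged.
    """
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in text
    )


# Devanagari digits for 2018: U+0968 U+0966 U+0967 U+096E
native = "\u0968\u0966\u0967\u096e"
print(to_ascii_digits(native))  # 2018
print(int(native))              # Python's int() also accepts native decimal digits
```

    The mapping only covers digit shapes; it says nothing about
    calendars or date formats, which is the part akshat raises.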

    akshat: on previous call we discussed initial text and there
    was some confusion about what Vivek was trying to say

    <alolita> akshat: there is w3c css spec support for calendar,
    date support in Devanagari and Latin

    akshat: if i want to choose a devanagari calendar it should not
    be dependent on the developer, but specified by w3c
    ... there's some confusion about what is being said

    close action-9

    <trackbot> Closed action-9.

    alolita: please all add to the github issue

    A10: Add text to 2.8 about general problems for segmentation
    PENDING

    akshat: we'll add that today


Discussion of comments in issues posted on GitHub for Devanagari,
Bengali and Tamil.

    alolita: there was some issue about adding Muthu's comments to
    github, so we should add them afterwards

    muthu: i pointed out some areas that need attention

    <alolita> muthu: there are 4 locales for tamil

    <alolita> muthu: for locale ta_MY and ta_sg - a Latin oriented
    format is used for numerals



    [[
    2.7 Numbers, dates, etc

    Tamil numerals have fallen out of common usage, though we do
    find them used occasionally by a few. ASCII numerals are used
    in common practice, and thus should be the default or fallback
    when there are no options available.

    ta_my and ta_sg follow the English number format
    (123,456,789,000) and do not follow the number format used in
    ta_in and ta_lk.

    ]]
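
    Illustration (not part of the call): the difference between
    the two number formats can be sketched in Python. The comment
    above does not spell out the ta_in / ta_lk format; this sketch
    assumes it is the Indian grouping (last three digits, then
    groups of two). The function names are hypothetical.

```python
def group_western(n: int) -> str:
    """Group digits in threes, as in the English format quoted above."""
    return f"{n:,}"


def group_indian(n: int) -> str:
    """Group the last three digits, then groups of two (assumed ta_in style)."""
    sign = "-" if n < 0 else ""
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]
    pairs = []
    while head:
        pairs.insert(0, head[-2:])  # peel off two digits at a time from the right
        head = head[:-2]
    return sign + ",".join(pairs + [tail])


print(group_western(123456789000))  # 123,456,789,000
print(group_indian(123456789000))   # 1,23,45,67,89,000
```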

    alolita: tamil numerals only used in classical texts?

    muthu: correct
    ... i think also in malayalam and telugu not used
    ... in kannada they are used

    alolita: should both be available?

    muthu: generally only ascii needed, but it would be nice to
    have an option for users to use native numbers
    ... there are some who may want to read in tamil numerals

    alolita: how about in calendars?

    muthu: all ascii for tamil

    <alolita> richard: originally tamil did not have a zero

    richard: if people want to use tamil numerals, would they use
    them as per a decimal based system?

    muthu: yes

    <alolita> muthu: the old books from 50+ years have tamil
    numerals

    neha: if i want to display numbers in tamil there should be
    some tag to change numbers to tamil

    muthu: if such a tag is not provided, then ascii numbers should
    be used

    neha: that is a gap right now - no tag to switch to tamil
    numerals

    alolita: we have noted that there is a gap that needs to be
    addressed
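
    Illustration (not part of the call): the behaviour neha and
    muthu describe (ASCII by default, Tamil digits only when
    explicitly requested) can be sketched in Python. The flag
    use_tamil_digits is a hypothetical stand-in for the missing
    "tag"; it is not an existing web platform feature. The sketch
    assumes decimal positional use of the Tamil digits U+0BE6 to
    U+0BEF, as confirmed by muthu above.

```python
# Map ASCII digits 0-9 to Tamil digits U+0BE6..U+0BEF.
TAMIL_DIGITS = str.maketrans({str(i): chr(0x0BE6 + i) for i in range(10)})


def format_digits(number: int, use_tamil_digits: bool = False) -> str:
    """Return the number with ASCII digits unless Tamil digits are requested."""
    text = str(number)
    return text.translate(TAMIL_DIGITS) if use_tamil_digits else text


print(format_digits(2018))                         # ASCII fallback: 2018
print(format_digits(2018, use_tamil_digits=True))  # Tamil digits
```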

    muthu: all the ta locales are the same, including sinhalese



    [[
    2.8 Text boundaries & selection

    There are only two sequences of characters that form conjuncts
    in Tamil, and neither is native to Tamil: ஶ்ரீ and க்ஷ. Other
    than these two, no other CHC combinations form conjuncts. We
    should be able to place the cursor between the H and C (eg:
    CH<cursor>C). This issue was fixed in Android Oreo and iOS 12.
    The problem exists in many places and needs checking to
    identify which browsers support this and which do not.

    ]]
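
    Illustration (not part of the call): the cursor behaviour
    requested above can be sketched in Python. This is a
    simplified model, not the Unicode text segmentation algorithm:
    a cursor stop is allowed before any character that is not a
    combining mark, except inside the two conjuncts. The code
    point sequences for the conjuncts are assumptions (in
    particular, the first letter of the "shri" conjunct is taken
    to be U+0BB6, though some text may use U+0BB8), and
    cursor_stops is a hypothetical name.

```python
import unicodedata

# Consonant + pulli + consonant sequences that form conjuncts (assumed encodings).
CONJUNCTS = (
    "\u0b95\u0bcd\u0bb7",  # KA + pulli + SSA
    "\u0bb6\u0bcd\u0bb0",  # SHA + pulli + RA (base of the "shri" conjunct)
)


def cursor_stops(text: str) -> list:
    """Return the string indices at which a cursor may be placed."""
    stops = [0]
    for i in range(1, len(text)):
        if unicodedata.category(text[i]).startswith("M"):
            continue  # never stop between a base letter and its mark
        if i >= 2 and text[i - 2:i + 1] in CONJUNCTS:
            continue  # no stop between pulli and consonant inside a conjunct
        stops.append(i)
    if text:
        stops.append(len(text))
    return stops


print(cursor_stops("\u0b95\u0bcd\u0b95"))        # KA pulli KA: [0, 2, 3]
print(cursor_stops("\u0b95\u0bcd\u0bb7"))        # conjunct:    [0, 3]
print(cursor_stops("\u0bb6\u0bcd\u0bb0\u0bc0"))  # "shri":      [0, 4]
```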

    alolita: if you want to translate a historical text into tamil
    how will it be translated? with or without conjuncts?

    muthu: in modern languages they write phonetically and pulli
    remains visible

    r12a: [11]https://github.com/w3c/ilreq/issues/31 is a related
    issue

      [11] https://github.com/w3c/ilreq/issues/31

    <scribe> ACTION: r12a to raise tamil segmentation issue in our
    repo

    <trackbot> Created ACTION-11 - Raise tamil segmentation issue
    in our repo [on Richard Ishida - due 2018-08-17].

    alolita: so this issue is fixed in recent platforms - you can
    now put the cursor between

    muthu: yes

    neha: the segmentation rules for akshara @@@

    [12]https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries

      [12] 
https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries

    vivek: tamil doesn't fall in line with other scripts for
    handling of clusters

    muthu summarises neha and akshat

    muthu: ilreq has already specified the halant cluster model -
    vivek is saying that doesn't cover tamil because it's a
    different

    <alolita> akshat: there are 2 definitions of akshara

    <alolita> akshat: one definition refers to one encoding for all
    indian scripts

    <alolita> akshat: this is the IS13194 definition

    akshat: there are two actual definitions today; ISCII (IS
    13194) lists all conjuncts

    <alolita> akshat: the other definition is from unicode

    akshat: when unicode came around it broke away individual
    scripts into separate code pages, unlike iscii

    <alolita> akshat: unicode instead allocated different code
    pages for each indian language script

    <alolita> akshat: in the ilreq document, the scripts and
    segmentation definitions are not clear

    akshat: ilreq doc is unicode specific but doesn't clarify in
    terms of what scripts are supported - the definition is
    oriented towards devanagari languages, except for santali
    ... but bengali, malayalam, gurmukhi requirements are not
    captured by ilreq
    ... for tamil we don't need new categories to add to this
    definition
    ... definition talks about CHC but in tamil it's only
    applicable for the two conjuncts

    alolita: going back to muthu and vivek, there should be a clear
    definition for tamil so that can be used as foundation for
    unicode
    ... having the clarification of differences is needed - that's
    a gap



    [[
    2.10.1 Syllable/Akshara spacing

    Need to understand what is meant by Consonant+Matra+Matra: the
    breaking seems to stack ill-formed aksharas into one set
    instead of clearly breaking them apart. This breaking behaviour
    needs to improve.

    Consonant+Matra+Matra is valid in Tamil

    ]]
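
    Illustration (not part of the call): one sense in which a
    consonant followed by two matras is well formed in Unicode can
    be shown with canonical decomposition. The Tamil two-part
    vowel sign O (U+0BCA) decomposes to vowel sign E (U+0BC6)
    followed by vowel sign AA (U+0BBE). Whether this is the case
    the comment above has in mind is an assumption; the helper
    name describe_nfd is hypothetical.

```python
import unicodedata


def describe_nfd(text: str) -> list:
    """List the code points of the canonically decomposed (NFD) text."""
    decomposed = unicodedata.normalize("NFD", text)
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in decomposed]


# KA + two-part vowel sign O becomes consonant + matra + matra under NFD.
for line in describe_nfd("\u0b95\u0bca"):
    print(line)
# U+0B95 TAMIL LETTER KA
# U+0BC6 TAMIL VOWEL SIGN E
# U+0BBE TAMIL VOWEL SIGN AA
```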

    <alolita> vivek: eastern scripts (bengali, oriya) and southern
    scripts both support split matras

    vivek: describes use of matras...
    ... this is a massive bug and common to most of our languages

    <alolita> vivek: there is no clear definition -
    consonant+recursive-matras should not be allowed

    <alolita> ... the unicode spec should be corrected to reflect
    this

    <alolita> consonant+matra+matra is allowed in unicode

    akshat: when you say that multiple matras are allowed in
    unicode - is this application specific ?

    <alolita> ... open type also supports this unicode definition

    akshat: that's an implementation issue rather than unicode
    issue

    vivek: please point to the part of the unicode standard that
    describes this

    <alolita> akshat: clear rules for syllable boundaries need to
    be defined

    akshat: whatever unicode says is in ch12 but it doesn't specify
    what should join and what not

    <C-DAC_GIST>
    [13]http://unicode.org/versions/Unicode8.0.0/ch12.pdf

      [13] http://unicode.org/versions/Unicode8.0.0/ch12.pdf

    <scribe> ACTION: Muthu (and Vivek) to verify the definitions in
    ch12 for Tamil

    <trackbot> Created ACTION-12 - (and vivek) to verify the
    definitions in ch12 for tamil [on Muthu Nedumaran - due
    2018-08-17].

    muthu: I raised this because it is stated as ill-formed but i
    don't think that is correct

    akshat: upshot is lack of clarity of akshara definition

    <scribe> ACTION: Alolita to convert Muthu's comments to github
    issues

    <trackbot> Created ACTION-13 - Convert muthu's comments to
    github issues [on Alolita Sharma - due 2018-08-17].



    [[
    2.12.1 Underline and Overline behaviour

    Tamil and other south Indian scripts do not have a shirorekha
    or line below as in Devanagari. The underline should match that
    of Latin in a bilingual (or dual script) document, which is
    more common in Malaysia and Singapore. However, it needs to
    align with the underline of Devanagari when it combines with
    Hindi or Sanskrit.

    ]]

    muthu: malaysia and singapore use tamil, and in documents with
    tamil and latin on the same line the underline should be at the
    same place for both
    ... all tamil fonts include latin glyphs too, so the issue
    doesn't arise so much

    alolita: this issue would arise in india, esp in publishing
    with mixed scripts
    ... so gap is that rules don't exist for what should happen for
    position of underline and overline

    r12a: recommend that we look at the CSS Text module and check
    whether it addresses these issues

    [14]https://drafts.csswg.org/css-text-decor-3/#line-decoration

      [14] https://drafts.csswg.org/css-text-decor-3/#line-decoration

    [15]http://w3c.github.io/typography/#text_decoration

      [15] http://w3c.github.io/typography/#text_decoration

    <scribe> ACTION: Alolita (all) to review CSS specification for
    features

    <trackbot> Created ACTION-14 - (all) to review css
    specification for features [on Alolita Sharma - due
    2018-08-17].



    [[
    3.1 and 3.2 Line breaking and hyphenation

    There are some simple rules for line breaking. Different people
    use different implementations. However, I can’t find a decent
    document for this online. Here’s a paper presented at a
    conference held in Singapore:
    [16]https://www.academia.edu/671796/Tamil_Hyphenator_P._David_Prabhakar.
    The first 3 rules in the section Rules for Tamil Hyphenation
    are a good start.

      [16] 
https://www.academia.edu/671796/Tamil_Hyphenator_P._David_Prabhakar.

    ]]

    muthu: how do we frame the issue here ?

    r12a: the gap would be that hyphenation is not happening for
    users in browsers, then the next step would be to ask why

    vivek: cdac has rules for many languages and this may be
    available (though maybe not fully comprehensive) but could be a
    useful resource

    discussion about how to find the information

    <scribe> ACTION: Akshat do a general search in CDAC for
    original rule book for hyphenation in 11 scripts

    <trackbot> Error creating an ACTION: could not connect to
    Tracker. Please mail <sysreq@w3.org> with details about what
    happened.

    <scribe> ACTION: Akshat to do a general search in CDAC for
    original rule book for hyphenation in 11 scripts

    <trackbot> Created ACTION-16 - Do a general search in cdac for
    original rule book for hyphenation in 11 scripts [on Akshat
    Joshi - due 2018-08-17].



    [[
    3.4 Counters, lists, etc

    Need to understand: the other relies on the user-defined
    mechanism specified in that spec in order to be applied.

    Shouldn’t the default be ASCII numerals, and Tamil numerals be
    user-defined?

    ]]



    [[
    3.5 Initial letter styling

    Need to be mindful of conjuncts as defined in 2.8 above.

    ]]



    [[
    3.7 Other paragraph features

    Tamil can start a paragraph with or without indents. Paragraph
    features are the same as English.

    ]]

    <alolita_> yes

    Meeting adjourned

    Next meeting: two weeks

Summary of Action Items

    [NEW] ACTION: Akshat to do a general search in CDAC for
    original rule book for hyphenation in 11 scripts
    [NEW] ACTION: Alolita (all) to review CSS specification for
    features
    [NEW] ACTION: Alolita to convert Muthu's comments to github
    issues
    [NEW] ACTION: Muthu (and Vivek) to verify the definitions in
    ch12 for Tamil
    [NEW] ACTION: r12a to raise tamil segmentation issue in our
    repo

Summary of Resolutions

    [End of minutes]
Received on Friday, 10 August 2018 06:56:01 UTC