- From: r12a <ishida@w3.org>
- Date: Fri, 10 Aug 2018 07:55:55 +0100
- To: indic <public-i18n-indic@w3.org>
https://www.w3.org/2018/08/10-ilreq-minutes.html

text extract follows:

- DRAFT -

India International Program Teleconference

10 Aug 2018

Attendees

Present
    muthu, Akshat, Neha, alolita, r12a, vivek

Regrets

Chair
    Alolita

Scribe
    r12a

Contents

  * Topics
      1. Agenda and Minutes
      2. Review of any pending action items
      3. Discussion of comments in issues posted on GitHub for Devanagari, Bengali and Tamil.
  * Summary of Action Items
  * Summary of Resolutions
__________________________________________________________

Agenda and Minutes

alolita: review new issues, look at action items, start on comments

Review of any pending action items

<alolita> https://www.w3.org/International/groups/indic-layout/track/actions/open

<scribe> A6: Complete devanagari and bengali sections 2.6
PENDING
Vivek to provide information about Bengali

A8: Add issue about zwj/zwnj stuff to begin fleshing out the problem
https://github.com/w3c/iip/issues/14
close action-8
<trackbot> Closed action-8.

A9: Add issue about devanagari numerals to help provide use case examples
https://github.com/w3c/iip/issues/15
vivek: native numerals are sometimes used, but i'm unable to see this as a gap - there is a straightforward mapping
akshat: on previous call we discussed initial text and there was some confusion about what Vivek was trying to say
<alolita> akshat: there is w3c css spec support for calendar, date support in Devanagari and Latin
akshat: if i want to choose a devanagari calendar it should not be dependent on the developer, but specified by w3c
... there's some confusion about what is being said
close action-9
<trackbot> Closed action-9.
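The "straightforward mapping" vivek mentions under A9 can be pictured with a minimal sketch. It was not shown on the call, and the function and table names are hypothetical. The only fact it relies on is that the Devanagari digits are encoded contiguously at U+0966 (zero) through U+096F (nine), so each ASCII digit maps one-to-one.

```python
# Illustrative sketch only: the names below are hypothetical and were not
# discussed on the call.

# Devanagari digits occupy U+0966 (zero) through U+096F (nine).
DEVANAGARI_ZERO = 0x0966

# Translation tables for str.translate: code point -> replacement character.
ASCII_TO_DEVANAGARI = {ord("0") + i: chr(DEVANAGARI_ZERO + i) for i in range(10)}
DEVANAGARI_TO_ASCII = {DEVANAGARI_ZERO + i: chr(ord("0") + i) for i in range(10)}


def to_devanagari_digits(text: str) -> str:
    """Replace ASCII digits with Devanagari digits; other characters are unchanged."""
    return text.translate(ASCII_TO_DEVANAGARI)


def to_ascii_digits(text: str) -> str:
    """Replace Devanagari digits with ASCII digits; other characters are unchanged."""
    return text.translate(DEVANAGARI_TO_ASCII)


if __name__ == "__main__":
    print(to_devanagari_digits("10 Aug 2018"))
```

This covers digit shapes only. The calendar selection akshat raises is a separate question that a digit-for-digit swap does not address.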
alolita: please all add to the github issue

A10: Add text to 2.8 about general problems for segmentation
PENDING
akshat: we'll add that today

Discussion of comments in issues posted on GitHub for Devanagari, Bengali and Tamil.

alolita: there was some issue about adding Muthu's comments to github, so we should add after
muthu: i pointed out some areas that need attention
<alolita> muthu: there are 4 locales for tamil
<alolita> muthu: for locale ta_MY and ta_SG - a Latin oriented format is used for numerals

2.7 Numbers, dates, etc

[[
Tamil numerals have fallen out of common usage, though we do find them used occasionally by a few. ASCII numerals are used in common practice, and thus should be the default or fallback when there are no options available. ta_MY and ta_SG follow the English number format (123,456,789,000) and do not follow the number format used in ta_IN and ta_LK.
]]

alolita: tamil numerals only used in classical texts?
muthu: correct
... i think also in malayalam and telugu not used
... in kannada they are used
alolita: should both be available?
muthu: generally only ascii needed, but it would be nice to have an option for users to use native numbers
... there are some who may want to read in tamil numerals
alolita: how about in calendars?
muthu: all ascii for tamil
<alolita> richard: originally tamil did not have a zero
richard: if people want to use tamil numerals would they use them per a decimal based system?
muthu: yes
<alolita> muthu: the old books from 50+ years have tamil numerals
neha: if i want to display numbers in tamil there should be some tag to change numbers to tamil
muthu: if such a tag is not provided, then ascii numbers should be used
neha: that is a gap right now - no tag to switch to tamil numerals
alolita: we have noted that there is a gap that needs to be addressed
muthu: all the ta locales are the same, including sinhalese

2.8 Text boundaries & selection

[[
There are only two sequences of characters that form conjuncts in Tamil.
Neither is native to Tamil: ஶ்ரீ and க்ஷ. Other than these two, no other CHC combinations form conjuncts. We should be able to place the cursor between the H and C (eg: CH<cursor>C). This issue was fixed in Android Oreo and iOS 12. The problem exists in many places and needs checking to identify which browsers support this and which do not.
]]

alolita: if you want to translate a historical text into tamil how will it be translated? with or without conjuncts?
muthu: in modern languages they write phonetically and pulli remains visible
r12a: https://github.com/w3c/ilreq/issues/31 is a related issue
<scribe> ACTION: r12a to raise tamil segmentation issue in our repo
<trackbot> Created ACTION-11 - Raise tamil segmentation issue in our repo [on Richard Ishida - due 2018-08-17].
alolita: so this issue is fixed in recent platforms - you can now put the cursor between
muthu: yes
neha: the segmentation rules for akshara @@@
https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries
vivek: tamil doesn't fall in line with other scripts for handling of clusters
muthu summarises neha and akshat
muthu: ilreq has already specified the halant cluster model - vivek is saying that doesn't cover tamil because it's a different
<alolita> akshat: there are 2 definitions of akshara
<alolita> akshat: one definition refers to one encoding for all indian scripts
<alolita> akshat: this is the IS13194 definition
akshat: there are two actual definitions today, iscii (IS 13194) lists all conjuncts
<alolita> akshat: the other definition is from unicode
akshat: when unicode came around it broke away individual scripts into separate code pages, unlike iscii,
<alolita> akshat: unicode instead allocated different code pages for each indian language script
<alolita> akshat: in the ilreq document, the scripts and segmentation definitions are not clear
akshat: ilreq doc is
unicode specific but doesn't clarify in terms of what scripts are supported - the definition is oriented towards devanagari languages, except for santali
... but bengali, malayalam, gurmukhi requirements are not captured by ilreq
... for tamil we don't need new categories to add to this definition
... definition talks about CHC but in tamil it's only applicable for the two conjuncts
alolita: going back to muthu and vivek, there should be a clear definition for tamil so that it can be used as a foundation for unicode
... having the clarification of differences is needed - that's a gap

2.10.1 Syllable/Akshara spacing

[[
Need to understand what is meant by: Consonant+Matra+Matra, the breaking seems to stack ill-formed akshara into one set instead of clearly breaking it separate. This breaking behaviour needs to improve. Consonant+Matra+Matra is valid in Tamil
]]

<alolita> vivek: eastern scripts (bengali, oriya) and southern scripts both support split matras
vivek: describes use of matras...
... this is a massive bug and common to most of our languages
<alolita> vivek: there is no clear definition - consonant+recursive-matras should not be allowed
<alolita> ... the unicode spec should be corrected to reflect this
<alolita> consonant+matra+matra is allowed in unicode
akshat: when you say that multiple matras are allowed in unicode - is this application specific?
<alolita> ...
OpenType also supports this unicode definition
akshat: that's an implementation issue rather than a unicode issue
vivek: please point to the part of the unicode standard that describes this
<alolita> akshat: clear rules for syllable boundaries need to be defined
akshat: whatever unicode says is in ch12 but doesn't specify what should join and what not
<C-DAC_GIST> http://unicode.org/versions/Unicode8.0.0/ch12.pdf
<scribe> ACTION: Muthu (and Vivek) to verify the definitions in ch12 for Tamil
<trackbot> Created ACTION-12 - (and vivek) to verify the definitions in ch12 for tamil [on Muthu Nedumaran - due 2018-08-17].
muthu: I raised this because it is stated as ill-formed but i don't think that is correct
akshat: upshot is lack of clarity of akshara definition
<scribe> ACTION: Alolita to convert Muthu's comments to github issues
<trackbot> Created ACTION-13 - Convert muthu's comments to github issues [on Alolita Sharma - due 2018-08-17].

2.12.1 Underline and Overline behaviour

[[
Tamil and other south Indian scripts do not have a shirorekha or line below as in Devanagari. The underline should match that of Latin in a bilingual (or dual script) document, which is more common in Malaysia and Singapore. However, it needs to align with the underline of Devanagari when it combines with Hindi or Sanskrit.
]]

muthu: malaysia and singapore use tamil, and in documents with tamil and latin on the same line the underline should be at the same place for both
... all tamil fonts include latin glyphs too, so the issue doesn't arise so much
alolita: this issue would arise in india, esp in publishing with mixed scripts
...
so the gap is that rules don't exist for what should happen for position of underline and overline
r12a: recommend that we look at the CSS Text module and check whether it addresses these issues
https://drafts.csswg.org/css-text-decor-3/#line-decoration
http://w3c.github.io/typography/#text_decoration
<scribe> ACTION: Alolita (all) to review CSS specification for features
<trackbot> Created ACTION-14 - (all) to review css specification for features [on Alolita Sharma - due 2018-08-17].

3.1 and 3.2 Line breaking and hyphenation

[[
There are some simple rules for line breaking. Different people use different implementations. However, I can’t find a decent document for this online. Here’s a paper presented at a conference held in Singapore: https://www.academia.edu/671796/Tamil_Hyphenator_P._David_Prabhakar. The first 3 rules in the section Rules for Tamil Hyphenation are a good start.
]]

muthu: how do we frame the issue here?
r12a: the gap would be that hyphenation is not happening for users in browsers, then the next step would be to ask why
vivek: cdac has rules for many languages and this may be available (though maybe not fully comprehensive) but could be a useful resource
discussion about how to find the information
<scribe> ACTION: Akshat do a general search in CDAC for original rule book for hyphenation in 11 scripts
<trackbot> Error creating an ACTION: could not connect to Tracker. Please mail <sysreq@w3.org> with details about what happened.
<scribe> ACTION: Akshat to do a general search in CDAC for original rule book for hyphenation in 11 scripts
<trackbot> Created ACTION-16 - Do a general search in cdac for original rule book for hypenation in 11 scrpts [on Akshat Joshi - due 2018-08-17].
3.4 Counters, lists, etc

[[
Need to understand: the other relies on the user-defined mechanism specified in that spec in order to be applied. Shouldn’t the default be ASCII numerals, and Tamil numerals be user defined?
]]

3.5 Initial letter styling

[[
Need to be mindful of conjuncts as defined in 2.8 above.
]]

3.7 Other paragraph features

[[
Tamil can start a paragraph with or without indents. Paragraph features are the same as English.
]]

<alolita_> yes

Meeting adjourned

Next meeting: two weeks

Summary of Action Items

[NEW] ACTION: Akshat to do a general search in CDAC for original rule book for hyphenation in 11 scripts
[NEW] ACTION: Alolita (all) to review CSS specification for features
[NEW] ACTION: Alolita to convert Muthu's comments to github issues
[NEW] ACTION: Muthu (and Vivek) to verify the definitions in ch12 for Tamil
[NEW] ACTION: r12a to raise tamil segmentation issue in our repo

Summary of Resolutions

[End of minutes]
Received on Friday, 10 August 2018 06:56:01 UTC