- From: r12a <ishida@w3.org>
- Date: Fri, 10 Aug 2018 07:55:55 +0100
- To: indic <public-i18n-indic@w3.org>
https://www.w3.org/2018/08/10-ilreq-minutes.html
text extract follows:
- DRAFT -
India International Program Teleconference
10 Aug 2018
Attendees
Present
muthu, Akshat, Neha, alolita, r12a, vivek
Regrets
Chair
Alolita
Scribe
r12a
Contents
* [2]Topics
1. [3]Agenda and Minutes
2. [4]Review of any pending action items
3. [5]Discussion of comments in issues posted on GitHub
for Devanagari, Bengali and Tamil.
* [6]Summary of Action Items
* [7]Summary of Resolutions
__________________________________________________________
Agenda and Minutes
alolita: review new issues, look at action items, start on
comments
Review of any pending action items
<alolita>
[8]https://www.w3.org/International/groups/indic-layout/track/actions/open
[8] https://www.w3.org/International/groups/indic-layout/track/actions/open
<scribe> A6: Complete devanagari and bengali sections 2.6
PENDING
Vivek to provide information about Bengali
A8: Add issue about zwj/zwnj stuff to begin fleshing out the
problem
[9]https://github.com/w3c/iip/issues/14
[9] https://github.com/w3c/iip/issues/14
close action-8
<trackbot> Closed action-8.
A9: Add issue about devanagari numerals to help provide use
case examples
[10]https://github.com/w3c/iip/issues/15
[10] https://github.com/w3c/iip/issues/15
vivek: native numerals are sometimes used, but i'm unable to
see this as a gap - there is a straightforward mapping
akshat: on previous call we discussed initial text and there
was some confusion about what Vivek was trying to say
<alolita> akshat: there is w3c css spec support for calendar,
date support in Devanagari and Latin
akshat: if i want to choose a devanagari calendar it should be
not dependent on the developer, but specified by w3c
... there's some confusion about what is being said
close action-9
<trackbot> Closed action-9.
alolita: please all add to the github issue
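Vivek's point that native numerals map straightforwardly onto ASCII
digits can be illustrated with a minimal Python sketch. It assumes only
that a script's decimal digits occupy ten consecutive code points
starting at its digit zero (U+0966 for Devanagari). The function names
are hypothetical and do not come from any spec discussed on the call.

```python
import unicodedata

DEVANAGARI_ZERO = 0x0966  # DEVANAGARI DIGIT ZERO


def to_native_digits(text, zero_code_point):
    """Replace ASCII digits 0-9 with the digits of another script."""
    table = {ord("0") + i: zero_code_point + i for i in range(10)}
    return text.translate(table)


def to_ascii_digits(text):
    """Replace any Unicode decimal digit with its ASCII equivalent."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-digits
        out.append(ch if value is None else str(value))
    return "".join(out)


native = to_native_digits("10 Aug 2018", DEVANAGARI_ZERO)
print(native)
print(to_ascii_digits(native))
```

The mapping itself is mechanical; the open question raised on the call
is how an author or user gets to choose which digit set is displayed.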
A10: Add text to 2.8 about general problems for segmentation
PENDING
akshat: we'll add that today
Discussion of comments in issues posted on GitHub for Devanagari,
Bengali and Tamil.
alolita: there was some issue about adding Muthu's comments to
github, so we should add after
muthu: i pointed out some areas that need attention
<alolita> muthu: there are 4 locales for tamil
<alolita> muthu: for locale ta_MY and ta_sg - a Latin oriented
format is used for numerals
[[
2.7 Numbers, dates, etc
Tamil numerals have fallen out of common usage, though a few
people still use them occasionally. ASCII numerals are used in
common practice, and thus should be the default or fallback
when no other options are available.
ta_MY and ta_SG follow the English number format
(123,456,789,000) and do not follow the number format used in
ta_IN and ta_LK.
]]
alolita: tamil numerals only used in classical texts?
muthu: correct
... i think also in malayalam and telugu not used
... in kannada they are used
alolita: should both be available?
muthu: generally only ascii needed, but it would be nice to
have an option for users to use native numbers
... there are some who may want to read in tamil numerals
alolita: how about in calendars?
muthu: all ascii for tamil
<alolita> richard: originally tamil did not have a zero
richard: if people want to use tamil numerals would they use
them as a decimal based system?
muthu: yes
<alolita> muthu: the old books from 50+ years have tamil
numerals
neha: if i want to display numbers in tamil there should be
some tag to change numbers to tamil
muthu: if such a tag is not provided, then ascii numbers should
be used
neha: that is a gap right now - no tag to switch to tamil
numerals
alolita: we have noted that there is a gap that needs to be
addressed
muthu: all the ta locales are the same, including sinhalese
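The difference in number formats mentioned in the 2.7 comment can be
sketched in Python as follows. The minutes do not spell out the ta_IN
and ta_LK format; this sketch assumes it is the Indian grouping (last
three digits, then groups of two). The function names are hypothetical.

```python
def group_western(n):
    """Group digits in threes, as in 123,456,789,000."""
    return f"{n:,}"


def group_indian(n):
    """Group the last three digits, then every two digits to the left.

    Assumption: this is the format meant for ta_IN and ta_LK.
    """
    digits = str(abs(n))
    head, tail = digits[:-3], digits[-3:]
    parts = []
    while len(head) > 2:
        parts.insert(0, head[-2:])
        head = head[:-2]
    if head:
        parts.insert(0, head)
    grouped = ",".join(parts + [tail])
    return ("-" if n < 0 else "") + grouped


print(group_western(123456789000))
print(group_indian(123456789000))
```

In practice a locale library would supply these patterns; the sketch
only shows that the same value is rendered differently per locale.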
[[
2.8 Text boundaries & selection
There are only two sequences of characters that form conjuncts
in Tamil. Neither is native to Tamil: ஶ்ரீ and க்ஷ. Other
than these two, no other CHC combinations form conjuncts. We
should be able to place the cursor between the H and C (eg:
CH<cursor>C). This issue was fixed in Android Oreo and iOS 12.
The problem exists in many places and needs checking to
identify which browsers support this and which do not.
]]
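The cursor rule described in the comment above can be sketched in
Python. This is a simplified illustration, not the algorithm used by any
platform: it assumes the only conjuncts are the two sequences named in
the comment, and it treats every other consonant + pulli as a complete
cluster so that the cursor can stop after it. The function name
tamil_clusters is hypothetical.

```python
import unicodedata

# Sequences that must stay together (assumption: only these two).
CONJUNCT_PREFIXES = {
    "\u0b95\u0bcd\u0bb7",  # KA + pulli + SSA
    "\u0bb6\u0bcd\u0bb0",  # SHA + pulli + RA (start of the "sri" ligature)
}


def is_mark(ch):
    """True for combining marks such as vowel signs and the pulli."""
    return unicodedata.category(ch) in ("Mn", "Mc")


def tamil_clusters(text):
    """Split text into units between which a cursor may be placed."""
    clusters = []
    i = 0
    while i < len(text):
        # Keep the two known conjuncts whole; otherwise take one character.
        j = i + 3 if text[i:i + 3] in CONJUNCT_PREFIXES else i + 1
        # Attach any following marks (vowel signs, pulli) to the cluster.
        while j < len(text) and is_mark(text[j]):
            j += 1
        clusters.append(text[i:j])
        i = j
    return clusters


# "akka": the cursor can stop between the pulli and the second KA.
print(tamil_clusters("\u0b85\u0b95\u0bcd\u0b95\u0bbe"))
# The KA + pulli + SSA conjunct stays in one piece.
print(tamil_clusters("\u0b95\u0bcd\u0bb7"))
```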
alolita: if you want to translate a historical text into tamil
how will it be translated? with or without conjuncts?
muthu: in modern languages they write phonetically and pulli
remains visible
r12a: [11]https://github.com/w3c/ilreq/issues/31 is a related
issue
[11] https://github.com/w3c/ilreq/issues/31
<scribe> ACTION: r12a to raise tamil segmentation issue in our
repo
<trackbot> Created ACTION-11 - Raise tamil segmentation issue
in our repo [on Richard Ishida - due 2018-08-17].
alolita: so this issue is fixed in recent platforms - you can
now put the cursor between
muthu: yes
neha: the segmentation rules for akshara @@@
[12]https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries
[12] https://w3c.github.io/ilreq/#h_indic_orthographic_syllable_boundaries
vivek: tamil doesn't fall in line with other scripts for
handling of clusters
muthu summarises neha and akshat
muthu: ilreq has already specified the halant cluster model -
vivek is saying that doesn't cover tamil because it's a
different
<alolita> akshat: there are 2 definitions of akshara
<alolita> akshat: one definition refers to one encoding for all
indian scripts
<alolita> akshat: this is the IS13194 definition
akshat: there are two actual definitions today; ISCII (IS 13194)
lists all conjuncts
<alolita> akshat: the other definition is from unicode
akshat: when unicode came around it broke away individual
scripts into separate code pages, unlike iscii,
<alolita> akshat: unicode instead allocated different code
pages for each indian language script
<alolita> akshat: in the ilreq document, the scripts and
segmentation definitions are not clear
akshat: ilreq doc is unicode specific but doesn't clarify in
terms of what scripts are supported - the definition is
oriented towards devanagari languages, except for santali
... but bengali, malayalam, gurmukhi requirements are not
captured by ilreq
... for tamil we don't need new categories to add to this
definition
... definition talks about CHC but in tamil it's only
applicable for the two conjuncts
alolita: going back to muthu and vivek, there should be a clear
definition for tamil so that can be used as foundation for
unicode
... having the clarification of differences is needed - that's
a gap
[[
2.10.1 Syllable/Akshara spacing
Need to understand what is meant by Consonant+Matra+Matra. The
breaking seems to stack ill-formed aksharas into one set instead
of clearly breaking them apart. This breaking behaviour needs
to improve.
Consonant+Matra+Matra is valid in Tamil
]]
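One case where a Tamil consonant is followed by two vowel signs in
encoded text is the two-part vowel signs, which Unicode defines as
canonically equivalent to a sequence of two signs. The Python sketch
below shows that equivalence using the standard unicodedata module. It
is offered as one possible reading of why Consonant+Matra+Matra can be
valid, not as the interpretation agreed on the call; the helper name is
hypothetical.

```python
import unicodedata

KA = "\u0b95"  # TAMIL LETTER KA
# Tamil vowel signs O, OO and AU, each of which has a two-part
# canonical decomposition.
TWO_PART_SIGNS = ["\u0bca", "\u0bcb", "\u0bcc"]


def decomposed_syllable(consonant, vowel_sign):
    """Return the canonically decomposed (NFD) form of the syllable."""
    return unicodedata.normalize("NFD", consonant + vowel_sign)


for sign in TWO_PART_SIGNS:
    parts = decomposed_syllable(KA, sign)
    # Each result is consonant + sign + sign, i.e. three code points.
    print([f"U+{ord(ch):04X}" for ch in parts])
```

A segmentation rule that treats every consonant + sign + sign sequence
as ill-formed would therefore reject text that is equivalent to an
ordinary Tamil syllable.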
<alolita> vivek: eastern scripts (bengali, oriya) and southern
scripts both support split matras
vivek: describes use of matras...
... this is a massive bug and common to most of our languages
<alolita> vivek: there is no clear definition -
consonant+recursive-matras should not be allowed
<alolita> ... the unicode spec should be corrected to reflect
this
<alolita> consonant+matra+matra is allowed in unicode
akshat: when you say that multiple matras are allowed in
unicode - is this application specific ?
<alolita> ... open type also supports this unicode definition
akshat: that's an implementation issue rather than unicode
issue
vivek: please point to the part of the unicode standard that
describes this
<alolita> akshat: clear rules for syllable boundaries need to
be defined
akshat: whatever unicode says is in ch12 but doesn't specify
what should join and what not
<C-DAC_GIST>
[13]http://unicode.org/versions/Unicode8.0.0/ch12.pdf
[13] http://unicode.org/versions/Unicode8.0.0/ch12.pdf
<scribe> ACTION: Muthu (and Vivek) to verify the definitions in
ch12 for Tamil
<trackbot> Created ACTION-12 - (and vivek) to verify the
definitions in ch12 for tamil [on Muthu Nedumaran - due
2018-08-17].
muthu: I raised this because it is stated as ill-formed but i
don't think that is correct
akshat: upshot is lack of clarity of akshara definition
<scribe> ACTION: Alolita to convert Muthu's comments to github
issues
<trackbot> Created ACTION-13 - Convert muthu's comments to
github issues [on Alolita Sharma - due 2018-08-17].
[[
2.12.1 Underline and Overline behaviour
Tamil and other south Indian scripts do not have a shirorekha
or line below as in Devanagari. The underline should match that
of Latin in a bilingual (or dual script) document, which is
more common in Malaysia and Singapore. However, it needs to
align with the underline of Devanagari when it combines with
Hindi or Sanskrit.
]]
muthu: malaysia and singapore use tamil and documents with
tamil and latin on same line, the underline should be at the
same place for both
... all tamil fonts include latin glyphs too, so the issue
doesn't arise so much
alolita: this issue would arise in india, esp in publishing
with mixed scripts
... so gap is that rules don't exist for what should happen for
position of underline and overline
r12a: recommend that we look at the CSS Text module and check
whether it addresses these issues
[14]https://drafts.csswg.org/css-text-decor-3/#line-decoration
[14] https://drafts.csswg.org/css-text-decor-3/#line-decoration
[15]http://w3c.github.io/typography/#text_decoration
[15] http://w3c.github.io/typography/#text_decoration
<scribe> ACTION: Alolita (all) to review CSS specification for
features
<trackbot> Created ACTION-14 - (all) to review css
specification for features [on Alolita Sharma - due
2018-08-17].
[[
3.1 and 3.2 Line breaking and hyphenation
There are some simple rules for line breaking. Different people
use different implementations. However, I can’t find a decent
document for this online. Here’s a paper presented at a
conference held in Singapore:
[16]https://www.academia.edu/671796/Tamil_Hyphenator_P._David_Prabhakar.
The first 3 rules in the section "Rules for Tamil Hyphenation"
are a good start.
[16] https://www.academia.edu/671796/Tamil_Hyphenator_P._David_Prabhakar.
]]
muthu: how do we frame the issue here ?
r12a: the gap would be that hyphenation is not happening for
users in browsers, then the next step would be to ask why
vivek: cdac has rules for many languages and this may be
available (though maybe not fully comprehensive) but could be a
useful resource
discussion about how to find the information
<scribe> ACTION: Akshat to do a general search in CDAC for
original rule book for hyphenation in 11 scripts
<trackbot> Created ACTION-16 - Do a general search in cdac for
original rule book for hyphenation in 11 scripts [on Akshat Joshi
- due 2018-08-17].
[[
3.4 Counters, lists, etc
Need to understand: "the other relies on the user-defined
mechanism specified in that spec in order to be applied."
Shouldn’t the default be ASCII numerals, and Tamil numerals be
user-defined?
]]
[[
3.5 Initial letter styling
Need to be mindful of conjuncts as defined in 2.8 above.
]]
[[
3.7 Other paragraph features
Tamil can start a paragraph with or without indents. Paragraph
features are the same as English.
]]
<alolita_> yes
Meeting adjourned
Next meeting: two weeks
Summary of Action Items
[NEW] ACTION: Akshat to do a general search in CDAC for original
rule book for hyphenation in 11 scripts
[NEW] ACTION: Alolita (all) to review CSS specification for
features
[NEW] ACTION: Alolita to convert Muthu's comments to github
issues
[NEW] ACTION: Muthu (and Vivek) to verify the definitions in
ch12 for Tamil
[NEW] ACTION: r12a to raise tamil segmentation issue in our
repo
Summary of Resolutions
[End of minutes]
Received on Friday, 10 August 2018 06:56:01 UTC