- From: Mark Davis ☕ <mark@macchiato.com>
- Date: Mon, 7 Feb 2011 09:45:49 -0800
- To: Koji Ishii <kojiishi@gluesoft.co.jp>
- Cc: "www-style@w3.org" <www-style@w3.org>, "'WWW International' (www-international@w3.org)" <www-international@w3.org>
- Message-ID: <AANLkTi=POa5CzynPj-_Hb16ZNF9E4TSqxzU0XVs72O3X@mail.gmail.com>
Please also file any feedback you have on breaking conditions (aka boundaries, segmentation) for particular languages at http://unicode.org/cldr/trac/newticket

Please specify whether it is word-break, line-break, or other types of breaks.

Mark

*— Il meglio è l’inimico del bene —*

On Mon, Feb 7, 2011 at 01:06, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:

> Removed rules using extended grapheme cluster as defined in UAX #29[1].
>
> I'll ask Minegishi-san at ILCAA to review this, but any feedback is also
> appreciated.
>
> DO NOT BREAK BEFORE:
> * U+0E2F
> * U+0E5A
>
> DO NOT BREAK BETWEEN:
> * [U+0E31, U+0E3A] and <Consonants>
> * U+0E3F THAI Currency Symbol BAHT and digits
> * Digits ([U+0E50-0E59] and [U+0E50-0E59])
>
> Covered by using extended grapheme cluster in UAX #29
> * <Consonants> and [U+0E30-0E3A]
> * [U+0E40-0E44] and <Consonants>
> * [U+0E24, U+0E26] and U+0E45
> * Any and U+0E46 (category=Lm)
> * <Consonants> and [U+0E47]
> * (<Consonants> or [U+0E34-0E39]) and [U+0E48-0E4B]
> * (<Consonants> or [U+0E34-0E39]) and U+0E4C
> * <Consonants> and [U+0E4D-0E4E]
>
> [1] http://unicode.org/reports/tr29/
>
> Regards,
> Koji
>
> -----Original Message-----
> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On Behalf
> Of Koji Ishii
> Sent: Monday, February 07, 2011 12:47 PM
> To: www-style@w3.org
> Subject: [css3-text] Thai line breaking rules
>
> I had a meeting with ILCAA, Research Institute for Languages and Cultures
> of Asia and Africa[1] in Tokyo. Minegishi-san at ILCAA presented his idea
> for the issue currently mentioned in the CSS3 Text spec[2]:
>
> > Additionally, some guidance should be provided on how to break or not
> > break Southeast Asian in the absence of a dictionary.
>
> Here's his draft of the simple line breaking rules in the absence of a
> dictionary for Thai scripts. Any corrections, and/or opinions whether to
> include this in the spec or not would be appreciated.
>
> Thai character groups are based on TIS 620-2553 as written in Unicode
> spec[3].
> Consonants: U+0E01-0E2E
>
> Line breaks are prohibited between:
> * Any and U+0E2F
> * <Consonants> and [U+0E30-0E3A]
> * [U+0E31, U+0E3A, U+0E40-0E44] and <Consonants>
> * U+0E3F THAI Currency Symbol BAHT and digits
> * [U+0E24, U+0E26] and U+0E45
> * [U+0E50-0E59] and [U+0E50-0E59]
> * Any and U+0E5A
>
> Following rules are also presented, but they are Unicode Lm or Mn category
> and therefore I suspect that UAX#29 Unicode Text Segmentation should cover
> these rules.
> * Any and U+0E46
> * <Consonants> and [U+0E47]
> * (<Consonants> or [U+0E34-0E39]) and [U+0E48-0E4B]
> * (<Consonants> or [U+0E34-0E39]) and U+0E4C
> * <Consonants> and [U+0E4D-0E4E]
>
> [1] http://www.aa.tufs.ac.jp/en
> [2] http://dev.w3.org/csswg/css3-text/#line-breaking
> [3] http://unicode.org/charts/PDF/U0E00.pdf
> [4] http://unicode.org/reports/tr29/
>
> Regards,
> Koji
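
For readers who want to try the revised list quoted above, here is a minimal Python sketch of the "DO NOT BREAK BEFORE" and "DO NOT BREAK BETWEEN" rules only. It is an illustration, not part of the proposal. The names `break_prohibited` and `break_opportunities` are hypothetical. The sketch assumes that "A and B" means A is the character before the break and B the character after it, that "[U+0E31, U+0E3A]" lists two individual code points rather than a range, and that "digits" means only the Thai digits U+0E50-0E59. The rules listed as covered by extended grapheme clusters are not implemented, so a real implementation would first segment the text per UAX #29.

```python
# Character groups taken from the quoted rules.
CONSONANTS = range(0x0E01, 0x0E2F)    # U+0E01..U+0E2E
THAI_DIGITS = range(0x0E50, 0x0E5A)   # U+0E50..U+0E59
NO_BREAK_BEFORE = {0x0E2F, 0x0E5A}    # "DO NOT BREAK BEFORE"
BEFORE_CONSONANT = {0x0E31, 0x0E3A}   # assumption: two code points, not a range
BAHT = 0x0E3F                         # THAI CURRENCY SYMBOL BAHT


def break_prohibited(prev: str, nxt: str) -> bool:
    """Return True if a line break between prev and nxt is prohibited.

    Only the non-grapheme-cluster rules of the revised list are checked.
    """
    p, n = ord(prev), ord(nxt)
    if n in NO_BREAK_BEFORE:
        return True
    if p in BEFORE_CONSONANT and n in CONSONANTS:
        return True
    if p == BAHT and n in THAI_DIGITS:   # assumption: Thai digits only
        return True
    if p in THAI_DIGITS and n in THAI_DIGITS:
        return True
    return False


def break_opportunities(text: str) -> list[int]:
    """Indices i such that a break between text[i-1] and text[i] is not prohibited."""
    return [i for i in range(1, len(text))
            if not break_prohibited(text[i - 1], text[i])]
```

For example, `break_opportunities("\u0E3F\u0E51\u0E52\u0E01")` keeps the baht sign and the two digits together and reports a single opportunity before the trailing consonant. Because the grapheme cluster rules are left out, the function will still report opportunities inside clusters such as a consonant followed by a vowel sign.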
Received on Monday, 7 February 2011 17:46:22 UTC