Re: [css3-text] Thai line breaking rules

Please also file any feedback you have on breaking conditions (aka
boundaries, segmentation) for particular languages at

http://unicode.org/cldr/trac/newticket

Please specify whether it is word-break, line-break, or other types of
breaks.

Mark

*— Il meglio è l’inimico del bene —*


On Mon, Feb 7, 2011 at 01:06, Koji Ishii <kojiishi@gluesoft.co.jp> wrote:

> I removed the rules that are already covered by extended grapheme
> clusters, as defined in UAX #29[1].
>
> I'll ask Minegishi-san at ILCAA to review this, but any feedback is also
> appreciated.
>
> DO NOT BREAK BEFORE:
> * U+0E2F
> * U+0E5A
>
> DO NOT BREAK BETWEEN:
> * [U+0E31, U+0E3A] and <Consonants>
> * U+0E3F THAI CURRENCY SYMBOL BAHT and digits
> * Digits ([U+0E50-0E59] and [U+0E50-0E59])
>
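As a rough illustration of the reduced rule set above, here is a minimal
Python sketch. It is not part of the proposal: the function and constant
names are hypothetical, "[U+0E31, U+0E3A]" is read as two individual
characters rather than a range, and "digits" after the baht sign is assumed
to mean Thai digits or ASCII digits. It covers only the rules listed above,
not the ones left to grapheme clusters.

```python
# Hypothetical sketch of the "do not break" rules quoted above.
CONSONANTS = range(0x0E01, 0x0E2F)    # U+0E01-0E2E
THAI_DIGITS = range(0x0E50, 0x0E5A)   # U+0E50-0E59
NO_BREAK_BEFORE = {0x0E2F, 0x0E5A}    # DO NOT BREAK BEFORE
BEFORE_CONSONANT = {0x0E31, 0x0E3A}   # assumed: two code points, not a range
BAHT = 0x0E3F
ASCII_DIGITS = "0123456789"           # assumed to count as "digits"


def break_prohibited(prev: str, nxt: str) -> bool:
    """Return True if the rules forbid a break between prev and nxt."""
    p, n = ord(prev), ord(nxt)
    if n in NO_BREAK_BEFORE:
        return True
    if p in BEFORE_CONSONANT and n in CONSONANTS:
        return True
    if p == BAHT and (n in THAI_DIGITS or nxt in ASCII_DIGITS):
        return True
    if p in THAI_DIGITS and n in THAI_DIGITS:
        return True
    return False


def prohibited_positions(text: str) -> list[int]:
    """Indexes i where a break between text[i-1] and text[i] is forbidden."""
    return [i for i in range(1, len(text))
            if break_prohibited(text[i - 1], text[i])]
```

A position that this sketch does not flag is not necessarily a good break
opportunity; it only means none of the listed prohibitions applies there.
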
> Covered by extended grapheme clusters in UAX #29:
> * <Consonants> and [U+0E30-0E3A]
> * [U+0E40-0E44] and <Consonants>
> * [U+0E24, U+0E26] and U+0E45
> * Any and U+0E46 (category=Lm)
> * <Consonants> and [U+0E47]
> * (<Consonants> or [U+0E34-0E39]) and [U+0E48-0E4B]
> * (<Consonants> or [U+0E34-0E39]) and U+0E4C
> * <Consonants> and [U+0E4D-0E4E]
>
> [1] http://unicode.org/reports/tr29/
>
> Regards,
> Koji
>
> -----Original Message-----
> From: www-style-request@w3.org [mailto:www-style-request@w3.org] On Behalf
> Of Koji Ishii
> Sent: Monday, February 07, 2011 12:47 PM
> To: www-style@w3.org
> Subject: [css3-text] Thai line breaking rules
>
> I had a meeting with ILCAA, Research Institute for Languages and Cultures
> of Asia and Africa[1] in Tokyo. Minegishi-san at ILCAA presented his idea
> for the issue currently mentioned in the CSS3 Text spec[2]:
>
> > Additionally, some guidance should be provided on how to break or not
> > break Southeast Asian in the absence of a dictionary.
>
> Here's his draft of simple line breaking rules for Thai script in the
> absence of a dictionary. Any corrections, or opinions on whether to
> include this in the spec, would be appreciated.
>
> Thai character groups are based on TIS 620-2533, as given in the Unicode
> code chart[3].
>  Consonants: U+0E01-0E2E
>
> Line breaks are prohibited between:
> * Any and U+0E2F
> * <Consonants> and [U+0E30-0E3A]
> * [U+0E31, U+0E3A, U+0E40-0E44] and <Consonants>
> * U+0E3F THAI CURRENCY SYMBOL BAHT and digits
> * [U+0E24, U+0E26] and U+0E45
> * [U+0E50-0E59] and [U+0E50-0E59]
> * Any and U+0E5A
>
> The following rules were also presented, but the characters on the
> right-hand side are in Unicode category Lm or Mn, so I suspect that
> UAX #29 Unicode Text Segmentation[4] already covers these rules.
> * Any and U+0E46
> * <Consonants> and [U+0E47]
> * (<Consonants> or [U+0E34-0E39]) and [U+0E48-0E4B]
> * (<Consonants> or [U+0E34-0E39]) and U+0E4C
> * <Consonants> and [U+0E4D-0E4E]
>
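The Lm/Mn observation above can be checked against the Unicode Character
Database. A small sketch using Python's standard unicodedata module follows;
the helper name is made up for illustration, and it only reports general
categories, not grapheme cluster boundaries.

```python
import unicodedata


def thai_categories(first: int = 0x0E46, last: int = 0x0E4E) -> dict[str, str]:
    """Map each code point in [first, last] to its Unicode general category."""
    return {f"U+{cp:04X}": unicodedata.category(chr(cp))
            for cp in range(first, last + 1)}


if __name__ == "__main__":
    for code_point, category in thai_categories().items():
        print(code_point, category)
```

With the default range, U+0E46 is reported as Lm and U+0E47 through U+0E4E
as Mn.
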
> [1] http://www.aa.tufs.ac.jp/en
> [2] http://dev.w3.org/csswg/css3-text/#line-breaking
> [3] http://unicode.org/charts/PDF/U0E00.pdf
> [4] http://unicode.org/reports/tr29/
>
> Regards,
> Koji
>

Received on Monday, 7 February 2011 17:47:23 UTC