Re: Fw: [css3-text] Thai line breaking rules

Hello:

In response to the proposed Thai line breaking rules from Koji Ishii 
<kojiishi@gluesoft.co.jp>, I have added some feedback from Nattapong 
Sirilappanich (natta@th.ibm.com) of IBM Thailand:

>>> (From Koji Ishii)
Here's his draft of the simple line breaking rules in the absence of a 
dictionary for Thai scripts. Any corrections, and/or opinions whether to 
include this in the spec or not would be appreciated.

Thai character groups are based on TIS 620-2533 as written in the Unicode 
code chart [3].
  Consonants: U+0E01-0E2E

Line breaks are prohibited between:
* Any and U+0E2F
* <Consonants> and [U+0E30-0E3A]
* [U+0E31, U+0E3A, U+0E40-0E44] and <Consonants>
* U+0E3F THAI CURRENCY SYMBOL BAHT and digits
* [U+0E24, U+0E26] and U+0E45
* [U+0E50-0E59] and [U+0E50-0E59]
* Any and U+0E5A
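
As a rough illustration, the rules above can be read as a predicate over a 
pair of adjacent characters. This is only a sketch: the name 
break_prohibited is hypothetical, "X and Y" is assumed to mean X immediately 
followed by Y, and "digits" in the baht rule is assumed to cover both ASCII 
and Thai digits, which the draft does not specify.

```python
# Sketch of the first rule list as a pairwise check (assumptions noted inline).
CONSONANTS = range(0x0E01, 0x0E2F)   # U+0E01-0E2E
THAI_DIGITS = range(0x0E50, 0x0E5A)  # U+0E50-0E59
ASCII_DIGITS = range(0x30, 0x3A)     # assumption: "digits" includes 0-9


def break_prohibited(before: str, after: str) -> bool:
    """Return True if a line break between `before` and `after` is prohibited."""
    a, b = ord(before), ord(after)
    # Any and U+0E2F; Any and U+0E5A
    if b in (0x0E2F, 0x0E5A):
        return True
    # <Consonants> and [U+0E30-0E3A]
    if a in CONSONANTS and 0x0E30 <= b <= 0x0E3A:
        return True
    # [U+0E31, U+0E3A, U+0E40-0E44] and <Consonants>
    if (a in (0x0E31, 0x0E3A) or 0x0E40 <= a <= 0x0E44) and b in CONSONANTS:
        return True
    # U+0E3F and digits (assumed: ASCII or Thai digits)
    if a == 0x0E3F and (b in ASCII_DIGITS or b in THAI_DIGITS):
        return True
    # [U+0E24, U+0E26] and U+0E45
    if a in (0x0E24, 0x0E26) and b == 0x0E45:
        return True
    # [U+0E50-0E59] and [U+0E50-0E59]
    if a in THAI_DIGITS and b in THAI_DIGITS:
        return True
    return False
```

For example, break_prohibited("\u0e01", "\u0e30") is True (consonant followed 
by U+0E30), while two consonants in a row are not covered by any rule here 
and would still need a dictionary.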

The following rules were also presented, but the characters involved are in 
Unicode category Lm or Mn, so I suspect that UAX#29 Unicode Text 
Segmentation [4] should cover these rules.
* Any and U+0E46
* <Consonants> and [U+0E47]
* (<Consonants> or [U+0E34-0E39]) and [U+0E48-0E4B]
* (<Consonants> or [U+0E34-0E39]) and U+0E4C
* <Consonants> and [U+0E4D-0E4E]
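
The category claim for the second-position characters in this list 
(U+0E46 through U+0E4E) can be checked against the Unicode Character 
Database. The sketch below uses Python's standard unicodedata module; the 
names SECOND_POSITION, categories and CATS are illustrative only. It checks 
general categories and nothing more, so it does not by itself show that 
UAX#29 covers the rules.

```python
import unicodedata

# Second-position characters in the rules above: U+0E46 and U+0E47-0E4E.
SECOND_POSITION = range(0x0E46, 0x0E4F)


def categories(code_points):
    """Map each code point (as 'U+XXXX') to its Unicode general category."""
    return {f"U+{cp:04X}": unicodedata.category(chr(cp)) for cp in code_points}


CATS = categories(SECOND_POSITION)

for label, cat in CATS.items():
    print(label, cat)
```

U+0E46 is reported as Lm and the remaining characters as Mn.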

[1] http://www.aa.tufs.ac.jp/en
[2] http://dev.w3.org/csswg/css3-text/#line-breaking
[3] http://unicode.org/charts/PDF/U0E00.pdf
[4] http://unicode.org/reports/tr29/
<<<<

>>> Feedback from Nattapong Sirilappanich

"I agree with all your rules, and I have some additional rules for you.
Let me define some additional non-terminal symbols.
Tone: 0E48-0E4B.
AD (Above Diacritic): 0E4C and 0E4E.

The additional rules are:
0E31 and <Tone>.
(0E34 or 0E38) and <AD>.
<Tone> and (0E30, 0E32 or 0E33)."
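
The three additional rules can be sketched in the same pairwise form. The 
function name extra_break_prohibited is hypothetical, "X and Y" is assumed 
to mean X immediately followed by Y, and <AD> is taken exactly as stated 
(0E4C and 0E4E only).

```python
# Sketch of the additional rules; symbol sets follow the definitions above.
TONE = range(0x0E48, 0x0E4C)   # Tone: 0E48-0E4B
AD = (0x0E4C, 0x0E4E)          # AD (Above Diacritic): 0E4C and 0E4E, as stated


def extra_break_prohibited(before: str, after: str) -> bool:
    """Return True if one of the additional rules prohibits a break here."""
    a, b = ord(before), ord(after)
    # 0E31 and <Tone>
    if a == 0x0E31 and b in TONE:
        return True
    # (0E34 or 0E38) and <AD>
    if a in (0x0E34, 0x0E38) and b in AD:
        return True
    # <Tone> and (0E30, 0E32 or 0E33)
    if a in TONE and b in (0x0E30, 0x0E32, 0x0E33):
        return True
    return False
```

In practice this check would be combined with the earlier rules, with a 
break prohibited if either set of rules matches.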

Best regards, Uma


V.S. UMAmaheswaran, Ph.D.
Globalization Centre of Competency, IBM Toronto Lab
A3/SZ8, 8200 Warden Avenue, Markham, ON, Canada, L6G1C7; +1 905 413 3474; 
Fax: +1 905 413 4751; TieLine 313-3474; email: umavs@ca.ibm.com

Received on Friday, 11 March 2011 16:00:36 UTC