# [css3-lists] cjk numbering

From: MURAKAMI Shinyu <murakami@antenna.co.jp>
Date: Sun, 08 Feb 2009 18:23:12 +0900

Message-Id: <20090208180329.3C96.C598BCD7@antenna.co.jp>
```
Hi folks,

The description of the cjk numbering systems in the CSS3 Lists draft spec
differs from what I know and from the descriptions in the Wikipedia articles.

CSS Lists Module Level 3 (Editor's Draft)
http://dev.w3.org/csswg/css3-lists/

Wikipedia: Japanese numerals
http://en.wikipedia.org/wiki/Japanese_numerals

Wikipedia: Chinese numerals
http://en.wikipedia.org/wiki/Chinese_numerals

Problem 1: The cjk-ideographic algorithm needs to be adjusted for Japanese numerals

| cjk-ideographic
|
| The cjk-ideographic algorithm is used by several numbering systems,
| using different sets of digits. These systems are defined for
| numbers greater than or equal to 0 and less than 10^16. Numbers
| less than zero or equal to or greater than 10^16 should use the
| decimal system. The core algorithm is as follows:
| (1) Split the decimal number into groups of four digits, starting
| with the least significant digit.
| (2) Ignoring groups that have the value zero, append the second
| group marker to the second group, the third group marker to the
| third group, and the fourth group marker to the fourth group. These
| markers are defined in the tables for the specific numbering
| systems. The first group has no marker.
| (3) For each group, ignoring digits that have the value zero,
| append the second digit marker to the second digit, the third digit
| marker to the third digit, and the fourth digit marker to the
| fourth digit. These markers are defined in the tables for the
| specific numbering systems. The first digit has no marker.
| (4) For any group with a value less than 20, remove the second
| digit (the 1 in the tens column). Leave any associated markers.
| (5) Concatenate the groups back into a single string, least
| significant group last.
| (6) Collapse any consecutive runs of 0 digits to a single 0.
| (7) Replace each digit with the relevant character selected from
| the numbering system's table.

Rules (4) and (6) apply only to Chinese numerals, not to Japanese
numerals. For Japanese numerals they should be replaced by rules (4-JA)
and (6-JA), as follows:

(4-JA) Remove the 1 (一) where a digit marker (十/百/千) is appended.
e.g.,
11 is 十一 in both Chinese and Japanese
1111111 is 一百一十一万一千一百一十一 in Chinese but 百十一万千百十一 in Japanese.

(6-JA) Remove any 0 (零 or 〇) unless the whole value is 0.
e.g.,
205 is 二百零五 or 二百〇五 in Chinese, 二百五 in Japanese.
100004 is 十万零四 or 十万〇四 in Chinese, 十万四 in Japanese.
0 is 零 or 〇 in both Chinese and Japanese

Problem 2: Errata in the CJK numbering systems' tables

| japanese-formal
| This uses the cjk-ideographic system with the following table.
| Formal Japanese numbering system Values  Codepoints
| Second Group Marker  万  U+4E07
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  拾  U+62FE
| Third Digit Marker   佰  U+4F70
| Fourth Digit Marker  仟  U+4EDF
| Digit 0  零  U+96F6
| Digit 1  壹  U+58F9
| Digit 2  貳  U+8CB3
| Digit 3  參  U+53C3
| Digit 4  肆  U+8086
| Digit 5  伍  U+4F0D
| Digit 6  陸  U+9678
| Digit 7  柒  U+67D2
| Digit 8  捌  U+634C
| Digit 9  玖  U+7396

Japanese formal numbers have in-use and obsolete forms. See:
http://en.wikipedia.org/wiki/Japanese_numerals#Formal_numbers

I think 'japanese-formal' should use the in-use characters, as follows:

japanese-formal
Second Group Marker  万  U+4E07  (obsolete: 萬 U+842C)
Third Group Marker   億  U+5104
Fourth Group Marker  兆  U+5146
Second Digit Marker  拾  U+62FE
Third Digit Marker   百  U+767E  (obsolete: 佰 U+4F70)
Fourth Digit Marker  千  U+5343  (obsolete: 仟 U+4EDF)
Digit 0  零  U+96F6  (〇 U+3007 is also used)
Digit 1  壱  U+58F1  (obsolete: 壹 U+58F9)
Digit 2  弐  U+5F10  (obsolete: 貳 U+8CB3)
Digit 3  参  U+53C2  (obsolete: 參 U+53C3)
Digit 4  四  U+56DB  (obsolete: 肆 U+8086)
Digit 5  五  U+4E94  (obsolete: 伍 U+4F0D)
Digit 6  六  U+516D  (obsolete: 陸 U+9678)
Digit 7  七  U+4E03  (obsolete: 柒 U+67D2)
Digit 8  八  U+516B  (obsolete: 捌 U+634C)
Digit 9  九  U+4E5D  (obsolete: 玖 U+7396)

The obsolete form may be defined as 'japanese-formal-obsolete' if needed.

| japanese-informal
| This uses the cjk-ideographic system with the following table.
| Informal Japanese numbering system Values  Codepoints
| Second Group Marker  萬  U+842C
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  萬  U+842C
| Third Digit Marker   億  U+5104
| Fourth Digit Marker  兆  U+5146
| Digit 0  零  U+96F6
| Digit 1  壹  U+58F9
| Digit 2  贰  U+8D30
| Digit 3  叁  U+53C1
| Digit 4  肆  U+8086
| Digit 5  伍  U+4F0D
| Digit 6  陆  U+9646
| Digit 7  柒  U+67D2
| Digit 8  捌  U+634C
| Digit 9  玖  U+7396

The 'japanese-informal' (normal Japanese numerals) table seems completely erroneous.
The following is the corrected version:

japanese-informal
Second Group Marker  万  U+4E07
Third Group Marker   億  U+5104
Fourth Group Marker  兆  U+5146
Second Digit Marker  十  U+5341
Third Digit Marker   百  U+767E
Fourth Digit Marker  千  U+5343
Digit 0  零  U+96F6  (〇 U+3007 is also used)
Digit 1  一  U+4E00
Digit 2  二  U+4E8C
Digit 3  三  U+4E09
Digit 4  四  U+56DB
Digit 5  五  U+4E94
Digit 6  六  U+516D
Digit 7  七  U+4E03
Digit 8  八  U+516B
Digit 9  九  U+4E5D

| simp-chinese-formal
| This uses the cjk-ideographic system with the following table.
| Formal simple Chinese numbering system Values  Codepoints
| Second Group Marker  万  U+4E07
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  万  U+4E07
| Third Digit Marker   亿  U+4EBF
| Fourth Digit Marker  兆  U+5146
| Digit 0  零  U+96F6
| Digit 1  壹  U+58F9
| Digit 2  貳  U+8CB3
| Digit 3  參  U+53C3
| Digit 4  肆  U+8086
| Digit 5  伍  U+4F0D
| Digit 6  陸  U+9678
| Digit 7  柒  U+67D2
| Digit 8  捌  U+634C
| Digit 9  玖  U+7396

The following is the corrected version (simp-chinese-formal):

simp-chinese-formal
Second Group Marker  萬  U+842C
Third Group Marker   億  U+5104
Fourth Group Marker  兆  U+5146
Second Digit Marker  拾  U+62FE
Third Digit Marker   佰  U+4F70
Fourth Digit Marker  仟  U+4EDF
Digit 0  零  U+96F6
Digit 1  壹  U+58F9
Digit 2  贰  U+8D30
Digit 3  叁  U+53C1
Digit 4  肆  U+8086
Digit 5  伍  U+4F0D
Digit 6  陆  U+9646
Digit 7  柒  U+67D2
Digit 8  捌  U+634C
Digit 9  玖  U+7396

| simp-chinese-informal
| This uses the cjk-ideographic system with the following table.
| Informal simple Chinese numbering system Values  Codepoints
| Second Group Marker  萬  U+842C
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  萬  U+842C
| Third Digit Marker   億  U+5104
| Fourth Digit Marker  兆  U+5146
| Digit 0  零  U+96F6
| Digit 1  壹  U+58F9
| Digit 2  貳  U+8CB3
| Digit 3  參  U+53C3
| Digit 4  肆  U+8086
| Digit 5  伍  U+4F0D
| Digit 6  陸  U+9678
| Digit 7  柒  U+67D2
| Digit 8  捌  U+634C
| Digit 9  玖  U+7396

The following is the corrected version (simp-chinese-informal):

simp-chinese-informal
Second Group Marker  万  U+4E07
Third Group Marker   亿  U+4EBF
Fourth Group Marker  兆  U+5146
Second Digit Marker  十  U+5341
Third Digit Marker   百  U+767E
Fourth Digit Marker  千  U+5343
Digit 0  零  U+96F6  (〇 U+3007 is also used)
Digit 1  一  U+4E00
Digit 2  二  U+4E8C
Digit 3  三  U+4E09
Digit 4  四  U+56DB
Digit 5  五  U+4E94
Digit 6  六  U+516D
Digit 7  七  U+4E03
Digit 8  八  U+516B
Digit 9  九  U+4E5D

| This uses the cjk-ideographic system with the following table.
| Formal Traditional Chinese numbering system Values  Codepoints
| Second Group Marker  万  U+4E07
| Third Group Marker   亿  U+4EBF
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  万  U+4E07
| Third Digit Marker   亿  U+4EBF
| Fourth Digit Marker  兆  U+5146
| Digit 0  零  U+96F6
| Digit 1  一  U+4E00
| Digit 2  亼  U+4EBC
| Digit 3  三  U+4E09
| Digit 4  四  U+56DB
| Digit 5  五  U+4E94
| Digit 6  六  U+516D
| Digit 7  七  U+4E03
| Digit 8  八  U+516B
| Digit 9  九  U+4E5D

The following is the corrected version (formal traditional Chinese):

Second Group Marker  萬  U+842C
Third Group Marker   億  U+5104
Fourth Group Marker  兆  U+5146
Second Digit Marker  拾  U+62FE
Third Digit Marker   佰  U+4F70
Fourth Digit Marker  仟  U+4EDF
Digit 0  零  U+96F6
Digit 1  壹  U+58F9
Digit 2  貳  U+8CB3
Digit 3  參  U+53C3
Digit 4  肆  U+8086
Digit 5  伍  U+4F0D
Digit 6  陸  U+9678
Digit 7  柒  U+67D2
Digit 8  捌  U+634C
Digit 9  玖  U+7396

| This uses the cjk-ideographic system with the following table.
| Informal traditional Chinese numbering system Values  Codepoints
| Second Group Marker  萬  U+842C
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  萬  U+842C
| Third Digit Marker   億  U+5104
| Fourth Digit Marker  兆  U+5146
| Digit 0  零  U+96F6
| Digit 1  一  U+4E00
| Digit 2  亼  U+4EBC
| Digit 3  三  U+4E09
| Digit 4  四  U+56DB
| Digit 5  五  U+4E94
| Digit 6  六  U+516D
| Digit 7  七  U+4E03
| Digit 8  八  U+516B

The following is the corrected version (informal traditional Chinese):

Second Group Marker  萬  U+842C
Third Group Marker   億  U+5104
Fourth Group Marker  兆  U+5146
Second Digit Marker  十  U+5341
Third Digit Marker   百  U+767E
Fourth Digit Marker  千  U+5343
Digit 0  零  U+96F6  (〇 U+3007 is also used)
Digit 1  一  U+4E00
Digit 2  二  U+4E8C
Digit 3  三  U+4E09
Digit 4  四  U+56DB
Digit 5  五  U+4E94
Digit 6  六  U+516D
Digit 7  七  U+4E03
Digit 8  八  U+516B
Digit 9  九  U+4E5D

Problem 3: 'cjk-decimal' is needed

The most useful CJK numbering system is missing from this draft spec:
plain positional decimal notation written with ideographic digits.
I would like to call it 'cjk-decimal'. It should be added to <numeric>.

cjk-decimal
〇一二三四五六七八九
U+3007, U+4E00, U+4E8C, U+4E09, U+56DB,
U+4E94, U+516D, U+4E03, U+516B, U+4E5D

e.g.,
2009 is 二〇〇九.
0 is 〇 (not 零).

Problem 4: Fullwidth forms are needed

The following fullwidth forms may be useful, especially in vertical text.

fullwidth-decimal
０１２３４５６７８９ (U+FF10..U+FF19)
fullwidth-lower-latin
ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ (U+FF41..U+FF5A)
fullwidth-upper-latin
ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ (U+FF21..U+FF3A)

Problem 5: Suffix of cjk numbering systems

The suffix '.' (U+002E) may not be appropriate for the cjk-ideographic
numbering systems or the cjk alphabetic systems.
The suffix '.' looks very odd, especially in vertical text.

I think the suffix should be changed to none for the following cjk
numbering systems:

cjk-ideographic
japanese-informal
japanese-formal
japanese-formal-obsolete (proposed in Problem 2)
simp-chinese-informal
simp-chinese-formal
hiragana
hiragana-iroha
katakana
katakana-iroha
cjk-earthly-branch
cjk-heavenly-stem
cjk-decimal (proposed in Problem 3)
fullwidth-decimal (proposed in Problem 4)
fullwidth-lower-latin (proposed in Problem 4)
fullwidth-upper-latin (proposed in Problem 4)

Examples:

Here are some real cjk numbering examples found on the web:

Example 1. (from a Japanese government ordinance)
http://law.e-gov.go.jp/htmldata/H20/H20SE389.html

In this example, 'japanese-informal' and 'katakana-iroha' are
used (without a suffix):

...
イ　国家行政組織法第二十一条第四項 前段に規定する総括整理する職又は同条第五項 前段に規定する総括整理する職
ロ　内閣府設置法第十七条第八項 に規定する総括整理する職
ハ　宮内庁法 （昭和二十二年法律第七十号）第十五条第四項 に規定する総括整理する職
...

Example 2. (from a Japanese official notice)
http://www.env.go.jp/hourei/syousai.php?id=14000081

In this example, 'cjk-decimal' (proposed in Problem 3) and 'katakana'
are used (without a suffix; parenthesized cjk-decimal is also used):

...

(一)　医療手帳の交付を受けようとする者は、関係県知事にその交付を申請しなければならない。
(二)　前号の申請には、次の書類を添付しなければならない。
ア　通常のレベルを超えるメチル水銀の曝露を受けた可能性があることを証する次のいずれかの資料
...
イ　特定症候についての関係県知事が指定する医療機関の医師の診断書。ただし、水俣病に係る認定の申請に対する審査に供された検診資料があるときは、提出することを要しない。
ウ　特定症候についての関係県知事が定める要件に該当する専門医の所定の記載事項を満たす診断書。
...

Best regards,

--
Shinyu Murakami
http://www.antennahouse.com
Antenna House Formatter
http://www.antenna.co.jp/AHF/en/
```
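
The rules discussed in Problem 1 and the 'cjk-decimal' proposal in Problem 3 can be sketched in Python as below. This is only one possible reading, not the spec's or the author's implementation: the function names, the `japanese` flag, and the table layout are hypothetical, and the two tables are the corrected informal tables proposed in the message. The sketch also drops trailing zeros inside a group, which the quoted rule (6) does not spell out but the message's example (100004 is 十万零四) requires.

```python
# Corrected informal tables as proposed in Problem 2 (index = digit/group position).
JAPANESE_INFORMAL = {
    "digits": "零一二三四五六七八九",
    "digit_markers": ["", "十", "百", "千"],
    "group_markers": ["", "万", "億", "兆"],
}
SIMP_CHINESE_INFORMAL = {
    "digits": "零一二三四五六七八九",
    "digit_markers": ["", "十", "百", "千"],
    "group_markers": ["", "万", "亿", "兆"],
}


def cjk_ideographic(n, table, japanese=False):
    """Format n; japanese=True applies the proposed rules (4-JA) and (6-JA)."""
    if not 0 <= n < 10**16:
        return str(n)  # outside the defined range: fall back to decimal
    digits = table["digits"]
    if n == 0:
        return digits[0]  # zero is written as the zero digit in both variants

    # Rule (1): groups of four digits, least significant group first.
    groups = []
    while n:
        groups.append(n % 10000)
        n //= 10000

    out = []
    zero_pending = False  # a run of zeros was seen and not yet written
    for index in reversed(range(len(groups))):
        value = groups[index]
        if value == 0:
            zero_pending = True  # rule (2): zero groups get no marker
            continue
        for pos in (3, 2, 1, 0):
            d = value // 10**pos % 10
            if d == 0:
                zero_pending = True
                continue
            # Rule (6): one zero per run (Chinese); rule (6-JA): none (Japanese).
            if zero_pending and out and not japanese:
                out.append(digits[0])
            zero_pending = False
            if japanese:
                drop_one = d == 1 and pos > 0  # rule (4-JA)
            else:
                drop_one = d == 1 and pos == 1 and value < 20  # rule (4)
            if not drop_one:
                out.append(digits[d])
            out.append(table["digit_markers"][pos])  # rule (3)
        out.append(table["group_markers"][index])  # rule (2)
        zero_pending = False  # assumption: trailing zeros of a group are dropped
    return "".join(out)


def cjk_decimal(n):
    """Positional decimal with ideographic digits; assumes n >= 0."""
    return "".join("〇一二三四五六七八九"[int(c)] for c in str(n))
```

With these tables, `cjk_ideographic(205, SIMP_CHINESE_INFORMAL)` gives 二百零五 and `cjk_ideographic(205, JAPANESE_INFORMAL, japanese=True)` gives 二百五, matching the examples in the message; `cjk_decimal(2009)` gives 二〇〇九.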
Received on Sunday, 8 February 2009 09:23:59 UTC
