
[css3-lists] cjk numbering

From: MURAKAMI Shinyu <murakami@antenna.co.jp>
Date: Sun, 08 Feb 2009 18:23:12 +0900
To: www-style@w3.org
Message-Id: <20090208180329.3C96.C598BCD7@antenna.co.jp>

Hi folks,

The description of the CJK numbering systems in the CSS3 Lists draft spec
differs from what I know and from the descriptions in the Wikipedia articles.

CSS Lists Module Level 3 (Editor's Draft)
http://dev.w3.org/csswg/css3-lists/

Wikipedia: Japanese numerals
http://en.wikipedia.org/wiki/Japanese_numerals

Wikipedia: Chinese numerals
http://en.wikipedia.org/wiki/Chinese_numerals


Problem 1: The cjk-ideographic algorithm needs to be adjusted for Japanese numerals

| cjk-ideographic 
|
| The cjk-ideographic algorithm is used by several numbering systems, 
| using different sets of digits. These systems are defined for 
| numbers greater than or equal to 0 and less than 10^16. Numbers 
| less than zero or equal to or greater than 10^16 should use the 
| decimal system. The core algorithm is as follows: 
| (1) Split the decimal number into groups of four digits, starting 
| with the least significant digit. 
| (2) Ignoring groups that have the value zero, append the second 
| group marker to the second group, the third group marker to the 
| third group, and the fourth group marker to the fourth group. These 
| markers are defined in the tables for the specific numbering 
| systems. The first group has no marker. 
| (3) For each group, ignoring digits that have the value zero, 
| append the second digit marker to the second digit, the third digit 
| marker to the third digit, and the fourth digit marker to the 
| fourth digit. These markers are defined in the tables for the 
| specific numbering systems. The first digit has no marker. 
| (4) For any group with a value less than 20, remove the second 
| digit (the 1 in the tens column). Leave any associated markers. 
| (5) Concatenate the groups back into a single string, least 
| significant group last. 
| (6) Collapse any consecutive runs of 0 digits to a single 0. 
| (7) Replace each digit with the relevant character selected from 
| the numbering system's table. 

Rules (4) and (6) apply only to Chinese numerals, not to Japanese
numerals. For Japanese numerals, the corresponding rules (4-JA) and
(6-JA) should be as follows:

(4-JA) Remove the 1 (一) where a digit marker (十/百/千) is appended.
e.g., 
11 is 十一 in both Chinese and Japanese
1111111 is 一百一十一万一千一百一十一 in Chinese but 百十一万千百十一 in Japanese.

(6-JA) Remove any 0 (零 or 〇) unless the whole value is 0.
e.g.,
205 is 二百零五 or 二百〇五 in Chinese, 二百五 in Japanese.
100004 is 十万零四 or 十万〇四 in Chinese, 十万四 in Japanese.
0 is 零 or 〇 in both Chinese and Japanese
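
The difference between the Chinese rules (4)/(6) and the proposed
(4-JA)/(6-JA) can be sketched in code. This is only a minimal sketch, not
the draft's normative algorithm: the function name cjk_ideographic and the
'japanese' flag are made up for illustration, the characters come from the
corrected 'japanese-informal' table so that only the rules differ, and it
assumes that leading zeros of the number and trailing zeros within a group
are dropped, which the quoted steps do not state explicitly.

```python
import re

# Characters from the corrected 'japanese-informal' table (used for both
# variants here so that only the rules differ, not the glyphs).
DIGITS = "零一二三四五六七八九"
DIGIT_MARKERS = ["", "十", "百", "千"]
GROUP_MARKERS = ["", "万", "億", "兆"]


def cjk_ideographic(n, japanese=False):
    """Sketch of the cjk-ideographic steps; japanese=True uses (4-JA)/(6-JA)."""
    if n < 0 or n >= 10**16:
        return str(n)              # outside the defined range: decimal
    if n == 0:
        return DIGITS[0]           # the whole value is 0
    groups = []                    # step (1): groups of four digits,
    while n:                       # least significant group first
        groups.append(n % 10000)
        n //= 10000
    parts = []
    for gi in reversed(range(len(groups))):
        value = groups[gi]
        if value == 0:
            parts.append("0")      # step (2): zero groups get no marker
            continue
        text = ""
        for di in (3, 2, 1, 0):    # step (3): digit markers
            d = value // 10**di % 10
            if d == 0:
                text += "0"        # ASCII "0" is a placeholder for zero
                continue
            if japanese:           # (4-JA): drop 1 before any digit marker
                drop_one = d == 1 and di > 0
            else:                  # (4): drop the tens 1 when group < 20
                drop_one = d == 1 and di == 1 and value < 20
            text += ("" if drop_one else DIGITS[d]) + DIGIT_MARKERS[di]
        # Assumption: zeros at the end of a group are not written.
        parts.append(text.rstrip("0") + GROUP_MARKERS[gi])
    s = "".join(parts).strip("0")  # step (5), minus outer zeros (assumption)
    # (6): collapse runs of zeros; (6-JA): remove them entirely.
    s = re.sub("0+", "" if japanese else "0", s)
    return s.replace("0", DIGITS[0])   # step (7) for the zero digit
```

With these assumptions the sketch reproduces the examples above, e.g.
cjk_ideographic(100004) gives 十万零四 and
cjk_ideographic(100004, japanese=True) gives 十万四.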


Problem 2: CJK numbering system's tables errata

| japanese-formal 
| This uses the cjk-ideographic system with the following table. 
| Formal Japanese numbering system Values  Codepoints
| Second Group Marker  万  U+4E07
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  拾  U+62FE
| Third Digit Marker   佰  U+4F70
| Fourth Digit Marker  仟  U+4EDF
| Digit 0  零  U+96F6
| Digit 1  壹  U+58F9
| Digit 2  貳  U+8CB3
| Digit 3  參  U+53C3
| Digit 4  肆  U+8086
| Digit 5  伍  U+4F0D
| Digit 6  陸  U+9678
| Digit 7  柒  U+67D2
| Digit 8  捌  U+634C
| Digit 9  玖  U+7396

Japanese formal numbers have both current and obsolete forms. See:
http://en.wikipedia.org/wiki/Japanese_numerals#Formal_numbers

I think 'japanese-formal' should use the characters currently in use, as follows:

japanese-formal
  Second Group Marker  万  U+4E07  (obsolete: 萬 U+842C)
  Third Group Marker   億  U+5104
  Fourth Group Marker  兆  U+5146
  Second Digit Marker  拾  U+62FE
  Third Digit Marker   百  U+767E  (obsolete: 佰 U+4F70)
  Fourth Digit Marker  千  U+5343  (obsolete: 仟 U+4EDF)
  Digit 0  零  U+96F6  (〇 U+3007 is also used)
  Digit 1  壱  U+58F1  (obsolete: 壹 U+58F9)
  Digit 2  弐  U+5F10  (obsolete: 貳 U+8CB3)
  Digit 3  参  U+53C2  (obsolete: 參 U+53C3)
  Digit 4  四  U+56DB  (obsolete: 肆 U+8086)
  Digit 5  五  U+4E94  (obsolete: 伍 U+4F0D)
  Digit 6  六  U+516D  (obsolete: 陸 U+9678)
  Digit 7  七  U+4E03  (obsolete: 柒 U+67D2)
  Digit 8  八  U+516B  (obsolete: 捌 U+634C)
  Digit 9  九  U+4E5D  (obsolete: 玖 U+7396)

The obsolete form may be defined as 'japanese-formal-obsolete' if needed.

| japanese-informal 
| This uses the cjk-ideographic system with the following table. 
| Informal Japanese numbering system Values  Codepoints
| Second Group Marker  萬  U+842C
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  萬  U+842C
| Third Digit Marker   億  U+5104
| Fourth Digit Marker  兆  U+5146
| Digit 0  零  U+96F6
| Digit 1  壹  U+58F9
| Digit 2  贰  U+8D30
| Digit 3  叁  U+53C1
| Digit 4  肆  U+8086
| Digit 5  伍  U+4F0D
| Digit 6  陆  U+9646
| Digit 7  柒  U+67D2
| Digit 8  捌  U+634C
| Digit 9  玖  U+7396

The 'japanese-informal' (normal Japanese numerals) table seems completely erroneous.
The following is the corrected version:

japanese-informal
  Second Group Marker  万  U+4E07
  Third Group Marker   億  U+5104
  Fourth Group Marker  兆  U+5146
  Second Digit Marker  十  U+5341
  Third Digit Marker   百  U+767E
  Fourth Digit Marker  千  U+5343
  Digit 0  零  U+96F6  (〇 U+3007 is also used)
  Digit 1  一  U+4E00
  Digit 2  二  U+4E8C
  Digit 3  三  U+4E09
  Digit 4  四  U+56DB
  Digit 5  五  U+4E94
  Digit 6  六  U+516D
  Digit 7  七  U+4E03
  Digit 8  八  U+516B
  Digit 9  九  U+4E5D

| simp-chinese-formal 
| This uses the cjk-ideographic system with the following table. 
| Formal simple Chinese numbering system Values  Codepoints
| Second Group Marker  万  U+4E07
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  万  U+4E07
| Third Digit Marker   亿  U+4EBF
| Fourth Digit Marker  兆  U+5146
| Digit 0  零  U+96F6
| Digit 1  壹  U+58F9
| Digit 2  貳  U+8CB3
| Digit 3  參  U+53C3
| Digit 4  肆  U+8086
| Digit 5  伍  U+4F0D
| Digit 6  陸  U+9678
| Digit 7  柒  U+67D2
| Digit 8  捌  U+634C
| Digit 9  玖  U+7396

The following is the corrected version (simp-chinese-formal):

simp-chinese-formal
  Second Group Marker  萬  U+842C
  Third Group Marker   億  U+5104
  Fourth Group Marker  兆  U+5146
  Second Digit Marker  拾  U+62FE
  Third Digit Marker   佰  U+4F70
  Fourth Digit Marker  仟  U+4EDF
  Digit 0  零  U+96F6
  Digit 1  壹  U+58F9
  Digit 2  贰  U+8D30
  Digit 3  叁  U+53C1
  Digit 4  肆  U+8086
  Digit 5  伍  U+4F0D
  Digit 6  陆  U+9646
  Digit 7  柒  U+67D2
  Digit 8  捌  U+634C
  Digit 9  玖  U+7396


| simp-chinese-informal 
| This uses the cjk-ideographic system with the following table. 
| Informal simple Chinese numbering system Values  Codepoints
| Second Group Marker  萬  U+842C
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  萬  U+842C
| Third Digit Marker   億  U+5104
| Fourth Digit Marker  兆  U+5146
| Digit 0  零  U+96F6
| Digit 1  壹  U+58F9
| Digit 2  貳  U+8CB3
| Digit 3  參  U+53C3
| Digit 4  肆  U+8086
| Digit 5  伍  U+4F0D
| Digit 6  陸  U+9678
| Digit 7  柒  U+67D2
| Digit 8  捌  U+634C
| Digit 9  玖  U+7396

The following is the corrected version (simp-chinese-informal):

simp-chinese-informal
  Second Group Marker  万  U+4E07
  Third Group Marker   亿  U+4EBF
  Fourth Group Marker  兆  U+5146
  Second Digit Marker  十  U+5341
  Third Digit Marker   百  U+767E
  Fourth Digit Marker  千  U+5343
  Digit 0  零  U+96F6  (〇 U+3007 is also used)
  Digit 1  一  U+4E00
  Digit 2  二  U+4E8C
  Digit 3  三  U+4E09
  Digit 4  四  U+56DB
  Digit 5  五  U+4E94
  Digit 6  六  U+516D
  Digit 7  七  U+4E03
  Digit 8  八  U+516B
  Digit 9  九  U+4E5D


| trad-chinese-formal 
| This uses the cjk-ideographic system with the following table. 
| Formal Traditional Chinese numbering system Values  Codepoints
| Second Group Marker  万  U+4E07
| Third Group Marker   亿  U+4EBF
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  万  U+4E07
| Third Digit Marker   亿  U+4EBF
| Fourth Digit Marker  兆  U+5146
| Digit 0  零  U+96F6
| Digit 1  一  U+4E00
| Digit 2  亼  U+4EBC
| Digit 3  三  U+4E09
| Digit 4  四  U+56DB
| Digit 5  五  U+4E94
| Digit 6  六  U+516D
| Digit 7  七  U+4E03
| Digit 8  八  U+516B
| Digit 9  九  U+4E5D

The following is the corrected version:

trad-chinese-formal
  Second Group Marker  萬  U+842C
  Third Group Marker   億  U+5104
  Fourth Group Marker  兆  U+5146
  Second Digit Marker  拾  U+62FE
  Third Digit Marker   佰  U+4F70
  Fourth Digit Marker  仟  U+4EDF
  Digit 0  零  U+96F6
  Digit 1  壹  U+58F9
  Digit 2  貳  U+8CB3
  Digit 3  參  U+53C3
  Digit 4  肆  U+8086
  Digit 5  伍  U+4F0D
  Digit 6  陸  U+9678
  Digit 7  柒  U+67D2
  Digit 8  捌  U+634C
  Digit 9  玖  U+7396


| trad-chinese-informal 
| This uses the cjk-ideographic system with the following table. 
| Informal traditional Chinese numbering system Values  Codepoints
| Second Group Marker  萬  U+842C
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  萬  U+842C
| Third Digit Marker   億  U+5104
| Fourth Digit Marker  兆  U+5146
| Digit 0  零  U+96F6
| Digit 1  一  U+4E00
| Digit 2  亼  U+4EBC
| Digit 3  三  U+4E09
| Digit 4  四  U+56DB
| Digit 5  五  U+4E94
| Digit 6  六  U+516D
| Digit 7  七  U+4E03
| Digit 8  八  U+516B

The following is the corrected version:

trad-chinese-informal
  Second Group Marker  萬  U+842C
  Third Group Marker   億  U+5104
  Fourth Group Marker  兆  U+5146
  Second Digit Marker  十  U+5341
  Third Digit Marker   百  U+767E
  Fourth Digit Marker  千  U+5343
  Digit 0  零  U+96F6  (〇 U+3007 is also used)
  Digit 1  一  U+4E00
  Digit 2  二  U+4E8C
  Digit 3  三  U+4E09
  Digit 4  四  U+56DB
  Digit 5  五  U+4E94
  Digit 6  六  U+516D
  Digit 7  七  U+4E03
  Digit 8  八  U+516B
  Digit 9  九  U+4E5D


Problem 3: 'cjk-decimal' is needed

The draft spec is missing the most useful CJK numbering system: plain
decimal notation written with CJK digits. I would like to call this
'cjk-decimal'. It should be added to <numeric>.

cjk-decimal
〇一二三四五六七八九
U+3007, U+4E00, U+4E8C, U+4E09, U+56DB,
U+4E94, U+516D, U+4E03, U+516B, U+4E5D

e.g.,
2009 is 二〇〇九.
0 is 〇 (not 零).
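
Since 'cjk-decimal' is a purely positional system, it amounts to a
digit-for-digit substitution. A minimal sketch follows; the function name
cjk_decimal is hypothetical, and falling back to plain decimal for negative
numbers is an assumption not covered by the proposal.

```python
# Digits of the proposed 'cjk-decimal' system, indexed by value 0..9.
CJK_DECIMAL_DIGITS = "〇一二三四五六七八九"


def cjk_decimal(n):
    """Replace each decimal digit of n with the matching CJK digit."""
    if n < 0:
        return str(n)  # assumption: negatives fall back to plain decimal
    return "".join(CJK_DECIMAL_DIGITS[int(ch)] for ch in str(n))
```

For example, cjk_decimal(2009) gives 二〇〇九 and cjk_decimal(0) gives 〇.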


Problem 4: Fullwidth forms are needed

The following fullwidth forms may be useful, especially in vertical text.

fullwidth-decimal
  ０１２３４５６７８９ (U+FF10..U+FF19)
fullwidth-lower-latin
  ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ (U+FF41..U+FF5A)
fullwidth-upper-latin
  ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ (U+FF21..U+FF3A)
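
Because the fullwidth digits and letters occupy contiguous code point
ranges, these systems can be generated by a simple offset. The following is
a rough sketch only: the function names are hypothetical, and it assumes
the fullwidth latin systems would continue past 26 items the same way an
ordinary alphabetic system does (a..z, aa, ab, ...), which the proposal
above does not spell out.

```python
def fullwidth_decimal(n):
    """Map each decimal digit of n to U+FF10..U+FF19."""
    if n < 0:
        raise ValueError("sketch only handles non-negative numbers")
    return "".join(chr(0xFF10 + int(ch)) for ch in str(n))


def fullwidth_latin(n, upper=False):
    """Alphabetic numbering with fullwidth letters: 1 is the first letter."""
    if n < 1:
        raise ValueError("alphabetic systems start at 1")
    base = 0xFF21 if upper else 0xFF41  # fullwidth 'A' or fullwidth 'a'
    out = ""
    while n > 0:
        n -= 1                          # shift to 0-based for this position
        out = chr(base + n % 26) + out
        n //= 26
    return out
```

For example, fullwidth_decimal(2009) gives ２００９, and
fullwidth_latin(27) gives ａａ under the stated assumption.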


Problem 5: Suffix of cjk numbering systems

The suffix '.' (U+002E) may not be appropriate for the cjk-ideographic
numbering systems or the CJK alphabetic systems.
In vertical text especially, the suffix '.' looks very odd.

I think the suffix should be changed to none for the following cjk
numbering systems:

    cjk-ideographic
    japanese-informal
    japanese-formal
    japanese-formal-obsolete (proposed in Problem 2)
    simp-chinese-informal
    simp-chinese-formal
    trad-chinese-informal
    trad-chinese-formal
    hiragana
    hiragana-iroha
    katakana
    katakana-iroha
    cjk-earthly-branch
    cjk-heavenly-stem
    cjk-decimal (proposed in Problem 3)
    fullwidth-decimal (proposed in Problem 4)
    fullwidth-lower-latin (proposed in Problem 4)
    fullwidth-upper-latin (proposed in Problem 4)
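
To illustrate the effect of this proposal, here is a small sketch of how a
marker string might be assembled. The names NO_SUFFIX_STYLES and
marker_text are hypothetical, and '.' as the default suffix for all other
systems is assumed from the draft's current behaviour.

```python
# Numbering systems for which the proposal sets the suffix to none.
NO_SUFFIX_STYLES = {
    "cjk-ideographic", "japanese-informal", "japanese-formal",
    "japanese-formal-obsolete", "simp-chinese-informal",
    "simp-chinese-formal", "trad-chinese-informal", "trad-chinese-formal",
    "hiragana", "hiragana-iroha", "katakana", "katakana-iroha",
    "cjk-earthly-branch", "cjk-heavenly-stem", "cjk-decimal",
    "fullwidth-decimal", "fullwidth-lower-latin", "fullwidth-upper-latin",
}


def marker_text(representation, style, default_suffix="."):
    """Append the suffix unless the style is in the no-suffix list."""
    suffix = "" if style in NO_SUFFIX_STYLES else default_suffix
    return representation + suffix
```

For example, marker_text("十一", "japanese-informal") gives 十一 with no
trailing '.', while marker_text("11", "decimal") still gives "11.".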


Examples:

Here are some real CJK numbering examples found on the web:

Example 1. (from a Japanese government ordinance)
http://law.e-gov.go.jp/htmldata/H20/H20SE389.html

In this example, 'japanese-informal' and 'katakana-iroha' are used
(without a suffix):

一  沖縄振興開発金融公庫 
二  首都高速道路株式会社 
三  株式会社日本政策金融公庫 
...
十  日本私立学校振興・共済事業団 
十一  軽自動車検査協会 
十二  日本下水道事業団 
...
イ 国家行政組織法第二十一条第四項 前段に規定する総括整理する職又は同条第五項 前段に規定する総括整理する職
ロ 内閣府設置法第十七条第八項 に規定する総括整理する職
ハ 宮内庁法 (昭和二十二年法律第七十号)第十五条第四項 に規定する総括整理する職
...

Example 2. (from a Japanese official notice)
http://www.env.go.jp/hourei/syousai.php?id=14000081

In this example, 'cjk-decimal' (proposed in Problem 3) and 'katakana'
are used (without a suffix, though parenthesized cjk-decimal is also used):

一 目的
    ...
二 実施主体
...
一〇 対象地域
    ...
一一 医療手帳の交付の対象
    ...
一二 療養手帳の交付
 (一) 医療手帳の交付を受けようとする者は、関係県知事にその交付を申請しなければならない。
 (二) 前号の申請には、次の書類を添付しなければならない。
  ア 通常のレベルを超えるメチル水銀の曝露を受けた可能性があることを証する次のいずれかの資料
        ...
  イ 特定症候についての関係県知事が指定する医療機関の医師の診断書。ただし、水俣病に係る認定の申請に対する審査に供された検診資料があるときは、提出することを要しない。
  ウ 特定症候についての関係県知事が定める要件に該当する専門医の所定の記載事項を満たす診断書。
...

Best regards,

-- 
Shinyu Murakami
http://www.antennahouse.com
Antenna House Formatter
http://www.antenna.co.jp/AHF/en/
Received on Sunday, 8 February 2009 09:23:59 GMT
