- From: MURAKAMI Shinyu <murakami@antenna.co.jp>
- Date: Sun, 08 Feb 2009 18:23:12 +0900
- To: www-style@w3.org
Hi folks,

The description in the CSS3 Lists draft spec about cjk numbering systems differs from my knowledge and the description in Wikipedia's articles.

CSS Lists Module Level 3 (Editor's Draft)
http://dev.w3.org/csswg/css3-lists/

Wikipedia: Japanese numerals
http://en.wikipedia.org/wiki/Japanese_numerals

Wikipedia: Chinese numerals
http://en.wikipedia.org/wiki/Chinese_numerals

Problem 1: The cjk-ideographic algorithm needs to be adjusted for Japanese numerals

| cjk-ideographic
|
| The cjk-ideographic algorithm is used by several numbering systems,
| using different sets of digits. These systems are defined for
| numbers greater than or equal to 0 and less than 10^16. Numbers
| less than zero or equal to or greater than 10^16 should use the
| decimal system. The core algorithm is as follows:
| (1) Split the decimal number into groups of four digits, starting
| with the least significant digit.
| (2) Ignoring groups that have the value zero, append the second
| group marker to the second group, the third group marker to the
| third group, and the fourth group marker to the fourth group. These
| markers are defined in the tables for the specific numbering
| systems. The first group has no marker.
| (3) For each group, ignoring digits that have the value zero,
| append the second digit marker to the second digit, the third digit
| marker to the third digit, and the fourth digit marker to the
| fourth digit. These markers are defined in the tables for the
| specific numbering systems. The first digit has no marker.
| (4) For any group with a value less than 20, remove the second
| digit (the 1 in the tens column). Leave any associated markers.
| (5) Concatenate the groups back into a single string, least
| significant group last.
| (6) Collapse any consecutive runs of 0 digits to a single 0.
| (7) Replace each digit with the relevant character selected from
| the numbering system's table.

Rules (4) and (6) are only for Chinese numerals and are not applicable to Japanese numerals. Rules for Japanese numerals (4-JA) and (6-JA) should be the following:

(4-JA) Remove the 1 (一) where a digit marker (十/百/千) is appended.
  e.g., 11 is 十一 in both Chinese and Japanese
        1111111 is 一百一十一万一千一百一十一 in Chinese
        but 百十一万千百十一 in Japanese.

(6-JA) Remove any 0 (零 or 〇) unless the whole value is 0.
  e.g., 205 is 二百零五 or 二百〇五 in Chinese, 二百五 in Japanese.
        100004 is 十万零四 or 十万〇四 in Chinese, 十万四 in Japanese.
        0 is 零 or 〇 in both Chinese and Japanese

Problem 2: CJK numbering system's tables errata

| japanese-formal
| This uses the cjk-ideographic system with the following table.
| Formal Japanese numbering system   Values  Codepoints
| Second Group Marker  万  U+4E07
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  拾  U+62FE
| Third Digit Marker   佰  U+4F70
| Fourth Digit Marker  仟  U+4EDF
| Digit 0              零  U+96F6
| Digit 1              壹  U+58F9
| Digit 2              貳  U+8CB3
| Digit 3              參  U+53C3
| Digit 4              肆  U+8086
| Digit 5              伍  U+4F0D
| Digit 6              陸  U+9678
| Digit 7              柒  U+67D2
| Digit 8              捌  U+634C
| Digit 9              玖  U+7396

Japanese formal numbers have in-use and obsolete forms. See:
http://en.wikipedia.org/wiki/Japanese_numerals#Formal_numbers

I think 'japanese-formal' should use the in-use characters, as follows:

japanese-formal
Second Group Marker  万  U+4E07 (obsolete: 萬 U+842C)
Third Group Marker   億  U+5104
Fourth Group Marker  兆  U+5146
Second Digit Marker  拾  U+62FE
Third Digit Marker   百  U+767E (obsolete: 佰 U+4F70)
Fourth Digit Marker  千  U+5343 (obsolete: 仟 U+4EDF)
Digit 0              零  U+96F6 (〇 U+3007 is also used)
Digit 1              壱  U+58F1 (obsolete: 壹 U+58F9)
Digit 2              弐  U+5F10 (obsolete: 貳 U+8CB3)
Digit 3              参  U+53C2 (obsolete: 參 U+53C3)
Digit 4              四  U+56DB (obsolete: 肆 U+8086)
Digit 5              五  U+4E94 (obsolete: 伍 U+4F0D)
Digit 6              六  U+516D (obsolete: 陸 U+9678)
Digit 7              七  U+4E03 (obsolete: 柒 U+67D2)
Digit 8              八  U+516B (obsolete: 捌 U+634C)
Digit 9              九  U+4E5D (obsolete: 玖 U+7396)

The obsolete form may be defined as 'japanese-formal-obsolete' if needed.
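
To make the difference raised in Problem 1 concrete, here is a minimal Python sketch of the quoted core algorithm with a switch between rules (4)/(6) and the proposed (4-JA)/(6-JA). It is only an illustration, not text from the draft: the function name `cjk_ideographic` and the `japanese` flag are hypothetical, the informal digits with 万/億/兆 are used for both modes (as in the examples above), and two details the quoted steps do not spell out are assumed here: trailing zeros inside a group are dropped, and the most significant group is not zero-padded.

```python
import re

# Assumed table: informal digits with the markers used in the examples above.
DIGITS = "零一二三四五六七八九"
DIGIT_MARKERS = ("", "十", "百", "千")   # ones, tens, hundreds, thousands
GROUP_MARKERS = ("", "万", "億", "兆")   # first to fourth group


def cjk_ideographic(n: int, japanese: bool = False) -> str:
    """Sketch of the quoted algorithm; japanese=True applies (4-JA)/(6-JA)."""
    if not 0 <= n < 10**16:
        return str(n)                    # out of range: fall back to decimal
    if n == 0:
        return DIGITS[0]                 # the whole value is 0
    text = ""
    group_index = 0
    while n > 0:
        # (1) take groups of four digits, least significant first
        n, value = divmod(n, 10000)
        # assumption: pad every group except the most significant one
        digits = str(value).zfill(4) if n > 0 else str(value)
        group = ""
        for i, d in enumerate(digits):
            pos = len(digits) - 1 - i    # 0 = ones ... 3 = thousands
            # (3) non-zero digits get their digit marker
            marker = DIGIT_MARKERS[pos] if d != "0" else ""
            # (4): drop the tens 1 in groups below 20; (4-JA): drop every
            # 1 that carries a digit marker
            if d == "1" and marker and (japanese or value < 20):
                d = ""
            group += d + marker
        group = group.rstrip("0")        # assumption: drop trailing zeros
        if value != 0:
            group += GROUP_MARKERS[group_index]   # (2) group marker
        text = group + text              # (5) least significant group last
        group_index += 1
    if japanese:
        text = text.replace("0", "")     # (6-JA) remove every 0
    else:
        text = re.sub("0+", "0", text)   # (6) collapse runs of 0
    # (7) replace the remaining ASCII digits with the table characters
    return text.translate(str.maketrans("0123456789", DIGITS))
```

With these assumptions, `cjk_ideographic(205)` gives 二百零五 and `cjk_ideographic(205, japanese=True)` gives 二百五, matching the examples listed under (6-JA).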

| japanese-informal
| This uses the cjk-ideographic system with the following table.
| Informal Japanese numbering system   Values  Codepoints
| Second Group Marker  萬  U+842C
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  萬  U+842C
| Third Digit Marker   億  U+5104
| Fourth Digit Marker  兆  U+5146
| Digit 0              零  U+96F6
| Digit 1              壹  U+58F9
| Digit 2              贰  U+8D30
| Digit 3              叁  U+53C1
| Digit 4              肆  U+8086
| Digit 5              伍  U+4F0D
| Digit 6              陆  U+9646
| Digit 7              柒  U+67D2
| Digit 8              捌  U+634C
| Digit 9              玖  U+7396

The 'japanese-informal' (normal Japanese numerals) table seems completely erroneous. The following is the corrected version:

japanese-informal
Second Group Marker  万  U+4E07
Third Group Marker   億  U+5104
Fourth Group Marker  兆  U+5146
Second Digit Marker  十  U+5341
Third Digit Marker   百  U+767E
Fourth Digit Marker  千  U+5343
Digit 0              零  U+96F6 (〇 U+3007 is also used)
Digit 1              一  U+4E00
Digit 2              二  U+4E8C
Digit 3              三  U+4E09
Digit 4              四  U+56DB
Digit 5              五  U+4E94
Digit 6              六  U+516D
Digit 7              七  U+4E03
Digit 8              八  U+516B
Digit 9              九  U+4E5D

| simp-chinese-formal
| This uses the cjk-ideographic system with the following table.
| Formal simple Chinese numbering system   Values  Codepoints
| Second Group Marker  万  U+4E07
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  万  U+4E07
| Third Digit Marker   亿  U+4EBF
| Fourth Digit Marker  兆  U+5146
| Digit 0              零  U+96F6
| Digit 1              壹  U+58F9
| Digit 2              貳  U+8CB3
| Digit 3              參  U+53C3
| Digit 4              肆  U+8086
| Digit 5              伍  U+4F0D
| Digit 6              陸  U+9678
| Digit 7              柒  U+67D2
| Digit 8              捌  U+634C
| Digit 9              玖  U+7396

The following is the corrected version (simp-chinese-formal):

simp-chinese-formal
Second Group Marker  萬  U+842C
Third Group Marker   億  U+5104
Fourth Group Marker  兆  U+5146
Second Digit Marker  拾  U+62FE
Third Digit Marker   佰  U+4F70
Fourth Digit Marker  仟  U+4EDF
Digit 0              零  U+96F6
Digit 1              壹  U+58F9
Digit 2              贰  U+8D30
Digit 3              叁  U+53C1
Digit 4              肆  U+8086
Digit 5              伍  U+4F0D
Digit 6              陆  U+9646
Digit 7              柒  U+67D2
Digit 8              捌  U+634C
Digit 9              玖  U+7396

| simp-chinese-informal
| This uses the cjk-ideographic system with the following table.
| Informal simple Chinese numbering system   Values  Codepoints
| Second Group Marker  萬  U+842C
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  萬  U+842C
| Third Digit Marker   億  U+5104
| Fourth Digit Marker  兆  U+5146
| Digit 0              零  U+96F6
| Digit 1              壹  U+58F9
| Digit 2              貳  U+8CB3
| Digit 3              參  U+53C3
| Digit 4              肆  U+8086
| Digit 5              伍  U+4F0D
| Digit 6              陸  U+9678
| Digit 7              柒  U+67D2
| Digit 8              捌  U+634C
| Digit 9              玖  U+7396

The following is the corrected version (simp-chinese-informal):

simp-chinese-informal
Second Group Marker  万  U+4E07
Third Group Marker   亿  U+4EBF
Fourth Group Marker  兆  U+5146
Second Digit Marker  十  U+5341
Third Digit Marker   百  U+767E
Fourth Digit Marker  千  U+5343
Digit 0              零  U+96F6 (〇 U+3007 is also used)
Digit 1              一  U+4E00
Digit 2              二  U+4E8C
Digit 3              三  U+4E09
Digit 4              四  U+56DB
Digit 5              五  U+4E94
Digit 6              六  U+516D
Digit 7              七  U+4E03
Digit 8              八  U+516B
Digit 9              九  U+4E5D

| trad-chinese-formal
| This uses the cjk-ideographic system with the following table.
| Formal Traditional Chinese numbering system   Values  Codepoints
| Second Group Marker  万  U+4E07
| Third Group Marker   亿  U+4EBF
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  万  U+4E07
| Third Digit Marker   亿  U+4EBF
| Fourth Digit Marker  兆  U+5146
| Digit 0              零  U+96F6
| Digit 1              一  U+4E00
| Digit 2              亼  U+4EBC
| Digit 3              三  U+4E09
| Digit 4              四  U+56DB
| Digit 5              五  U+4E94
| Digit 6              六  U+516D
| Digit 7              七  U+4E03
| Digit 8              八  U+516B
| Digit 9              九  U+4E5D

The following is the corrected version:

trad-chinese-formal
Second Group Marker  萬  U+842C
Third Group Marker   億  U+5104
Fourth Group Marker  兆  U+5146
Second Digit Marker  拾  U+62FE
Third Digit Marker   佰  U+4F70
Fourth Digit Marker  仟  U+4EDF
Digit 0              零  U+96F6
Digit 1              壹  U+58F9
Digit 2              貳  U+8CB3
Digit 3              參  U+53C3
Digit 4              肆  U+8086
Digit 5              伍  U+4F0D
Digit 6              陸  U+9678
Digit 7              柒  U+67D2
Digit 8              捌  U+634C
Digit 9              玖  U+7396

| trad-chinese-informal
| This uses the cjk-ideographic system with the following table.
| Informal traditional Chinese numbering system   Values  Codepoints
| Second Group Marker  萬  U+842C
| Third Group Marker   億  U+5104
| Fourth Group Marker  兆  U+5146
| Second Digit Marker  萬  U+842C
| Third Digit Marker   億  U+5104
| Fourth Digit Marker  兆  U+5146
| Digit 0              零  U+96F6
| Digit 1              一  U+4E00
| Digit 2              亼  U+4EBC
| Digit 3              三  U+4E09
| Digit 4              四  U+56DB
| Digit 5              五  U+4E94
| Digit 6              六  U+516D
| Digit 7              七  U+4E03
| Digit 8              八  U+516B

The following is the corrected version:

trad-chinese-informal
Second Group Marker  萬  U+842C
Third Group Marker   億  U+5104
Fourth Group Marker  兆  U+5146
Second Digit Marker  十  U+5341
Third Digit Marker   百  U+767E
Fourth Digit Marker  千  U+5343
Digit 0              零  U+96F6 (〇 U+3007 is also used)
Digit 1              一  U+4E00
Digit 2              二  U+4E8C
Digit 3              三  U+4E09
Digit 4              四  U+56DB
Digit 5              五  U+4E94
Digit 6              六  U+516D
Digit 7              七  U+4E03
Digit 8              八  U+516B
Digit 9              九  U+4E5D

Problem 3: 'cjk-decimal' is needed

In this draft spec, the most useful CJK numbering system is missing. I would like to call this 'cjk-decimal'. This should be added to <numeric>.

cjk-decimal
  〇一二三四五六七八九
  U+3007, U+4E00, U+4E8C, U+4E09, U+56DB, U+4E94, U+516D, U+4E03, U+516B, U+4E5D
  e.g., 2009 is 二〇〇九. 0 is 〇 (not 零).

Problem 4: Fullwidth forms are needed

The following fullwidth forms may be useful, especially in vertical text.

fullwidth-decimal
  ０１２３４５６７８９ (U+FF10..U+FF19)
fullwidth-lower-latin
  ａｂｃｄｅｆｇｈｉｊｋｌｍｎｏｐｑｒｓｔｕｖｗｘｙｚ (U+FF41..U+FF5A)
fullwidth-upper-latin
  ＡＢＣＤＥＦＧＨＩＪＫＬＭＮＯＰＱＲＳＴＵＶＷＸＹＺ (U+FF21..U+FF3A)

Problem 5: Suffix of cjk numbering systems

The suffix '.' U+002E for cjk-ideographic numbering systems or cjk alphabetic systems may not be appropriate. Especially in vertical text, the suffix '.' looks very odd.

I think the suffix should be changed to none for the following cjk numbering systems:

  cjk-ideographic
  japanese-informal
  japanese-formal
  japanese-formal-obsolete (I proposed in Problem 2)
  simp-chinese-informal
  simp-chinese-formal
  trad-chinese-informal
  trad-chinese-formal
  hiragana
  hiragana-iroha
  katakana
  katakana-iroha
  cjk-earthly-branch
  cjk-heavenly-stem
  cjk-decimal (I proposed in Problem 3)
  fullwidth-decimal (I proposed in Problem 4)
  fullwidth-lower-latin (I proposed in Problem 4)
  fullwidth-upper-latin (I proposed in Problem 4)

Examples:

Here are some real cjk numbering examples found on the web:

Example 1. (from a Japanese government ordinance)
http://law.e-gov.go.jp/htmldata/H20/H20SE389.html

In this example, 'japanese-informal' and 'katakana-iroha' are used (without suffix):

一 沖縄振興開発金融公庫
二 首都高速道路株式会社
三 株式会社日本政策金融公庫
...
十 日本私立学校振興・共済事業団
十一 軽自動車検査協会
十二 日本下水道事業団
...
イ 国家行政組織法第二十一条第四項 前段に規定する総括整理する職又は同条第五項 前段に規定する総括整理する職
ロ 内閣府設置法第十七条第八項 に規定する総括整理する職
ハ 宮内庁法 (昭和二十二年法律第七十号)第十五条第四項 に規定する総括整理する職
...

Example 2. (from a Japanese official notice)
http://www.env.go.jp/hourei/syousai.php?id=14000081

In this example, 'cjk-decimal' (I proposed in Problem 3) and 'katakana' are used (without suffix, but parenthesized cjk-decimal is also used):

一 目的
...
二 実施主体
...
一〇 対象地域
...
一一 医療手帳の交付の対象
...
一二 療養手帳の交付
(一) 医療手帳の交付を受けようとする者は、関係県知事にその交付を申請しなければならない。
(二) 前号の申請には、次の書類を添付しなければならない。
ア 通常のレベルを超えるメチル水銀の曝露を受けた可能性があることを証する次のいずれかの資料
...
イ 特定症候についての関係県知事が指定する医療機関の医師の診断書。ただし、水俣病に係る認定の申請に対する審査に供された検診資料があるときは、提出することを要しない。
ウ 特定症候についての関係県知事が定める要件に該当する専門医の所定の記載事項を満たす診断書。
...

Best regards,

--
Shinyu Murakami
http://www.antennahouse.com
Antenna House Formatter http://www.antenna.co.jp/AHF/en/
Received on Sunday, 8 February 2009 09:23:59 UTC