csswg/css3-lists cjk-list-conversation.html,NONE,1.1

Update of /sources/public/csswg/css3-lists
In directory hutz:/tmp/cvs-serv29942

Added Files:
	cjk-list-conversation.html 
Log Message:
Added log of group conversation concerning CJK list styles.


--- NEW FILE: cjk-list-conversation.html ---
<!doctype html>
<title>CJK List Conversation</title>
<meta charset=utf8>
<style>
p {
	white-space: pre-line;
	padding-left: 2em;
	text-indent: -2em;
}
</style>
<p>The following conversation took place between 18Mar2011 and 21Mar2011, concerning the specification of CJK list numbering styles.  No editing took place other than what was necessary to add markup.</p>

<hr>

<p><b>Tab Atkins</b> - I could use 5 minutes of time from anyone who's a native Chinese or Japanese speaker today, to ask some simple questions about how you number lists and such. This will greatly aid me in not speccing something retarded in the CSS3 Lists module.

Could someone please help?</p>


<p><b>Ryosuke Niwa</b> - What do you need to know?</p>


<p><b>Tab Atkins</b> - The only difference between formal and informal Japanese numbering (besides the characters used) appears to be that in informal Japanese, you just use the digit marker (no 一) if the 2nd, 3rd, or 4th digit in the group is 1, while in formal Japanese you only do that if the value of the group is between 10 and 19. Is this true? (Formal makes me kind of sad right now, because I can implement informal as a simple additive system, but I can't handle the special requirement that formal appears to need.)

What should I use as a negative prefix? I've been told that four different things are acceptable - the plain dash U+002d, the fullwidth minus sign U+ff0d, "▲", or "マイナス". Are these different based on formal vs informal?

I have a slightly confusing recommendation that the Japanese alphabetic list numberings (equivalent to using A, B, C for lists) should use a particular negative sign as well. This confuses me because alphabetic numbering typically starts at 1, and doesn't handle 0 or negative numbers at all. Is Japanese different in this regard somehow?

Are the fullwidth roman numerals (U+2160-216B) often used in list numbering? I currently support roman numerals using ordinary ascii glyphs, as recommended by Unicode, but I can add fullwidth versions as well if it's reasonably common.</p>


<p><b>Ryosuke Niwa</b> - I don't get what you mean by formal/informal Japanese numbering. What are they? What you should put as a prefix depends on the context. What kind of number are you talking about?

Fullwidth roman numerals are quite often used to make them look more aligned with the rest of the document.</p>


<p><b>Tab Atkins</b> - I don't understand the distinction well between formal/informal either. I think formal is used more for legal documents? The glyphs are described at &lt;http://www.w3.org/TR/css3-lists/#japanese-formal> (though apparently these are all completely wrong, as the corrections I have replace almost all the formal glyphs, and all the informal ones). 

The cjk numbering algorithm is at &lt;http://www.w3.org/TR/css3-lists/#cjk-ideographic> - rule 4 is the one in question, as it apparently only applies to formal numbering, while informal numbering always drops "1" digits if they're attached to a digit marker.</p>


<p><b>Ryosuke Niwa</b> - Ah... those are new/old Kanji characters. But informal ones should be 一、二、三、四、五、六、七、八、九、十、十一、etc... which are DIFFERENT from the Chinese counterparts.

"For any group with a value less than 20, remove the second digit (the 1 in the tens column). Leave any associated markers." doesn't make any sense. Which one of CJK does this?

Also, the way Chinese and Japanese write numbers are VERY different (not sure about Korean). For example, Chinese uses different characters for two when it leads the number. 兩千三百六十二 translates to two thousand three hundred sixty two. Notice 兩 at the beginning and 二 at the end both refer to two but use two distinct characters. You do this for only certain groups and only if it's the first digit in the number. And we don't do that in Japanese. I remember there was also a special rule for zeros but don't remember anymore. Xiaomei Ji should know.

In summary, I agree. The current spec makes very little sense and is hardly correct.</p>


<p><b>Xiaomei Ji</b> - Specifically, should I read the spec about http://www.w3.org/TR/css3-lists/#cjk-ideographic, 
http://www.w3.org/TR/css3-lists/#simp-chinese-formal, and http://www.w3.org/TR/css3-lists/#trad-chinese-formal to see whether they are correct for number lists in Chinese? And any specific questions you have in mind for Chinese?</p>


<p><b>Ryosuke Niwa</b> - Now let me elaborate on how we write informal numbers. I don't really know how formal numbers work so you should wait until someone else can comment.

For each group of 4, we apply the following rule:
1. Replace Arabic digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 by Kanji characters 零、一、二、三、四、五、六、七、八、九 respectively.
2. Remove all leading zeros
3. Append the markers (十, 百, 千) to the tens, hundreds, and thousands digits if they have not previously been removed.
4. For each of the tens, hundreds, and thousands digits, remove 一 if it precedes a marker (十, 百, 千).
5. Remove any zeros, along with the marker (十, 百, 千) that follows each zero.

6. Concatenate the groups of 4 into one number, separated by 万, 億, 兆 from the smallest to the largest, when the corresponding group is not zero.
7. If the output is empty, output 零.

I think this works...</p>
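
<p>A minimal sketch of the seven steps listed above, written in Python. It is an illustration under stated assumptions, not the spec's algorithm: the names <code>format_group</code> and <code>japanese_informal</code> are hypothetical, 零 is used for zero, and the range stops at the four groups that 万, 億, 兆 can mark.</p>
<pre>
```python
DIGITS = "零一二三四五六七八九"
DIGIT_MARKERS = ["", "十", "百", "千"]  # ones, tens, hundreds, thousands
GROUP_MARKERS = ["", "万", "億", "兆"]  # 10^0, 10^4, 10^8, 10^12


def format_group(value):
    """Format one group of four digits (0..9999), following steps 1 to 5."""
    out = ""
    for position in (3, 2, 1, 0):
        digit = (value // 10 ** position) % 10
        if digit == 0:
            # Steps 2 and 5: zeros are dropped together with their marker.
            continue
        if digit == 1 and position > 0:
            # Step 4: drop 一 when it comes right before 十, 百 or 千.
            out += DIGIT_MARKERS[position]
        else:
            out += DIGITS[digit] + DIGIT_MARKERS[position]
    return out


def japanese_informal(n):
    """Format a non-negative integer below 10^16 (assumed range)."""
    if 0 > n or n >= 10 ** 16:
        raise ValueError("this sketch only covers 0 up to 10^16 - 1")
    groups = []  # (group value, group index), least significant first
    index = 0
    while n > 0:
        groups.append((n % 10000, index))
        n //= 10000
        index += 1
    out = ""
    for value, group_index in reversed(groups):
        if value != 0:
            # Step 6: only non-zero groups are written, each with its marker.
            out += format_group(value) + GROUP_MARKERS[group_index]
    # Step 7: an empty result means the number was zero.
    return out or "零"


print(japanese_informal(11))
```
</pre>
<p>With this sketch, 11 comes out as 十一 and 0 as 零. Swapping 零 for 〇 only requires changing the first character of <code>DIGITS</code>.</p>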


<p><b>Ryosuke Niwa</b> - Xiaomei Ji: I think http://www.w3.org/TR/css3-lists/#cjk-ideographic is the one that's most problematic.</p>


<p><b>Tab Atkins</b> - Ryosuke Niwa: Okay, that matches what others have told me about numbering, except that it's been recommended that I should use 〇 instead of 零. Is there a strong preference either way?

Xiaomei Ji: For Chinese, reviewing the algorithm at #cjk-ideographic would be great. I'll also then need a sanity check for the sets of characters specified in the four chinese types (#simp-chinese-formal, #simp-chinese-informal, #trad-chinese-formal, #trad-chinese-informal).</p>


<p><b>Ryosuke Niwa</b> - Let me apply my algorithm to 510,0000,3102. I'm going to apply steps 1-5 simultaneously to all groups:
1. 零五一零 | 零零零零 | 三一零二
2. 五一零 | | 三一零二
3. 五百一十零 | | 三千一百零十二
4. 五百十零 | | 三千百零十二
5. 五百十 | | 三千百二

6. 五百十億三千百二
7. 五百十億三千百二</p>


<p><b>Ryosuke Niwa</b> - Tab Atkins: 〇 can be used for the informal case but you should definitely be using 零 for the formal case.</p>


<p><b>Ryosuke Niwa</b> - Now I remember! "Collapse any consecutive runs of 0 digits to a single 0." is the Chinese rule for zeros. Unlike Japanese which omits zeros, Chinese collapses multiple zeros into one zero.</p>


<p><b>Tab Atkins</b> - Ah, that makes sense then. The Japanese feedback I'd gotten had said rule 4 was incorrect, but the Chinese feedback didn't mention it at all.

Do either of you know any native Korean speakers?</p>


<p><b>Ryosuke Niwa</b> - Jungshik Shin ?</p>


<p><b>Tab Atkins</b> - Ryosuke Niwa Getting back to the question of a prefix for negative numbers, these are generally going to be used for numbering lists, like in an &lt;ol>. They may also be used as counter values elsewhere, but that's a relatively niche case.</p>


<p><b>Xiaomei Ji</b> - -- my reply only applies to Simplified Chinese, and I mixed the usage of the formal/informal chinese character in my reply.

1。 Split the decimal number into groups of four digits, starting with the least significant digit.
--- correct.

2。 Ignoring groups that have the value zero, append the second group marker to the second group, the third group marker to the third group, and the fourth group marker to the fourth group. These markers are defined in the tables for the specific numbering systems. The first group has no marker.
-- correct.
-- but the "Fourth Group Marker	兆" is confusing. Check out http://zh.wikipedia.org/zh/%E5%85%86
In old times, 兆 meant 10^16. Now, young people most likely think it means 10^6. To disambiguate, use "万亿" as the fourth group marker and "亿亿" as the 5th group marker.


3。 For each group, ignoring digits that have the value zero, append the second digit marker to the second digit, the third digit marker to the third digit, and the fourth digit marker to the fourth digit. These markers are defined in the tables for the specific numbering systems. The first digit has no marker.
-- In Chinese, "digits that have the value zero" should not be ignored, if they are the middle digits. 
If they are leading digits, they could be ignored (it is OK to say "01" as "壹") except when spelling out dates/numbers on checks (maybe that is the difference between informal and formal). According to http://hi.baidu.com/prestar/blog/item/ab82543d79886cea3d6d97f8.html/cmtid/5167bb09ea93d12a6a60fb13 "1月" should be spelled out as "零壹月" to avoid being changed.
"1RMB" should be spelled out as "零壹元“ to avoid being changed.

In the case where "digits that have the value zero" are middle digits, 零 should be included.
For example: 101 should be "一百零一“. It is ambiguous to say "一百一“, it could be "101", or it could be "110". I think "一百一“ means "110", but I heard there are people from different cities use it to mean "101". 

Here should apply the 6th step: "Collapse any consecutive runs of 0 digits to a single 0."
For example, "1001" should be "一千零一“。

When they are ending digits, they could be ignored.
For example: "110" spelled out as “一百一拾".
"20,0000" spelled out as "二拾万" in 2nd group.


4。 For any group with a value less than 20, remove the second digit (the 1 in the tens column). Leave any associated markers.
-- could be true in Chinese, at least oral writing. For example, "19" could be spelled out as "拾九“ or "一拾九" simply because "拾" == "一拾".
But spelling it out would never cause ambiguity.

5。 Concatenate the groups back into a single string, least significant group last.
-- correct.

6。Collapse any consecutive runs of 0 digits to a single 0.
-- I do not see why it is applied here. I think it should be applied in step 3rd.
-- When "0" is the trailing in one group, we do not need to spell it out, so, after concatenation, there should be cases that there are consecutive "零"s because of combining of 2 groups.

7。Replace each digit with the relevant character selected from the numbering system's table.
-- the characters in the table are not correct at all.


Questions:
1. what is the usage of formal and informal?
2. I just deleted my whole writing by accidentally clicked (?) key and caused the page reload. So, I have to re-write. Anyone know the way to get it back. press 'back' certainly did not get me "back" what I wrote.</p>


<p><b>Ryosuke Niwa</b> - I don't think we can express negative numbers in Kanji characters generally. There are specific postfixes / prefixes that can be used to mean negative values, but they are context sensitive (e.g. temperature).</p>


<p><b>Tab Atkins</b> - Ryosuke Niwa When using the algorithm we've been talking about, though, which character should be used for list numbering?</p>


<p><b>Ryosuke Niwa</b> - Tab Atkins: We normally use Arabic numbers if we need to express negative numbers. But I don't think we should spec such an odd behavior. We never mix Kanji numbers and Arabic numbers. We should say the behavior is undefined if negative number is to be shown.

It's like trying to express a letter that doesn't exist in the character set. It doesn't make any sense.</p>


<p><b>Tab Atkins</b> - Okay, that's fine. The original algorithm in the spec is only defined for 0 to 10^16, but this feedback I'm looking at talks about minus signs, so I wasn't sure if I should say the algorithm was defined for negative numbers or not.</p>


<p><b>Ryosuke Niwa</b> - Tab Atkins: In Japanese, we have groups up until 10^68. http://ja.wikipedia.org/wiki/%E5%91%BD%E6%95%B0%E6%B3%95 I'm not sure why it's not included there...</p>


<p><b>Tab Atkins</b> - Xiaomei Ji Awesome, thanks for the feedback. I'm rewriting Japanese right now, then I'll rewrite Chinese and run it by you.</p>


<p><b>Tab Atkins</b> - Ryosuke Niwa Wow, okay. I can certainly include those, yeah. The group markers are just the values in the first column of that table, right? What's the significance of the kanji in parentheses after each group marker? (I can't read any japanese.)</p>


<p><b>Ryosuke Niwa</b> - Tab Atkins: Yes. I don't think you need to include/require it in the spec. Just mention that implementors can support them or that implementors should be aware of them.</p>


<p><b>Tab Atkins</b> - Either I require it, or I omit it entirely (perhaps with a note that they're intentionally omitted). Optional behavior is unacceptable in a spec. ^_^</p>


<p><b>Ryosuke Niwa</b> - Tab Atkins: I don't know if we should omit it entirely. I think we should just limit the scope of the algorithm to 10^16 for now. That way, we can extend the algorithm later to support them or use technically "wrong" but popular groupings in the future.</p>


<p><b>Tab Atkins</b> - So that would be "omit entirely, with a note that there are further group markers but they are intentionally not included here, but the algorithm may be extended in the future.".</p>


<p><b>Jungshik Shin</b> - CLDR also has RBNF (rule-based number formatting) data for Chinese, Japanese and Korean (and other) numbers in multiple levels of 'formality' (finance, non-finance, etc). I'll provide a link to that.

As for going all the way up to 10^64 / 10^68 (I thought 無量大数 is 10^64....), I agree with Ryosuke about making it optional.</p>


<p><b>Xiaomei Ji</b> - Simp-chinese-formal: (those character for digits are used in currency and check. They are not used anywhere else, I think)

Values	Codepoints
Second Group Marker	万	U+4E07
Third Group Marker	亿	U+4EBF
Fourth Group Marker	万亿 (*changed)
Second Digit Marker	拾	U+62FE (*changed)
Third Digit Marker	佰	U+4F70 (*changed)
Fourth Digit Marker	仟	U+4EDF (*changed)
Digit 0	零	U+96F6
Digit 1	壹	U+58F9
Digit 2	贰	U+8D30 (*changed)
Digit 3	叁	U+53C1 (*changed)
Digit 4	肆	U+8086
Digit 5	伍	U+4F0D
Digit 6	陆	U+9646 (*changed)
Digit 7	柒	U+67D2
Digit 8	捌	U+634C
Digit 9	玖	U+7396


simp-chinese-informal (used more often, in non-currency related scenario. The one in the spec is totally wrong).

Second Group Marker	万	U+4E07
Third Group Marker	亿	U+4EBF
Fourth Group Marker	万亿 
Second Digit Marker	十 U+5341 
Third Digit Marker	百 U+767E 
Fourth Digit Marker	千 U+5343 
Digit 0	零	U+96F6 or 〇 U+3007 (they both appear; I am not sure which is more popular or standard).
Digit 1	一 U+4E00
Digit 2	二 U+4E8C 
Digit 3	三 U+4E09
Digit 4	四 U+56DB
Digit 5	五 U+4E94
Digit 6	六 U+516D 
Digit 7	七 U+4E03
Digit 8	八 U+516B
Digit 9	九 U+4E5D</p>


<p><b>Jungshik Shin</b> - Tab Atkins, would you add Korean rules? I provided CSS-WG with the information perhaps 8 - 9 years ago (IIRC, it's added to Mozilla), but it didn't make it to the spec.</p>


<p><b>Tab Atkins</b> - Xiaomei Ji Yay, thanks!

Some clarifications on your algorithm above:

So, for each group, you omit any leading/trailing zeros, and collapse the middle ones? So 2000 0000 is 二千万, 2 0000 is 二万, and 2002 0000 is 二千零二万, correct? What about 2 0000 2000 - is it 二亿二千, or 二亿零二千?

Should I ever omit 1s? You're somewhat noncommittal in your algorithm - you say it would be okay to write 19 as 十九, but that 一十九 is okay too. What about 219 - would you ever write that as 二百十九, or would you always say 二百一十九?

You definitely don't omit 1s in the 3rd or 4th digits of a group, right?</p>


<p><b>Tab Atkins</b> - Jungshik Shin Yup, got a placeholder for korean rules right now, but I'll definitely be adding them explicitly if they're different from chinese and japanese. I'll see if I can find your email - if not, I'll ping you for a description of the algorithm.</p>


<p><b>Jungshik Shin</b> - Thank you for having a placeholder for Korean. 
See also my 8 year old email at http://lists.w3.org/Archives/Public/www-style/2003Apr/0063.html :-)</p>


<p><b>Jungshik Shin</b> - Frank Tang : Frank, can you re-hash your 8-yr old comment to CSS-WG here again? Especially, can you correct/review Traditional Chinese? Thanks.</p>


<p><b>Tab Atkins</b> - Frank Yung-Fong Tang Or link me to the comment, like Jungshik helpfully did for his. ^_^</p>


<p><b>Tab Atkins</b> - Jungshik Shin: Is the algorithm at http://dev.w3.org/csswg/css3-lists/#chinese-counter-styles the same for Korean, just with different characters?

Xiaomei Ji: Is the algorithm at http://dev.w3.org/csswg/css3-lists/#chinese-counter-styles correct? In particular, note rule 5, and the fact that I don't drop any of the 1s.</p>


<p><b>Jungshik Shin</b> - http://unicode.org/repos/cldr/trunk/common/rbnf/ko.xml (use the latest Chrome trunk build for xml display in Chrome or use Firefox) has additional entries for Korean (that use 'Hangul' instead of Chinese characters). However, it may not be in the easiest form to understand unless it's viewed along with RBNF spec at http://www.icu-project.org/apiref/icu4c/classRuleBasedNumberFormat.html#_details

I'll enumerate digits/markers (for korean-sinokorean) in a spreadsheet and add a link here. 

BTW, I'm afraid that we need a locale-specific tweak or two for the algorithm because Xiaomei's algorithm wouldn't work for Korean in a couple of places. 

I'll also go through the algorithm at http://dev.w3.org/csswg/css3-lists/#chinese-counter-styles c</p>


<p><b>Ryosuke Niwa</b> - Jungshik Shin: Yeah, we definitely need separate algorithms for each one of CJK.</p>


<p><b>Xiaomei Ji</b> - About omit 1, it is on the condition that "a value less than 20" (specified in the spec). For 219, you would say 二百一十九. 

As to omit 零, you raised a good question. I think either 二亿二千 or 二亿零二千 is acceptable.
But 二亿零二千 is more formal, at least according to what is recommended in "how to write Chinese currency numbers" at http://hi.baidu.com/prestar/blog/item/ab82543d79886cea3d6d97f8.html/cmtid/5167bb09ea93d12a6a60fb13 again.
So, the algorithm should be:
1. in each group, ending 0s are ignored, middle 0s are collapsed, leading 0s are collapsed too unless it is the leading 0 of the whole number (in which case, it could be ignored. I said that you use "零壹元" for 1RMB to avoid being modified, but thinking the case 2,0000RMB, we say "二万元“ although it suffers the same risk of being modified. I never saw the case of "零二万元". So, I think leading 0 in the whole number can be ignored).
2. if a group is all 0, it collapses to one 0.
3. then, we need to collapse any consecutive runs of 0 digits to a single 0.

Hope we can find Chinese number writing standard.
@suzhe could you double check?</p>
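
<p>A minimal sketch of the zero-handling rules just described for Simplified Chinese, written in Python with the informal characters from the table earlier in this conversation. It is an illustration, not a settled algorithm: the name <code>simp_chinese_informal</code> is hypothetical, no 1s are dropped, and where either form was said to be acceptable it writes the 零 between groups (the variant described as more formal).</p>
<pre>
```python
DIGITS = "零一二三四五六七八九"
DIGIT_MARKERS = ["", "十", "百", "千"]    # ones, tens, hundreds, thousands
GROUP_MARKERS = ["", "万", "亿", "万亿"]  # 10^0, 10^4, 10^8, 10^12


def simp_chinese_informal(n):
    """Format a non-negative integer below 10^16 (assumed range)."""
    if 0 > n or n >= 10 ** 16:
        raise ValueError("this sketch only covers 0 up to 10^16 - 1")
    if n == 0:
        return "零"
    out = ""
    pending_zero = False  # a run of zeros seen since the last written digit
    for group_index in (3, 2, 1, 0):
        value = (n // 10000 ** group_index) % 10000
        for position in (3, 2, 1, 0):
            digit = (value // 10 ** position) % 10
            if digit == 0:
                # Zeros leading the whole number are ignored; any other run
                # of zeros is remembered once, which collapses it.
                pending_zero = out != ""
                continue
            if pending_zero:
                out += "零"
                pending_zero = False
            out += DIGITS[digit] + DIGIT_MARKERS[position]
        if value != 0:
            out += GROUP_MARKERS[group_index]
            # Zeros ending a non-zero group are ignored.
            pending_zero = False
    # Zeros ending the whole number are never written.
    return out


print(simp_chinese_informal(200002000))
```
</pre>
<p>With this sketch, 2 0000 2000 comes out as 二亿零二千, and 2002 0000 as 二千零二万.</p>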


<p><b>Xiaomei Ji</b> - even for "value less than 20", not omitting 1 is common in the formal style, and it is acceptable in the informal style.</p>


<p><b>Tab Atkins</b> - Also: I wish I could read Chinese/Japanese, because then the references would actually help me. >_<</p>


<p><b>Ryosuke Niwa</b> - By the way, formal and informal ways of writing were referred to as financial and normal ways in http://en.wikipedia.org/wiki/Chinese_numerals.

Also, it says negative number is expressed by prefixing 負: http://en.wikipedia.org/wiki/Chinese_numerals#Negative_numbers

So it seems like you can and you should support negative numbers for Chinese. Jungshik Shin: How about Korean? Do you write negative numbers in Hanja/Hangul?

Regardless, I wouldn't refer to them as formal and informal. The best words I can think of are traditional and normal respectively.</p>


<p><b>Jungshik Shin</b> - CLDR uses 'financial' for using '贰' instead of '二' (and others like that). Because they're mainly used to prevent 'forgery' in financial documents (check, contract, etc).

Edit:
See http://www.unicode.org/repos/cldr/trunk/common/rbnf/ja.xml
http://www.unicode.org/repos/cldr/trunk/common/rbnf/zh.xml
http://www.unicode.org/repos/cldr/trunk/common/rbnf/zh_Hant.xml</p>


<p><b>Tab Atkins</b> - I think I'll go with "financial" as the prefix for the formal kind. Also, thanks for digging up the prefix to use for negative numbers, Ryosuke!

Jungshik Shin: Damn, now I have to internalize the RBNF spec and figure out what Korean does different. ;_;</p>


<p><b>Xiaomei Ji</b> - a note for http://en.wikipedia.org/wiki/Chinese_numerals.
萬 and 億 (the last 2 rows in the "financial" column) are traditional characters. We do not use them even in financial documents. We use 万 and 亿 (the same as what is used in the "normal" way).</p>


<p><b>Tab Atkins</b> - Xiaomei Ji Yeah, I've just used the characters you gave me.</p>


<p><b>Ryosuke Niwa</b> - Tab Atkins: I'd also point out that characters listed on trad-chinese-formal and trad-chinese-informal are wrong. I don't know enough about Traditional Chinese characters to confidently tell you the right ones though.</p>


<p><b>Xiaomei Ji</b> - Ryosuke, you are right. I've pointed out that my reply only applies to Simplified Chinese. I think Tab already asked Frank to validate the data and algorithm for traditional Chinese.</p>


<p><b>Tab Atkins</b> - Ryosuke Niwa Yeah, I'm just going to stub out those lists for now and wait for further validation.</p>


<p><b>Tab Atkins</b> - Jungshik Shin I'm looking over your email at http://lists.w3.org/Archives/Public/www-style/2003Apr/0063.html . You respond to your own email with some corrected lists hosted at jshin.net, but that domain no longer exists. Do these corrected lists still exist somewhere?</p>


<p><b>Frank Yung-Fong Tang</b> - The work I did a long time ago is summarized in the paper http://unicode.org/iuc/iuc16/a013.html . If someone still has an IUC16 CD, then we should be able to get that old old paper. The implementation can be found in http://mxr.mozilla.org/mozilla/source/layout/generic/nsBulletFrame.cpp ; look at gCJKIdeographicDigit1 .... and so on. CCJK use different units and digits there.</p>


<p><b>Ryosuke Niwa</b> - Here are financial numerals for Japanese:
壱, 弐, 参, 四, 五, 六, 七, 八, 九, 拾 (1, 2, 3, 4, 5, 6, 7, 8, 9, 10 respectively)
versus normal ones:
一, 二, 三, 四, 五, 六, 七, 八, 九, 十.

Now I started to think that in financial numerals (formally called 大字=Daiji), it's odd to omit the leading 1 because that'll be vulnerable to forging. So you should probably not drop leading 壱s.</p>


<p><b>Tab Atkins</b> - Ryosuke Niwa Do you have unicode code points for those? Or can you point me at a page which does?</p>


<p><b>Tab Atkins</b> - Hmm. It feels like we're straying further and further away from what someone would use for list numbering. You could use counters for generic number formatting, but that's not really a goal here. Do I even need financial numbers, then? Might it make more sense to just use the "informal" numbering?</p>


<p><b>Ryosuke Niwa</b> - Tab Atkins: I have seen documents using financial numbers as list numbers once or twice in my life but I don't think it's a common practice. I mean that financial numbering system was designed to prevent forgery (e.g. it's easy to add a line to 一 to make 二) so there isn't much point in using it for other purposes.

But there was a heated discussion about this topic on public-html-ig-jp in Jan 2011 so you should talk to "Koji Ishii" (kojiishi@gluesoft.co.jp).

By the way, it seems like I lied. On that thread, it has been agreed that we should use 〇 U+3007 for zero in both normal and financial numbers.

You can find code points in:
http://lists.w3.org/Archives/Public/public-html-ig-jp/2011Jan/att-0007/css3-lists-japan-feedbacks.htm
http://lists.w3.org/Archives/Public/www-style/2009Feb/0252.html

It seems like there has also been a discussion on defining japanese-formal-obsolete which will consist of old Kanji-characters that are used for stylistic effects on modern books / magazines.</p>


<p><b>Tab Atkins</b> - Ryosuke Niwa Regarding the use of 〇, I went with 零 for both based on the fact that 零 seems to return way more google results when used in numbers than 〇.

Thanks for the heads-up about the public-html-ig-jp discussion. I'll look into it and follow up with Koji. He's a member of the CSSWG already.</p>


<p><b>Ryosuke Niwa</b> - That's probably because 零 is used in Chinese, no?</p>


<p><b>Tab Atkins</b> - Hmm, maybe.</p>


<p><b>Gavin Peters</b> - I was taught to write 〇 in chinese sometimes. However, I'm far from authoritative.</p>


<p><b>Ryosuke Niwa</b> - Gavin Peters: I think 零 is more popular than 〇. Even in Japanese, I'd personally prefer to write 零 because 〇 doesn't look like a Chinese/Kanji character.</p>


<p><b>Gavin Peters</b> - Ryosuke, I concur. And it may have been taught to me simply because it's understood, and easier for second language learners to learn how to write.</p>

Received on Tuesday, 22 March 2011 17:43:26 UTC