I18N-ISSUE-383: Justifying Korean text ⓣ [css-text]

http://www.w3.org/International/track/issues/383

Raised by: Richard Ishida
On product: css-text

Thread: http://lists.w3.org/Archives/Public/public-i18n-cjk/2014JulSep/0002.html
About: http://dev.w3.org/csswg/css-text/#text-justify-property
Raised by: Koji Ishii


Hello/안녕하세요

Could someone please help us work out the right behavior for justifying Korean text? This e-mail is a bit long; sorry for not being able to keep it short.

Here’s some background. Last year, the CSS WG discussed the text-justify property[1] and made a few resolutions. The full resolutions are here[2], but in summary:
1. Make justification behavior follow the content language[3] automatically as far as possible, and remove as many behavior-specific values as possible.
2. As a result, the “inter-ideograph” value (expand between ideographic characters) was removed, but the “inter-word” value (do not expand between ideographic characters) remains.

In this context, I’m having difficulty deciding what is best for Korean text.

In my understanding, there are 3 types of Korean documents:

1. Ideographic only: ancient documents (which may sometimes contain a few Hangul characters).
2. Mostly Hangul, with a few ideographic characters per paragraph or page.
3. All Hangul, no ideographic characters.

Q1. Is this understanding correct, or am I missing any other types?

I do not have a good sense of how common each type of document is, so here’s the next question.

Q2. Can you give us the ratio of each type of document on the web, for example “0:40:60”? Any statistics would be great, but your own rough estimate is also helpful; if 10 people respond with their own estimates, that is a sort of statistics, I suppose.
Q3. Is the ratio for papers/books/e-books different from the ratio for web documents? How about TV/movie captions, signage, or anywhere else the web platform is used?

Next, let’s consider the case where the author sets lang=“ko” on the document (and text-align:justify, of course). This case is easier because we can focus on what’s right for Korean. In my understanding, you want to expand only at spaces; is that correct? No existing browser expands between Hangul characters, and I suppose this is the correct behavior. However, Chrome and Safari expand between ideographic characters. I’m guessing this is not the expected behavior for type #2 documents and that you would want it fixed.

Q4. Is the assumption above correct?
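
For anyone who wants to check this case in a browser, a minimal test page might look like the sketch below. The sample sentence (Article 1 of the Korean constitution written in mixed script, as a stand-in for a type #2 document), the `p` selector, and the 16em column width are assumptions chosen only for illustration.

```html
<!DOCTYPE html>
<html lang="ko">
<head>
<meta charset="utf-8">
<title>Justification test: lang=ko, mixed Hangul and ideographs</title>
<style>
  /* Narrow column so the paragraph wraps and justification becomes visible.
     The 16em width is an arbitrary choice for this sketch. */
  p {
    width: 16em;
    text-align: justify;
    border: 1px solid gray; /* makes the line edges easy to see */
  }
</style>
</head>
<body>
<!-- Type #2 sample: mostly Hangul with some ideographic characters. -->
<p>大韓民國은 民主共和國이다. 大韓民國의 主權은 國民에게 있고, 모든 權力은 國民으로부터 나온다.</p>
</body>
</html>
```

If the gaps inside a run such as 大韓民國 widen on justified lines, the browser is expanding between ideographic characters; if only the spaces widen, it is expanding at spaces only.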

The challenge in this case is that you will not be able to justify type #1 documents, because text-justify no longer has a value that expands between ideographic characters. If you want to solve this, you have the following options:

1. Mark such documents as lang=“zh” (Chinese). I’m not sure how right or wrong this seems to you; are ancient documents considered Chinese, or are they ancient Korean? I’m guessing this is wrong, but I just wanted to ask. I’m sorry if this is a bad or impolite question; I hope you understand that I’m just trying to list all technically possible options here.
2. Propose that the CSS WG revive the “inter-ideograph” value, so that you can mark documents as lang=“ko” and optionally expand between ideographic characters.
3. Make “expand between ideographic and Hangul characters” the default, and always use “inter-word” for type #2/#3 documents. This gives you a choice, but at a cost: you have to mark all type #2/#3 documents as “inter-word”. I’m guessing the benefit is not worth the cost here?
4. Such documents are rare, and justifying them is rarer still or nonexistent, so this specific case does not need to be fixed (please consider Q2/Q3 above).

Q5. Which option looks right to you, or is there another one?

Next is a harder case: when the language is not specified. I suspect a large number of existing documents do not have lang, so this might affect backward compatibility more than Q5 does. I have to say that, in this case, there’s no single right solution, because all existing browsers behave differently; we need to come up with some compromise that is good enough.

In this case, Chinese and Japanese documents want to expand between ideographic characters, while Korean type #2 documents do not, so there’s a conflict. I don’t know how to resolve this conflict properly. I’m guessing we should favor Chinese and Japanese documents, because they use justification more often and ideographic characters are not the primary script in Korea, but this is my personal opinion. Others might think differently, and the answers to Q2/Q3 may also affect this.

Q6. What do you think about this?

Next, let’s assume we favored Chinese and Japanese (expand between ideographic characters) in Q6. In this case:

Q7. Do you want a) to expand between Hangul characters, because Hangul and ideographic characters should behave the same way in type #2 documents, or b) not to expand between Hangul characters, because that helps type #3 documents, even if it looks strange in type #2 documents?

Note that no browser today expands between Hangul characters, even when it expands between ideographic characters. I have no idea how strange this behavior looks to you, especially in type #2 documents. In case you’re interested in the results of my investigation of existing browser behavior, here they are[4]. It’s primarily my own memo, so it is quite terse and may be hard to understand.

Lastly, this is not a question, but if you create justified Korean HTML documents today, I recommend that you add 1) lang=“ko” and 2) text-justify:inter-word. It’s hard to predict the future, but from what I can tell at this moment, this is the best practice for protecting your documents going forward.
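
As a rough sketch of that recommendation, a justified Korean document could be marked up as follows. Applying the rule to the `p` selector and the all-Hangul sample sentence are assumptions made for illustration; only lang=“ko” and text-justify:inter-word come from the recommendation itself.

```html
<!DOCTYPE html>
<!-- 1) Declare the content language on the root element. -->
<html lang="ko">
<head>
<meta charset="utf-8">
<title>Justified Korean text</title>
<style>
  p {
    text-align: justify;
    /* 2) Ask for expansion at word separators (spaces) only. */
    text-justify: inter-word;
  }
</style>
</head>
<body>
<p>대한민국은 민주공화국이다. 대한민국의 주권은 국민에게 있고, 모든 권력은 국민으로부터 나온다.</p>
</body>
</html>
```

Browsers that do not support text-justify will ignore that declaration and fall back to their default justification behavior, so the lang attribute is still worth setting on its own.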

Even if you can answer only some of the questions, it’s still helpful. Thank you for reading this long e-mail; I look forward to hearing from you.

[1] http://dev.w3.org/csswg/css-text/#text-justify-property

[2] http://lists.w3.org/Archives/Public/www-style/2013Feb/0474.html

[3] http://dev.w3.org/csswg/css-text/#content-language

[4] http://1drv.ms/1r3iYme


/koji

Received on Thursday, 10 July 2014 16:38:11 UTC