Re: [css-text] Justifying Korean text

Thank you for the reply Dongwoo, and sorry for my late response.

On Jul 17, 2014, at 4:32 PM, Dongwoo Joshua Im <dw.im@samsung.com> wrote:

> Dear Koji,
> I left some inline comments below.
> Those are just "quick" answer, I will try to find better one.
> 
> Sorry for the late feedback about this.
> 
> 
>> In my understanding, there are 3 types of Korean documents:
>> 
>> 1. Ideographic only, ancient documents (may sometimes contain some hangul characters.)
>> 2. Mostly Hangul, a few to some ideographic characters per a paragraph or a page.
>> 3. All Hangul, no ideographic characters.
>> 
>> Q1. Is this understanding correct, or do I miss any other types? 
> 
> Also, we can think about Hangul and alphabet characters are in a paragraph together. (could be also with ideographic characters. What a combination!)
> I can say this is case 4.

Latin letters can mix into any of these types without causing conflicts, so I guess we don’t have to treat that as a separate case.

>> I do not have a good sense of how many each documents are, so here’s the first question.
>> 
>> Q2. Can you give us the ratio of each type of documents on the web? I mean, ratios such as “0:40:60”. Any statistics would be great, but your own ratio as you feel is also helpful; if 10 people respond my-own-ratio, it’s a sort of statistics I suppose.
> 
> This is really hard to say.
> But, I guess, specially on the web, case #3 would be dominant one.
> After that, case #4 which I mentioned above.
> Then case #2, and the last would be case #1.
> 
> As you said, cannot find "official" statistics, will try to find.
> My "own" ratio would be.. (less than)1 : 20 : 55 : 25

So it’s 1:20:80 if #3 and #4 can merge. The other answer from glados was 10:20:70; I guess he sees more ancient documents and papers. The two answers gave me a good sense of the numbers, thank you.

The design conflict lives in #1 and #2 only, so if 70-80% of Korean documents are Hangul only (or Hangul with Latin), we could prioritize doing better for them.
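
For anyone who wants to redo the arithmetic, here is a small Python sketch of how the four-way ratio collapses to three once #4 is folded into #3. The function names are made up for illustration; the only data is the ratio quoted above.

```python
def merge_ratio(ratio, merge_from, merge_into):
    """Fold one category of a ratio into another and return a new dict."""
    merged = dict(ratio)
    merged[merge_into] += merged.pop(merge_from)
    return merged


def as_percent(ratio):
    """Convert ratio parts to percentages of the total."""
    total = sum(ratio.values())
    return {key: 100 * value / total for key, value in ratio.items()}


# Dongwoo's own ratio for types #1..#4 ("less than 1" rounded up to 1).
dongwoo = {"#1": 1, "#2": 20, "#3": 55, "#4": 25}

merged = merge_ratio(dongwoo, "#4", "#3")
shares = as_percent(merged)
```

The parts add up to 101 rather than 100 because of the rounding of #1, so the Hangul-only share comes out slightly under 80%.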

>> Q3. Is the ratio for papers/books/e-books different from the ratio for the web documents? How about TV/movie captions, signage, or anywhere else where web platform is used?
> 
> papers.. could be different from web contents.
> Chance to see ideographic characters on papers is more than web contents.
> But, still, I think the order would be case #3, #4, #2, #1.
> 
> Captions.. to see ideographic on Caption is very rare, but cannot say 0.
> 
>> Next, let’s think about when author sets lang=“ko” to the document (and text-align:justify of course.) This case is easier because we can focus on what’s right for Korean. In this case, in my understanding, you want to expand only at spaces, correct? All existing browsers do not expand between Hangul, I suppose this is the correct behavior. However, Chrome/Safari expands between ideographic characters, I’m guessing this is not an expected behavior for type #2 documents and you want to fix this.
>> 
>> Q4. Is the assumption above correct?
> 
> I can say that we usually only expand inter-word for the case 2, 3, 4.
> For case 1, I'm not sure (never seen), but I can say we can treat same as chinese and japanese.

Thanks. So that means, since inter-word is only available in IE as of today, you can’t justify 20-30% of documents.

>> The challenge in this case is that, you will not be able to justify type #1 documents, because text-justify does not have a value to expand between ideographic characters. If you want to solve this, you have following options:
>> 
>> 1. Mark such documents as lang=“zh” (Chinese.) I’m not sure how right or wrong this is to you; are ancient documents considered as Chinese, or are they ancient Korean? I’m guessing this is wrong, but just wanted to ask. I’m sorry if this is really a bad, impolite question, I hope you understand that I’m just trying to list up all technically possible options here.
>> 2. Propose CSS WG to revive “inter-ideograph” value, so that you can mark as lang=“ko” and optionally expand between ideographic characters.
>> 
> Can't "text-justify: distribute" handle the characters in a way similar to “inter-ideograph”?
>> 
>> 3. Make “expand between ideographic and Hangul characters” default, and always use “inter-word” for type #2/#3 documents. This give you a choice, but as a cost, you have to mark all type #2/#3 documents as “inter-word”. I’m guessing the cost does not worth the value here?
>> 4. Such documents are rare, justifying such documents are even rare to zero, so don’t need to fix this specific case (please consider Q2/Q3 above.)
>> 
>> Q5. Which option looks right to you, or anything else?
> 
> I think "inter-word" could be the expected behavior for Hangul.
> I'm not sure about case #1.. I thought "distribute" would be the choice.

text-justify: distribute distributes between Latin alphabet letters too, which (I suppose) is not what you want.

So given what you wrote, choices for Korean are:
1. Add text-justify: inter-word for 70-80% of documents (#2 and #3), and do not add for #1.
2. Ask the CSS WG to bring inter-character back, and make inter-word the default behavior for lang=“ko”. You can then add text-justify: inter-character for #1.

I’ll write to www-style to see whether people there are OK with bringing inter-character back.
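
To make the difference between the two values concrete, here is a rough Python sketch of where extra space could be added on a line. It is only an illustration, not how any browser or the spec defines it: `expansion_points` and `is_ideograph` are hypothetical names, "ideograph" is approximated by two Unicode blocks, and I am assuming inter-character expands between adjacent ideographs but not between Hangul or Latin letters.

```python
def is_ideograph(ch):
    """Rough test: CJK Unified Ideographs and Extension A only (an assumption)."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def expansion_points(text, mode):
    """Return indices i where extra space may be added after text[i]."""
    if mode not in ("inter-word", "inter-character"):
        raise ValueError("unknown mode: " + mode)
    points = []
    for i in range(len(text) - 1):
        cur, nxt = text[i], text[i + 1]
        if cur == " ":
            # Both modes expand at word spaces.
            points.append(i)
        elif mode == "inter-character" and is_ideograph(cur) and is_ideograph(nxt):
            # Only the assumed inter-character mode expands between ideographs.
            points.append(i)
    return points


sample = "한국 漢字"
inter_word = expansion_points(sample, "inter-word")
inter_character = expansion_points(sample, "inter-character")
```

With this sample, inter-word only finds the space, while inter-character also finds the gap between the two ideographs; the Hangul word is left alone in both cases.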

>> Next. This is harder one; when language is not specified. I suspect a large number of existing documents do not have lang, so this might affect backward compatibility more than Q5 does. I have to say that, in this case, there’s no single right solution because all existing browsers behave differently; we need to come up with some compromised, good enough behavior.
>> 
>> In this case, Chinese and Japanese documents want to expand between ideographic characters, while Korean type #2 documents do not, so there’s a conflict. I don’t know how to properly resolve this conflict, I’m guessing we should take Chinese and Japanese documents because they use justification more often, and the use of ideographic characters in Korea is not the primary use, but this is my personal opinion. Others might think differently, and answers to Q2/Q3 may also affect this.
>> 
>> Q6. What do you think about this?
> 
> If there are both korean character and ideographic character in the paragraph, we need to find which one is majority in the paragraph.
> If Korean is the one, then, I guess we need to apply "inter-character" justification.
> But, seems not enough..

Yeah, looking at the content was raised as one of the possible options, but I’m afraid a good number of people dislike such an approach. With your answers to Q1-3, there’s a way to save 70-80% of documents (#3), so that’s good. I don’t know how to save #1 and #2 when they’re not tagged as lang=“ko”...
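
Just to show what "looking at the content" could mean in the simplest form, here is a hedged Python sketch that counts Hangul and ideographic characters in a paragraph and reports which one dominates. The function names and the tie-breaking rule are my own assumptions, the character ranges are approximations, and the sketch deliberately does not decide which justification to apply.

```python
def is_hangul(ch):
    """Rough test: precomposed Hangul syllables and Hangul Jamo."""
    cp = ord(ch)
    return 0xAC00 <= cp <= 0xD7A3 or 0x1100 <= cp <= 0x11FF


def is_ideograph(ch):
    """Rough test: CJK Unified Ideographs and Extension A only."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def dominant_script(paragraph):
    """Return "hangul", "ideographic", or None if neither script appears."""
    hangul = sum(1 for ch in paragraph if is_hangul(ch))
    ideographs = sum(1 for ch in paragraph if is_ideograph(ch))
    if hangul == 0 and ideographs == 0:
        return None
    # Ties go to Hangul here; that choice is arbitrary.
    return "hangul" if hangul >= ideographs else "ideographic"


mixed = dominant_script("한국어 문서에 漢字")
ancient = dominant_script("漢字文化圏")
latin = dominant_script("Latin only")
```

A heuristic like this changes behavior when one character is added or removed near the tipping point, which is part of why people dislike content-based approaches.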

>> Next. Let’s assume we took Chinese and Japanese (expand between ideographic characters) in Q6. In this case:
>> 
>> Q7. Do you want a) to expand between Hangul because Hangul and ideographic should behave the same way for type #2 documents, or b) not to expand between Hangul because doing so helps type #3 documents, even if it’s strange for type #2 documents?
> 
> Expanding between one word of Hangul would be very strange. (We do sometimes for specific reason, but, not for general usage.)

That’s what I expected, thank you.

/koji

Received on Saturday, 26 July 2014 18:24:29 UTC