Re: HANGUL JONGSEONG, vertical text flow, and Unicode East Asian Width from Asmus Freytag on 2011-03-14 (public-i18n-cjk@w3.org from January to March 2011)

From: Asmus Freytag <asmusf@ix.netcom.com>
Date: Mon, 14 Mar 2011 15:25:14 -0700
To: Koji Ishii <kojiishi@gluesoft.co.jp>
CC: Richard Ishida <ishida@w3.org>, public-html-ig-ko@w3.org, public-i18n-cjk@w3.org
Message-ID: <4D7E95CA.9040006@ix.netcom.com>
On 3/14/2011 1:48 PM, Koji Ishii wrote:
> Asmus,
>
> I understand your points that EAW was originally designed to support legacy encodings, and I can guess it's not easy to revise it given the several constraints you might have on the Unicode side.
>
> However, EAW states that it has 3 scopes:
> • Have to interwork with East Asian legacy character encodings
> • Support both East Asian and Western typography and line layout
> • Need to associate fonts with unmarked text runs containing East Asian characters
> and its Recommendations section mentions typography and layout, so I hope you will take scopes other than legacy encodings into account as well.
>
> Over time, Unicode will gain more East Asian code points that have no counterpart in legacy encodings, and we want their typography and layout properties defined properly and consistently. Since EAW is referenced by other specs such as CSS3 Text, I hope future discussions in Unicode will cover scopes other than legacy encodings as well.

Your point about specs referencing EAW is well taken. I no longer 
maintain this spec, but I was planning to review Jungshik's comments and 
propose a suitable plan of action for Unicode. But, as you realized 
yourself, there are competing constraints here, so I won't promise any 
particular results before I'm done with the analysis and UTC is done 
with their review.

A./
> As Soonbo and Richard pointed out, the issue is less important than I originally thought: these code points are primarily for composing jamo, and the HANGUL LETTER code points (compatibility jamo) are available. But in theory, as Jungshik and Richard pointed out, these code points can be used on their own, so clarification for such cases is still appreciated.
>
> I think Jungshik's previous mail (they should be upright if used alone in vertical flow) answers Richard's question, but I would appreciate a double confirmation too.
>
>
> Regards,
> Koji
>
> -----Original Message-----
> From: Asmus Freytag [mailto:asmusf@ix.netcom.com]
> Sent: Tuesday, March 15, 2011 3:31 AM
> To: Richard Ishida
> Cc: Koji Ishii; public-html-ig-ko@w3.org; public-i18n-cjk@w3.org
> Subject: Re: HANGUL JONGSEONG, vertical text flow, and Unicode East Asian Width
>
> Richard,
>
> very sensible way to slice this.
>
> On the Unicode side, we can look separately at whether it makes sense to address the kinds of inconsistencies that Koji and Jungshik have identified (I haven't had time to study these in depth).
>
> Unicode's EAW was designed to deal with *legacy* character sets, which usually don't contain conjoining Jamos. And the layout prescriptions for it were primarily intended to deal with things like wide ASCII, not to serve as a comprehensive description of all Asian character layout.
> Because of this, it's not clear whether at the end of the day, even a cleaned up EAW property would align fully with your needs.
>
> Anyway, for now it would make sense to enumerate their layout behavior in whatever fashion works for this purpose. You can then check later, after EAW has been revised, whether it needs to be maintained, or whether it will then be redundant and drop out.
>
> A./
>
> On 3/14/2011 11:00 AM, Richard Ishida wrote:
>> Coming at this from the use case requirements rather than trying to
>> work it out from the implementation details:
>>
>> Take the word 한글 (hangul). I can obviously write this as:
>>
>> 한  D55C  [Hangul Syllables]
>> 글  AE00  [Hangul Syllables]
>>
>> and I'd expect this to show two syllabic glyphs vertically arranged.
>>
>> However, I could equally well, in memory, have the following:
>>
>> ᄒ  1112  HANGUL CHOSEONG HIEUH
>> ᅡ  1161  HANGUL JUNGSEONG A
>> ᆫ  11AB  HANGUL JONGSEONG NIEUN
>> ᄀ  1100  HANGUL CHOSEONG KIYEOK
>> ᅳ  1173  HANGUL JUNGSEONG EU
>> ᆯ  11AF  HANGUL JONGSEONG RIEUL
>>
>> Which should, when displayed, look exactly the same. My assumption is
>> that any non-separated sequence of characters constituting a syllable
>> or part of a syllable from the Unicode hangul jamo block would combine
>> and therefore be displayed without rotation.
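Richard's assumption rests on the two in-memory forms being canonically equivalent. A minimal Python sketch using the standard `unicodedata` module shows that NFC composes the conjoining jamo into the precomposed syllables and NFD takes them apart again. Note that 글 decomposes to U+1100, U+1173 (HANGUL JUNGSEONG EU), U+11AF; the variable names are illustrative only.

```python
import unicodedata

# Precomposed form: two code points from the Hangul Syllables block.
precomposed = "\uD55C\uAE00"  # 한글

# Conjoining-jamo form: choseong + jungseong + jongseong per syllable.
# 글 uses U+1173 HANGUL JUNGSEONG EU.
jamo = "\u1112\u1161\u11AB" + "\u1100\u1173\u11AF"

# NFC composes each adjacent L V T run into one precomposed syllable...
assert unicodedata.normalize("NFC", jamo) == precomposed
# ...and NFD takes each syllable apart again, so the two forms are
# canonically equivalent.
assert unicodedata.normalize("NFD", precomposed) == jamo

for ch in jamo:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The sketch only establishes equivalence; whether a renderer normalizes the text or leaves composition to the font, as the next paragraph asks, is a separate design choice.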
>>
>> Since the font should combine these characters into two
>> two-dimensional syllabic arrangements, I don't know whether it's
>> necessary to specify placement in terms of grapheme clusters, or to
>> just assume that the font will take care of this anyway.
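On the grapheme-cluster question: UAX #29 already keeps a conjoining-jamo syllable together through its Hangul rules GB6-GB8. The sketch below implements only those three rules, classifying code points by their Hangul_Syllable_Type ranges, and treats every other character as a cluster of its own; full segmentation has many more rules (CR LF, combining marks, and so on). The function names are hypothetical.

```python
from typing import List


def hangul_syllable_type(ch: str) -> str:
    """Classify ch by Hangul_Syllable_Type: L, V, T, LV, LVT, or NA."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F or 0xA960 <= cp <= 0xA97C:
        return "L"    # leading consonant (choseong)
    if 0x1160 <= cp <= 0x11A7 or 0xD7B0 <= cp <= 0xD7C6:
        return "V"    # vowel (jungseong)
    if 0x11A8 <= cp <= 0x11FF or 0xD7CB <= cp <= 0xD7FB:
        return "T"    # trailing consonant (jongseong)
    if 0xAC00 <= cp <= 0xD7A3:
        # Precomposed syllables: every 28th one has no trailing consonant.
        return "LV" if (cp - 0xAC00) % 28 == 0 else "LVT"
    return "NA"


def joins(prev: str, cur: str) -> bool:
    """UAX #29 rules GB6-GB8: True when prev and cur stay in one cluster."""
    if prev == "L":
        return cur in ("L", "V", "LV", "LVT")
    if prev in ("LV", "V"):
        return cur in ("V", "T")
    if prev in ("LVT", "T"):
        return cur == "T"
    return False


def hangul_clusters(text: str) -> List[str]:
    """Split text into clusters using only the Hangul rules above."""
    clusters: List[str] = []
    prev = "NA"
    for ch in text:
        cur = hangul_syllable_type(ch)
        if clusters and joins(prev, cur):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        prev = cur
    return clusters


jamo = "\u1112\u1161\u11AB\u1100\u1173\u11AF"
print(len(hangul_clusters(jamo)))                 # 2
print(len(hangul_clusters("\u200B".join(jamo))))  # 11
```

With cluster boundaries known, a layout engine could place each cluster as one upright unit without relying on the font alone.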
>>
>> If I wanted to list the jamo involved in such a word, for say an
>> educational or linguistic text, I'd actually have to do something like
>> this:
>>
>> ᄒ  1112  HANGUL CHOSEONG HIEUH
>>   200B  ZERO WIDTH SPACE
>> ᅡ  1161  HANGUL JUNGSEONG A
>>   200B  ZERO WIDTH SPACE
>> ᆫ  11AB  HANGUL JONGSEONG NIEUN
>>   200B  ZERO WIDTH SPACE
>> ᄀ  1100  HANGUL CHOSEONG KIYEOK
>>   200B  ZERO WIDTH SPACE
>> ᅳ  1173  HANGUL JUNGSEONG EU
>>   200B  ZERO WIDTH SPACE
>> ᆯ  11AF  HANGUL JONGSEONG RIEUL
>>
>> to stop them combining visually.
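The separated listing can be produced mechanically. A sketch, assuming the word is stored as precomposed syllables: decompose with NFD, then join the jamo with U+200B. Because no two jamo are adjacent any more, NFC leaves the string unchanged. `jamo_listing` is a hypothetical helper name.

```python
import unicodedata


def jamo_listing(word: str) -> str:
    """Decompose word into conjoining jamo and separate them with U+200B."""
    jamo = unicodedata.normalize("NFD", word)
    return "\u200B".join(jamo)


listing = jamo_listing("\uD55C\uAE00")  # 한글
# U+200B sits between every pair of jamo, so NFC has nothing to compose.
assert unicodedata.normalize("NFC", listing) == listing

for ch in listing:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

The loop prints the same code point and name columns as the listing above.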
>>
>> The important question, to my mind, is whether the characters are
>> rotated when they occur either individually or separated as above.  My
>> guess is no. (There is also the question of whether you would ever find
>> this in vertical text, but we must assume that someone might want to do
>> so at some time.)
>>
>> I assume that we need to care less about the Hangul compatibility
>> characters, since they shouldn't be used anyway. But if they are used,
>> they do not lead to this combining behaviour, so it makes sense that
>> they are non-rotated.
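Richard's guess, and Jungshik's answer as Koji summarizes it above, amount to keeping jamo upright even where EAW says N. One way to express that, in the spirit of option 3 in Koji's original mail, is to keep the EAW data but let a separate override take priority. The sketch assumes a hypothetical override table covering only the Hangul Jamo block (U+1100..U+11FF); the compatibility jamo (U+3131..U+318E) need no override because EAW already lists them as W.

```python
import unicodedata
from typing import List, Tuple

# Hypothetical override table: the Hangul Jamo block is kept upright even
# though EastAsianWidth.txt gives most of it (U+1160..U+11FF) the value N.
UPRIGHT_OVERRIDES: List[Tuple[int, int]] = [(0x1100, 0x11FF)]


def vertical_orientation(ch: str) -> str:
    """Return "upright" or "sideways" for ch in vertical text (a sketch)."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in UPRIGHT_OVERRIDES):
        return "upright"
    # Otherwise fall back to EAW: W and F upright; N, Na, H and (outside an
    # East Asian context) A rotated sideways.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return "upright"
    return "sideways"


for ch in ["\u1161", "\u11AB", "\u3131", "\uD55C", "A"]:
    print(f"U+{ord(ch):04X} {vertical_orientation(ch)}")
```

Keeping the override separate from the EAW lookup means the table could simply be dropped if EAW itself were revised, as Asmus suggests later in the thread.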
>>
>>
>> RI
>>
>>
>>
>> On 09/03/2011 23:09, Koji Ishii wrote:
>>> Hello,
>>>
>>> Would you mind helping me resolve a question in the CSS3 Writing Modes
>>> spec?
>>>
>>> I'm trying to figure out which characters are displayed upright and
>>> which are rotated sideways in vertical text flow. I understand
>>> vertical text flow isn't very important for Hangul, but I hope you
>>> understand I want to write the correct spec in case you need it.
>>>
>>> The current idea is described in the spec[1], in the paragraphs after Figure 10.
>>> The basic idea is to use a combination of font information, Unicode
>>> Script Property[2], and Unicode East Asian Width[3].
>>>
>>> EAW (Unicode East Asian Width) defines character orientation like
>>> this in its Recommendations section[4]:
>>> * Wide characters ... are not rotated (and therefore rendered
>>> upright) when appearing in vertical text runs.
>>> * Narrow characters ... are rotated sideways, when appearing in
>>> vertical text.
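Read literally, the recommendation can be sketched with Python's `unicodedata.east_asian_width`. One detail worth keeping in mind: in EastAsianWidth.txt, N means Neutral rather than Narrow (Narrow is Na), but UAX #11 groups N with the narrow characters in its broad sense, which is why the literal rule rotates the jamo. The sketch treats Ambiguous (A) as narrow, i.e. assumes a non-East-Asian context; `eaw_orientation` is a hypothetical name.

```python
import unicodedata


def eaw_orientation(ch: str) -> str:
    """Literal reading of the UAX #11 recommendation (a sketch).

    Wide (W) and fullwidth (F) characters stay upright; everything else,
    including Neutral (N), is treated as narrow and rotated sideways.
    Ambiguous (A) is treated as narrow, i.e. outside an East Asian context.
    """
    return "upright" if unicodedata.east_asian_width(ch) in ("W", "F") else "sideways"


for ch in ["\u115F", "\u1161", "\u11AB", "\uD55C"]:
    print(f"U+{ord(ch):04X} {unicodedata.east_asian_width(ch)} {eaw_orientation(ch)}")
```

Running this reproduces the situation described below: the choseong filler and the precomposed syllable come out upright, while the jungseong and jongseong come out sideways.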
>>>
>>> If I look into the data file[5], most Hangul characters are W(ide),
>>> so they are rendered upright in vertical text flow according to the
>>> Unicode definitions. I suppose this is what you expect.
>>>
>>> However, many HANGUL JUNGSEONG and JONGSEONG characters are marked N,
>>> and therefore they must be rotated sideways in vertical text flow if we
>>> follow this rule.
>>>
>>> 115F;W # HANGUL CHOSEONG FILLER
>>> 1160;N # HANGUL JUNGSEONG FILLER
>>> 1161;N # HANGUL JUNGSEONG A
>>> 1162;N # HANGUL JUNGSEONG AE
>>> 1163;N # HANGUL JUNGSEONG YA
>>> ...
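The excerpt follows the data file's line format: a code point (or, elsewhere in the file, a `start..end` range), a semicolon, the property value, and a `#` comment. A minimal parsing sketch, assuming only that format; it also tolerates spaces around the semicolon in case a given version of the file pads its fields. `parse_eaw` is a hypothetical name.

```python
import re
from typing import Dict, Iterable

# "<code point>[..<code point>] ; <value>  # comment", with optional spaces
# around the semicolon. Values: A, F, H, N, Na, W.
LINE_RE = re.compile(
    r"^([0-9A-Fa-f]{4,6})(?:\.\.([0-9A-Fa-f]{4,6}))?\s*;\s*(Na|A|F|H|N|W)\b"
)


def parse_eaw(lines: Iterable[str]) -> Dict[int, str]:
    """Map each listed code point to its East_Asian_Width value."""
    table: Dict[int, str] = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # blank line or pure comment
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        for cp in range(start, end + 1):
            table[cp] = m.group(3)
    return table


sample = [
    "115F;W # HANGUL CHOSEONG FILLER",
    "1160;N # HANGUL JUNGSEONG FILLER",
    "1161;N # HANGUL JUNGSEONG A",
    "# a comment line",
]
table = parse_eaw(sample)
print({f"{cp:04X}": v for cp, v in table.items()})
```

Feeding it the lines of the file referenced as [5] would yield a complete table; the sample reproduces three lines of the excerpt.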
>>>
>>> I'm guessing this is NOT what you expect. Can anyone on this list help
>>> me resolve this situation? The possible answers I can think of are:
>>>
>>> 1. Unicode EAW is correct; these code points should be rotated
>>> sideways in vertical text flow.
>>> 2. Unicode EAW is incorrect; these code points should be "W", not "N".
>>> 3. There are reasons to mark these code points as "N", so EAW is
>>> correct, but "Narrow characters are rotated sideways" is incorrect.
>>>
>>> Which one is it, or is it something else? I asked Soonbo Han from LG
>>> at the CSSWG; he thinks the answer is not 1, but he wasn't sure whether
>>> it's 2, 3, or something else.
>>>
>>> Your support is greatly appreciated.
>>>
>>>
>>> Regards,
>>> Koji
>>>
>>> [1] http://dev.w3.org/csswg/css3-writing-modes/#text-orientation
>>> [2] http://unicode.org/reports/tr24/
>>> [3] http://unicode.org/reports/tr11/
>>> [4] http://unicode.org/reports/tr11/#Recommendations
>>> [5] http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
>>>
>>>
Received on Monday, 14 March 2011 22:26:28 UTC