Re: Strawman proposal for UTR #50: Unicode Properties for Vertical Text Layout

On 9/27/2011 11:33 PM, Koji Ishii wrote:
> Hi Eric, I'm sorry for responding to a very old e-mail, but here's my first pass to go through your proposal.

No problem. However, I am in the final stage of the first draft of 
UTR50, and I made some significant changes compared to the strawman 
proposal. I hope this draft will be published soon.

>
> 1. I don't understand why we need L/V/T here. They are part of a grapheme cluster. Shouldn't they just follow conventions defined in UAX#29 UNICODE TEXT SEGMENTATION[1]?

They are gone.

> 2. When determining orientation for a grapheme cluster, I agree with "they get the orientation of their base", but I have one open issue here--CIRCLED alpha-numerics. They (e.g., U+2460 CIRCLED DIGIT ONE) will be upright as in your proposal and in the current spec, but "A" + U+20DD COMBINING ENCLOSING CIRCLE will be sideways if we follow the current definition. Do you have any idea to solve this issue?

U+FF21 Ａ FULLWIDTH LATIN CAPITAL LETTER A + U+20DD ◌⃝ COMBINING 
ENCLOSING CIRCLE will be upright.
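
To illustrate the "orientation of the base" rule, here is a minimal sketch in Python. The table BASE_ORIENTATION and the function cluster_orientation are hypothetical names; the table holds only the three values mentioned in this thread and is not taken from any published data file, and the fallback to sideways is an assumption.

```python
import unicodedata

# Illustrative values from this thread only (U = upright, S = sideways).
# This is NOT a published property table.
BASE_ORIENTATION = {
    "\u0041": "S",  # LATIN CAPITAL LETTER A
    "\uFF21": "U",  # FULLWIDTH LATIN CAPITAL LETTER A
    "\u2460": "U",  # CIRCLED DIGIT ONE
}


def cluster_orientation(cluster, default="S"):
    """Return the orientation of the first non-mark character in the cluster.

    Combining marks (general category M*) are skipped, so the whole
    cluster inherits the orientation of its base. The default for
    unlisted characters is an assumption made for this sketch.
    """
    for ch in cluster:
        if not unicodedata.category(ch).startswith("M"):
            return BASE_ORIENTATION.get(ch, default)
    return default  # cluster made only of marks
```

With this table, "A" + U+20DD comes out sideways while U+FF21 + U+20DD comes out upright, matching U+2460.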

> 3. I can't see a clear direction for how you want to handle unified punctuation characters. Some are U, while some are S. I often refer to MS Word and Adobe InDesign, which you responded I shouldn't worry about exporting their data in a separate thread. I agree with you, I'm not worrying about exporting, but I do worry about their behavior, since that's what East Asian users have been used to for decades, and is very likely to be what they would expect to see in browsers. Since it's a common behavior, even plain text has assumptions that some code points appear upright in any vertical flow software. Following are examples of such possible problems.
> U+00B1 PLUS-MINUS SIGN
> U+00B7 MIDDLE DOT (Chinese only, it's middle dot, so one may not notice though)
> U+00F7 DIVISION SIGN
> U+2030 PER MILLE SIGN
> U+203B REFERENCE MARK (Again this may not notice)
> U+2103 DEGREE CELSIUS
> U+2116 NUMERO SIGN
> U+2121 TELEPHONE SIGN
> I myself go back and forth between multilingual capability and existing behaviors. Since this is a vertical flow feature for East Asian, I'm leaning to prioritize existing behavior more these days, but I hope we can discuss more on this.

In my current draft, I essentially arrived at the same conclusion. In my 
current draft:

> If a character is routinely considered as an integral part of the 
> Japanese writing system, it is assigned to one of the classes 
> cl-01..cl-19. This is the case for characters in ISO/IEC 10646 
> collections /285 Basic Japanese/ and /286 Japanese Non Ideographics 
> extension/, except that Basic Latin characters are replaced by their 
> companion character from the Halfwidth and Fullwidth Forms block. It is 
> also the case for characters outside those collections which clearly 
> are part of a set where a large part of the set is in the collections; 
> for example, JLREQ includes U+2032 ′ PRIME and U+2033 ″ DOUBLE PRIME 
> in class cl-13; it is only natural to treat U+2034 ‴ TRIPLE PRIME and 
> U+2057 ⁗ QUADRUPLE PRIME in the same way.
>
> Characters which are more symbolic than alphabetic are assigned to 
> cl-19.3, because they can function typographically as ideographs.
>
> Remaining characters are classified in cl-26 or cl-27.
>

This is a starting point; while the two collections are a good 
indication of expectation, they are probably not perfect.

(From this quote, you can rightly conclude the draft has replaced the 
EA/O distinction with classes similar to those of JLREQ.)
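
As an aside on the "companion character" substitution mentioned in the quote, here is a minimal sketch in Python. The function name fullwidth_companion is hypothetical; the sketch relies only on the fact that U+0021..U+007E correspond to U+FF01..U+FF5E at a fixed offset, and it leaves everything else (including U+0020 SPACE) unchanged, which is an assumption rather than something the draft states.

```python
def fullwidth_companion(ch):
    """Map a Basic Latin character to its fullwidth companion.

    U+0021..U+007E map to U+FF01..U+FF5E (offset 0xFEE0) in the
    Halfwidth and Fullwidth Forms block. Any other character,
    including U+0020 SPACE, is returned unchanged.
    """
    cp = ord(ch)
    if 0x21 <= cp <= 0x7E:
        return chr(cp + 0xFEE0)
    return ch
```

For example, fullwidth_companion("A") gives U+FF21, which is the character that would then be looked up in the class assignments.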

>
> 4. Similar to the above issue, but Greek is more problematic than I originally thought.

A bit messy indeed. I think we should not split the Greek letters. I am 
tempted to say that markup is the way to go.

> 5. I didn't find PUA in your chart. From what I understand, East Asian want them upright, while other scripts may want sideways. It might be okay to make them upright if we have consensus that the feature is primarily for East Asian vertical flow, but can I have your thoughts on this block?

Just what you said.

> 6. I agree that arrows are a difficult situation, but if both cases exist, and if we have to pick one, I'd choose sideways as in the current CSS3 Writing Modes spec.

I can go with that.

>   Arrows are ambiguous, but Box Drawings are very clear to be sideways. If we make Box Drawing to sideways, I think arrows behaving the same way is less confusing.

Box Drawing characters are just a bad idea to start with. I don't care 
which way they go.


>
> 7. CSS3 Writing Modes Appendix B: Bi-orientational Transformations[2] defines Egyp, Hang, and Yi to be sideways, while your proposal defines upright. I have no idea how they should be, which is correct?

Egyptian Hieroglyphs, Hangul and Yi are sideways in my draft.

Eric.

Received on Wednesday, 28 September 2011 20:58:56 UTC