- From: Eric Muller <emuller@adobe.com>
- Date: Tue, 22 Mar 2011 10:48:58 -0700
- To: Koji Ishii <kojiishi@gluesoft.co.jp>
- CC: "www-style@w3.org" <www-style@w3.org>, "CJK discussion (public-i18n-cjk@w3.org)" <public-i18n-cjk@w3.org>
While JLREQ[1] does not provide explicit guidance on which characters to rotate in vertical text, I do believe it provides very valuable clues. The most important consequence is that there is no completely algorithmic method to make the determination.

The place to look is the classification of characters for the purpose of line justification (in particular Appendix A). Yes, it is for line justification, but I think that one chooses whether or not to rotate a character on the same basis as one chooses how to space it. I am pretty sure that all cl-19 characters are upright and cl-27 are not, for example.

In the JLREQ approach, it's actually character *occurrences* which are classified, into one of the 30 classes described in Appendix A. Often, a character occurrence can be classified solely on the basis of the code point: for example, all occurrences of U+30A0 ゠ KATAKANA-HIRAGANA DOUBLE HYPHEN are classified as cl-03 Hyphens. On the other hand, occurrences of U+00AB « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK can be classified either as cl-01 Opening brackets or cl-27 Western characters (U+00AB appears in both table A.1 and table A.27). Unfortunately, JLREQ provides no method to classify ambiguous occurrences; I understand that the authors could not come to an agreement, and that this is mostly because there is no single right way to do it, but rather different house rules (e.g. one could treat « » around Latin text in otherwise Japanese text as either Japanese or Latin).

You will also notice that JLREQ limits itself to describing the characters in collections 285 and 286 of ISO/IEC 10646; this means that fullwidth characters are not listed at all, which is why you find U+0041 A LATIN CAPITAL LETTER A in cl-19 Ideographic characters, in cl-25 Unit symbols (with the remark "proportional"), and in cl-27 (also with the remark "proportional").
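To make the occurrence-based classification concrete, here is a minimal Python sketch. It assumes a lookup table from code point to candidate JLREQ classes; the table holds only the three examples mentioned above, and the names `CANDIDATE_CLASSES`, `AmbiguousOccurrence`, `classify_occurrence` and `house_choice` are hypothetical, not anything JLREQ defines.

```python
from typing import Optional

# Candidate JLREQ classes per code point. Only the three examples from the
# message are listed; a real table would have to cover all of Appendix A.
CANDIDATE_CLASSES = {
    0x30A0: ("cl-03",),                   # KATAKANA-HIRAGANA DOUBLE HYPHEN
    0x00AB: ("cl-01", "cl-27"),           # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    0x0041: ("cl-19", "cl-25", "cl-27"),  # LATIN CAPITAL LETTER A
}


class AmbiguousOccurrence(Exception):
    """Raised when the code point alone does not determine the class."""


def classify_occurrence(code_point: int, house_choice: Optional[str] = None) -> str:
    """Return the class of one character occurrence.

    house_choice stands in for whatever house rule or author decision
    resolves an ambiguous occurrence; JLREQ itself provides none.
    """
    candidates = CANDIDATE_CLASSES[code_point]
    if house_choice is not None:
        if house_choice not in candidates:
            raise ValueError(f"{house_choice} is not listed for U+{code_point:04X}")
        return house_choice
    if len(candidates) == 1:
        # The code point alone is enough, as for U+30A0.
        return candidates[0]
    raise AmbiguousOccurrence(f"U+{code_point:04X} may be any of {candidates}")
```

With this table, `classify_occurrence(0x30A0)` needs no further input, while `classify_occurrence(0x00AB)` raises until a choice such as `"cl-01"` is supplied.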
In practice, it seems that the desktop world relies on the existence of the fullwidth characters, and treats U+FF21 Ａ FULLWIDTH LATIN CAPITAL LETTER A as cl-19, and U+0041 as either cl-25 or cl-27. However, because the set of fullwidth characters is very limited, this is more an opportunistic convenience than a situation that can be relied on.

So my first point is that the document author needs to be able to explicitly control the classification of any character occurrence.

---

That being said, having an automatic default classification is a good idea. My second point is that the Unicode property EAW (East_Asian_Width) is not usable for that purpose. IMHO, it suffices to observe that U+0391 Α GREEK CAPITAL LETTER ALPHA is "A", while U+0370 Ͱ GREEK CAPITAL LETTER HETA is "N" and U+0531 Ա ARMENIAN CAPITAL LETTER AYB is "N". To me, this says that EAW is about the emulation of JIS systems in Unicode implementations, including handling only the subset of Unicode present in JIS. In 2011, this is no longer interesting. Of course, we can revisit EAW to make it do what we want now, but the fact remains that EAW as currently published and thought of is not the right basis.

---

Third point: I think it is a very bad idea to look at fonts to decide whether a character is upright or not. I do believe that orientation is something authors care about and should be able to count on (including when they leave the determination to the automatic default), and that it cannot be left at the mercy of user agent font fallback. Furthermore, there is IMHO no good data in fonts to help you: the presence of anything related to vertical typesetting (vmtx, VORG, 'vert' feature, etc.) does not tell you anything about which orientation to use; it only tells you what to do once you have decided on an orientation. Of course, I would not support something like "advance equal to 1em".

---

Eric.
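For reference, the East Asian Width values cited in the second point can be checked with Python's standard `unicodedata` module. This is only a quick sketch: the helper name `east_asian_widths` is hypothetical, and the values reported are those of the Unicode version bundled with the interpreter.

```python
import unicodedata


def east_asian_widths(chars):
    """Map each character to its East_Asian_Width property value."""
    return {f"U+{ord(ch):04X}": unicodedata.east_asian_width(ch) for ch in chars}


# Greek alpha, Greek heta, Armenian ayb: the three examples from the message.
samples = ["\u0391", "\u0370", "\u0531"]
for code_point, width in east_asian_widths(samples).items():
    print(code_point, width)
```

Greek alpha comes back as "A" (ambiguous) while heta and ayb come back as "N" (neutral), which is the split the message attributes to JIS coverage rather than to anything about vertical orientation.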
Received on Tuesday, 22 March 2011 17:50:48 UTC