Re: online meeting to re: Reorganizing JLReq character class

Hello Eric,

Thank you so much for your comments; a proposal from an expert like you is especially welcome. It is not something we can cover and settle in part of tomorrow's meeting. Please allow us a little time to digest your input before getting back to you.

In the meantime, Shimono-san will copy it to GitHub so we can track the discussion.

Thank you!

Best,
- kida

> On 2020/10/19 8:23, Eric Muller <emuller@amazon.com> wrote:
> 
> 
>> On 10/14/20 1:32 AM, 木田泰夫 wrote:
>> sorry English speakers but I would appreciate it if you could send your comments on the email list
> 
> Here is our perspective as implementers. It is a bit raw (sorry, we noticed the announcement a bit late); don't hesitate to reach out for clarification.
> 
> Eric.
> 
> ---
> 
> Character classes serve two purposes: linebreak opportunities and spacing around characters.
> 
> Linebreak opportunities are adequately handled by Unicode currently, at most needing some adjustment in UAX14 or in the CLDR language tailorings. Therefore that use is not discussed here.
> 
> ---
> 
> A possible spacing model is that there is glue (variable space) on each side of each grapheme cluster occurrence. This glue is characterized by its natural width (JLREQ appendix B) and can be deformed (either compressed - JLREQ appendix D - or expanded - JLREQ appendix E) to achieve justification.
> 
> While each glue occurrence could be specified explicitly via markup, it can be determined most of the time from its context, using classes: for a left glue, by the class of what's on the left of the grapheme cluster occurrence and by the class of the grapheme cluster occurrence itself; and similarly for a right glue, by the class of the grapheme cluster occurrence and by the class of what's on the right of the grapheme cluster occurrence.
> 
> What's on the left (or right) of a grapheme cluster occurrence may be another grapheme cluster occurrence, in which case the class of "what's on the left" is the class of that other grapheme cluster occurrence. But it can also be that there is no other grapheme cluster occurrence on the left, or there is some intervening graphical element, thus leading to classes:
> 
> - the beginning (or end) of a paragraph
> - the beginning (or end) of a line
> - a different bidi level (the purpose of this class is to avoid involving the bidi reordering when measuring lines)
> - the inside of a box with non-zero margin, border or padding
> - the outside of such a box
> - an inline object (e.g. image)
> - a TCY element
> - the outside or inside of a warichu element
> 
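A minimal sketch of the pairwise lookup described above, in Python. The `Glue` fields, the table name `LEFT_GLUE`, the context class name `lineStart`, and all numeric values are illustrative assumptions, not values taken from JLREQ or from the proposal; only the idea of keying a left glue on the pair (class on the left, class of the cluster itself) comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glue:
    natural: float  # natural width in em (the role of JLREQ appendix B)
    shrink: float   # how much the glue may be compressed, in em (appendix D)
    stretch: float  # how much the glue may be expanded, in em (appendix E)


# Fallback used when a pair has no explicit entry (hypothetical values).
DEFAULT = Glue(natural=0.0, shrink=0.0, stretch=0.25)

# Left glue keyed by (class of what is on the left, class of the cluster).
# Entries and numbers are placeholders for illustration only.
LEFT_GLUE = {
    ("ideographic", "openingBracket_other"): Glue(0.5, 0.5, 0.0),
    ("lineStart", "openingBracket_other"): Glue(0.0, 0.0, 0.0),
}


def left_glue(left_class: str, own_class: str) -> Glue:
    """Return the glue on the left side of a grapheme cluster occurrence."""
    return LEFT_GLUE.get((left_class, own_class), DEFAULT)
```

A right glue table would be keyed the same way, by (class of the cluster, class of what is on the right).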
> 
> The class of a grapheme cluster occurrence could also be specified explicitly by markup, but it can often be determined from the characters composing the grapheme cluster occurrence (at which point, it is the same for all occurrences of a given grapheme cluster). That can in turn be determined from classes assigned to the characters in the grapheme cluster. Generally, the base character of a grapheme cluster determines the class of the grapheme cluster, but there are cases where the other characters "dominate" the determination: for example, <U+00A0 NO-BREAK SPACE> may be in a class, and <U+00A0 U+0301 COMBINING ACUTE> may be in a different class.
> 
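A rough sketch of that derivation, in Python: the base character decides the class unless the whole cluster has an explicit override. The names `CHAR_CLASS`, `CLUSTER_OVERRIDES` and `cluster_class` are hypothetical, and the class given to <U+00A0 U+0301> is an assumption made only to show the "dominate" case.

```python
# Per-character classes; two sample entries consistent with the table below.
CHAR_CLASS = {
    "\u00A0": "justifyingSpace",  # NO-BREAK SPACE
    "\u3042": "hiragana",         # HIRAGANA LETTER A
}

# Clusters whose non-base characters change the class.
# The value here is an assumption for illustration.
CLUSTER_OVERRIDES = {
    "\u00A0\u0301": "westernChar",  # NBSP carrying a combining acute
}


def cluster_class(cluster: str) -> str:
    """Class of a grapheme cluster, given as a string of its characters."""
    if cluster in CLUSTER_OVERRIDES:
        return CLUSTER_OVERRIDES[cluster]
    # Default rule: the base (first) character determines the class.
    return CHAR_CLASS.get(cluster[0], "unknown")
```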
> Finally, we arrive at the classes of characters. Below is a proposed assignment for the whole Unicode repertoire. This classification mostly aligns with that of JLREQ, with a few differences:
> 
> - for unassigned code points (in the Unicode sense), the class is a prediction based on the likely future allocation of those code points
> 
> - JLREQ simply ignores the existence of the full width characters at U+FFxx. This leads to a number of "ambiguous" characters, such as U+0041 LATIN CAPITAL LETTER A, where JLREQ says both "an occurrence of U+0041 could be in the Western class" (A.27) and "an occurrence of U+0041 could be in the Ideographic class" (A.19). In practice, authors routinely use U+0041 and U+FF21 precisely to disambiguate the class to use.
> 
> - it distinguishes the class used in horizontal and in vertical texts
> 
> - it distinguishes the inseparables (see below)
> 
> - it uses the InDesign refinement of the opening and closing classes (square, rounded, other)
> 
> The proposed assignment also mentions the UAX50 vertical orientation property, as it is closely aligned with, and informs, the spacing class assignment.
> 
> ---
> Ambiguous characters
> 
> While most characters are unambiguously in a class, regardless of their context, a few characters common in Japanese typography are inherently ambiguous:
> 
> 
>    U+2018 ‘ LEFT SINGLE QUOTATION MARK
>    U+201C “ LEFT DOUBLE QUOTATION MARK
>    U+00AB « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
>    U+2019 ’ RIGHT SINGLE QUOTATION MARK
>    U+201D ” RIGHT DOUBLE QUOTATION MARK
>    U+00BB » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
>    U+2010 ‐ HYPHEN
>    U+2013 – EN DASH
>    U+203C ‼ DOUBLE EXCLAMATION MARK
>    U+2047 ⁇ DOUBLE QUESTION MARK
>    U+2048 ⁈ QUESTION EXCLAMATION MARK
>    U+2049 ⁉ EXCLAMATION QUESTION MARK
>    U+00B7 · MIDDLE DOT
>    U+2022 • BULLET
>    U+2014 — EM DASH
>    U+2026 … HORIZONTAL ELLIPSIS
>    U+2025 ‥ TWO DOT LEADER
> 
> A possibility is to resolve those based on the locale, or their resolved script (itself determined by looking at the script of the adjacent character).
> 
> The locale method has the downside that authors are not always tagging their text appropriately (either not at all, or not carefully on punctuation).
> 
> The script method has the advantage of not requiring the author's help, and that computation is already necessary in OpenType layout engines.
> 
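A sketch of the script method, in Python, under loud assumptions: `rough_script` is a crude stand-in for a real Unicode Script lookup (it only inspects character names), and "adjacent" is read here as the nearest character with a script, looking left first and then right. The rule applied is the "A" column rule from the table below: if the resolved script is not Hans, Hant or Jpan, the class becomes westernChar.

```python
import unicodedata

CJK_SCRIPTS = {"Hans", "Hant", "Jpan"}


def rough_script(ch):
    """Crude stand-in for a Script property lookup, based on names only."""
    name = unicodedata.name(ch, "")
    if name.startswith(("CJK", "HIRAGANA", "KATAKANA")):
        return "Jpan"
    if name.startswith("LATIN"):
        return "Latn"
    return None  # punctuation, spaces, and anything this sketch ignores


def resolved_script(text, i):
    """Script of the nearest scripted character, left side first."""
    for j in range(i - 1, -1, -1):
        script = rough_script(text[j])
        if script:
            return script
    for j in range(i + 1, len(text)):
        script = rough_script(text[j])
        if script:
            return script
    return None


def resolve_class(text, i, table_class):
    """Apply the 'A' rule to the ambiguous character at index i."""
    script = resolved_script(text, i)
    if script is not None and script not in CJK_SCRIPTS:
        return "westernChar"
    return table_class
```

For instance, U+2019 after a kanji keeps its bracket class, while the same character inside an English word resolves to westernChar.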
> ---
> Inseparables
> 
> Currently, all inseparables are lumped in a single class, and a footnote explains that the behavior inseparable/inseparable applies only to two occurrences of the same inseparable. It would be better to have separate classes for inseparables. Not only does that avoid a footnote, but it also means that one can specify different glues for e.g. ideographic/inseparable_emDash and ideographic/inseparable_twoDotLeader, or specify different glues for inseparable_emDash/inseparable_twoDotLeader and inseparable_emDash/inseparable_ellipsis.
> 
> ---
> Logical vs visual order:
> 
> It should be made clear that the practical definition of glues is in the visual space: that's why we used the terms "left" and "right".
> 
> ---
> Classes as a Unicode property
> 
> From a practical point of view, I believe that the spacing class should be part of the Unicode Character Database, as a property, just like the vertical orientation property. The main reason is that this is the most reliable way to get something well defined (in the sense of having a definition, not necessarily in the sense of having correct values), and in sync with the Unicode repertoire. It is a relatively easy task for Unicode, as has been demonstrated with the vertical orientation property. (In fact, the very first draft of what became UAX50 included the spacing class.)
> 
> It is worth noting that such a Unicode property is only a starting point. As noted earlier, markup should always be available to influence the determination of the glue. Thus there is no need for such a Unicode property to be perfect; it does however need to be easily accessible and fairly stable.
> 
> ========
> Classes and glue settings
> 
> The classes are only one part of the final visual appearance: the glue settings also come into play, so it is worth discussing those a bit, as they may influence the design of the classes.
> 
> ---
> Glue settings and justification
> 
> When justifying text (a common case for body text), implementations may have to expand a glue to an arbitrary width. Consider, for example, a two-character paragraph with text-align-last: justify: the glue has to be (linewidth - 2em). While large glues are sometimes the result of pathological conditions, they can also be explicitly intended, such as in jidori processing. Thus it is desirable to allow pretty much all glues to grow indefinitely.
> 
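The arithmetic behind that example can be sketched in Python. The function name and the simplifications are assumptions: every character is taken to be 1em wide, there is no glue at the line edges, and the slack is shared equally between adjacent characters (the returned value is the total glue between two neighbours, i.e. the right glue of one plus the left glue of the next).

```python
def justify_glue(line_width_em, n_chars, char_width_em=1.0):
    """Glue needed between adjacent characters to fill the line exactly."""
    if n_chars < 2:
        raise ValueError("need at least two characters to distribute glue")
    # Space left over once the characters themselves are placed.
    slack = line_width_em - n_chars * char_width_em
    # One gap between each pair of neighbouring characters.
    return slack / (n_chars - 1)
```

With two characters on a 20em line this gives 18em, which is the (linewidth - 2em) case above; no fixed upper bound on the stretch could accommodate every line width.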
> ---
> Glue settings are mostly for body text
> 
> JLREQ currently describes three glue settings (default, JIS, and book, in tables 3-5 of appendix D; they differ only on the behavior when compressing lines, but in principle different settings could also differ on natural width or when expanding lines). It seems that those settings are mostly concerned with body text, and are not appropriate for, e.g., titles. For example, the default method specifies 0 glue between paragraph (line) start and an opening bracket, and 0.5em between a closing bracket and a paragraph (line) end; for a title starting and ending with brackets, which happens to be set on two lines (centered and not justified), this asymmetry can be jarring.
> 
> It would be worth stating that the settings apply to body text and mentioning when they are not appropriate, or even better, including settings for other situations. The most important situations that come to mind: titles, and ruby base/ruby text.
> 
> ---
> Interchange of glue settings
> 
> The discussion so far has been about determining the classes from characters, leaving room for document styling systems (e.g. CSS) to let authors explicitly specify classes of occurrences. The classification is of course only one part of the final result, the other being the glues that result from those classes (i.e. JLREQ appendices B, D, E). It would be useful to encourage document styling systems to allow the specification of the glues as well, in the documents, either in the form of selecting from a predetermined set of settings, or by completely specifying the settings (maybe as a delta on top of the predetermined settings).
> 
> ---
> Spacing classes and the CSS text-indent property.
> 
> With the model presented above, the CSS text-indent property is essentially an unconditional, invariable glue to the left of the first grapheme cluster in a paragraph. In practice, it is useful in Japanese typography to make that glue at least conditional: e.g. 1em before an ideograph, and 0.5em before an opening bracket. I think the best way forward is to recommend that for paragraphs using the spacing model discussed here, that glue be controlled by the spacing model (i.e. the mojikumi tables) and that text-indent be set to 0.
> 
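A small Python sketch of a conditional paragraph-start glue replacing text-indent. Only the two values quoted above (1em before an ideograph, 0.5em before an opening bracket) come from the text; the function name and the 1em fallback for every other class are assumptions.

```python
def paragraph_start_glue_em(first_class: str) -> float:
    """Glue between the paragraph start and the first grapheme cluster."""
    # All opening bracket classes (openingBracket_round, _corner, _other)
    # get the smaller indent.
    if first_class.startswith("openingBracket_"):
        return 0.5
    # Ideographs get 1em; using 1em for other classes too is an assumption.
    return 1.0
```

With such a rule in the spacing tables, the style sheet would set text-indent to 0 so that the indent is not applied twice.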
> ========
> 
> Columns:
> 
>  - code point
>  - UAX50 vertical orientation
>  - H:     the class for horizontal text is in column 5
>    blank: the class for horizontal text is ideographic
>  - V:     the class for vertical text is in column 5
>    blank: the class for vertical text is ideographic
>  - class
>  - A:     if the resolved script is not Hans, Hant, Jpan -> westernChar
> 
> 
>        0x000000 | R  | H | V | unknown
>        0x000009 | R  | H | V | tab
>        0x00000A | R  | H | V | lineEdge
>        0x00000B | R  | H | V | unknown
>        0x00000D | R  | H | V | lineEdge
>        0x00000E | R  | H | V | unknown
>        0x000020 | R  | H     | justifyingSpace
>        0x000021 | R  | H     | westernChar
>        0x000080 | R  | H | V | unknown
>        0x000085 | R  | H | V | lineEdge
>        0x000086 | R  | H | V | unknown
>        0x0000A0 | R  | H     | justifyingSpace
>        0x0000A1 | R  | H     | westernChar
>        0x0000A7 | U  | H     | westernChar
>        0x0000A8 | R  | H     | westernChar
>        0x0000A9 | U  | H     | westernChar
>        0x0000AA | R  | H     | westernChar
>        0x0000AB | R  | H | V | openingBracket_other
>        0x0000AC | R  | H     | westernChar
>        0x0000AD | R  | H | V | unknown
>        0x0000AE | U  | H     | westernChar
>        0x0000AF | R  | H     | westernChar
>        0x0000B0 | R  | H     | postfixedAbbrev
>        0x0000B1 | U  | H     | westernChar
>        0x0000B2 | R  | H     | westernChar
>        0x0000BB | R  | H | V | closingBracket_other
>        0x0000BC | U  | H     | westernChar
>        0x0000BF | R  | H     | westernChar
>        0x0000D7 | U  | H     | westernChar
>        0x0000D8 | R  | H     | westernChar
>        0x0000F7 | U  | H     | westernChar
>        0x0000F8 | R  | H     | westernChar
>        0x0002EA | U  | H | V | ideographic
>        0x0002EC | R  | H     | westernChar
>        0x001100 | U  | H | V | ideographic
>        0x001200 | R  | H     | westernChar
>        0x001401 | U  | H     | westernChar
>        0x001680 | R  | H     | westernChar
>        0x0018B0 | U  | H     | westernChar
>        0x001900 | R  | H     | westernChar
>        0x00200B | R  | H | V | transparent
>        0x00200D | R  | H | V | unknown
>        0x002010 | R  | H | V | hyphen_middlePunctuation
>        0x002014 | R  | H     | inseparable_emDash
>        0x002016 | U  | H     | westernChar
>        0x002017 | R  | H     | westernChar
>        0x002018 | R  | H | V | openingBracket_other
>        0x002019 | R  | H | V | closingBracket_other          | A
>        0x00201A | R  | H     | westernChar
>        0x00201C | R  | H | V | openingBracket_other
>        0x00201D | R  | H | V | closingBracket_other
>        0x00201E | R  | H     | westernChar
>        0x002020 | U  | H     | westernChar
>        0x002022 | R  | H     | westernChar
>        0x002025 | R  | H     | inseparable_twoDotLeader
>        0x002026 | R  | H     | inseparable_ellipsis
>        0x002027 | R  | H     | westernChar
>        0x002028 | R  | H | V | lineEdge
>        0x00202A | R  | H | V | unknown
>        0x00202F | R  | H     | westernChar
>        0x002030 | U  | H | V | postfixedAbbrev
>        0x002032 | R  | H | V | postfixedAbbrev
>        0x002034 | R  | H     | westernChar
>        0x00203B | U  | H | V | ideographic
>        0x00203C | U  | H | V | dividingPunctuation
>        0x00203D | R  | H     | westernChar
>        0x002042 | U  | H     | westernChar
>        0x002043 | R  | H     | westernChar
>        0x002047 | U  | H | V | dividingPunctuation
>        0x00204A | R  | H     | westernChar
>        0x002051 | U  | H     | westernChar
>        0x002052 | R  | H     | westernChar
>        0x00205F | R  | H     | westernChar
>        0x002060 | R  | H | V | unknown
>        0x002065 | U  | H | V | ideographic
>        0x002066 | R  | H | V | unknown
>        0x002070 | R  | H     | westernChar
>        0x0020AC | R  | H | V | prefixedAbbrev
>        0x0020AD | R  | H     | westernChar
>        0x0020DD | U  | H     | westernChar
>        0x0020E1 | R  | H     | westernChar
>        0x0020E2 | U  | H     | westernChar
>        0x0020E5 | R  | H     | westernChar
>        0x002100 | U  | H | V | ideographic
>        0x002102 | R  | H     | westernChar
>        0x002103 | U  | H | V | postfixedAbbrev
>        0x002104 | U  | H | V | ideographic
>        0x002109 | U  | H | V | postfixedAbbrev
>        0x00210A | R  | H     | westernChar
>        0x00210F | U  | H | V | ideographic
>        0x002110 | R  | H     | westernChar
>        0x002113 | U  | H | V | postfixedAbbrev
>        0x002114 | U  | H | V | ideographic
>        0x002115 | R  | H     | westernChar
>        0x002116 | U  | H | V | prefixedAbbrev
>        0x002117 | U  | H | V | ideographic
>        0x002118 | R  | H     | westernChar
>        0x00211E | U  | H | V | ideographic
>        0x002124 | R  | H     | westernChar
>        0x002125 | U  | H | V | ideographic
>        0x002126 | R  | H     | westernChar
>        0x002127 | U  | H | V | ideographic
>        0x002128 | R  | H     | westernChar
>        0x002129 | U  | H | V | ideographic
>        0x00212A | R  | H     | westernChar
>        0x00212E | U  | H | V | ideographic
>        0x00212F | R  | H     | westernChar
>        0x002135 | U  | H | V | ideographic
>        0x002140 | R  | H     | westernChar
>        0x002145 | U  | H | V | ideographic
>        0x00214B | R  | H     | westernChar
>        0x00214C | U  | H | V | ideographic
>        0x00214E | R  | H     | westernChar
>        0x00214F | U  | H | V | ideographic
>        0x00218A | R  | H     | westernChar
>        0x00218C | U  | H | V | ideographic
>        0x002190 | R  | H | V | ideographic
>        0x00221E | U  | H | V | ideographic
>        0x00221F | R  | H | V | ideographic
>        0x002234 | U  | H | V | ideographic
>        0x002236 | R  | H | V | ideographic
>        0x002300 | U  | H | V | ideographic
>        0x002308 | R  | H | V | ideographic
>        0x00230C | U  | H | V | ideographic
>        0x002320 | R  | H | V | ideographic
>        0x002324 | U  | H | V | ideographic
>        0x002329 | Tr | H | V | openingBracket_other
>        0x00232A | Tr | H | V | closingBracket_other
>        0x00232B | U  | H | V | ideographic
>        0x00232C | R  | H | V | ideographic
>        0x00237D | U  | H | V | ideographic
>        0x00239B | R  | H | V | ideographic
>        0x0023BE | U  | H | V | ideographic
>        0x0023CE | R  | H | V | ideographic
>        0x0023CF | U  | H | V | ideographic
>        0x0023D0 | R  | H | V | ideographic
>        0x0023D1 | U  | H | V | ideographic
>        0x0023DC | R  | H | V | ideographic
>        0x0023E2 | U  | H | V | ideographic
>        0x002423 | R  | H     | westernChar
>        0x002424 | U  | H | V | ideographic
>        0x002500 | R  | H     | inseparable_emDash
>        0x002580 | R  | H     | westernChar
>        0x0025A0 | U  | H | V | ideographic
>        0x00261A | R  | H | V | ideographic
>        0x002620 | U  | H | V | ideographic
>        0x002768 | R  | H     | westernChar
>        0x002776 | U  | H | V | ideographic
>        0x002794 | R  | H | V | ideographic
>        0x002800 | R  | H     | westernChar
>        0x002900 | R  | H | V | ideographic
>        0x002B12 | U  | H | V | ideographic
>        0x002B30 | R  | H | V | ideographic
>        0x002B50 | U  | H | V | ideographic
>        0x002B5A | R  | H | V | ideographic
>        0x002BB8 | U  | H | V | ideographic
>        0x002BD2 | R  | H | V | ideographic
>        0x002BD3 | U  | H | V | ideographic
>        0x002BEC | R  | H | V | ideographic
>        0x002BF0 | U  | H | V | ideographic
>        0x002C00 | R  | H     | westernChar
>        0x002E80 | U  | H | V | ideographic
>        0x003000 | U  | H | V | fullSpace
>        0x003001 | Tu | H | V | comma_ideo
>        0x003002 | Tu | H | V | fullStop_ideo
>        0x003003 | U  | H | V | ideographic
>        0x003005 | U  | H | V | iterationMark
>        0x003006 | U  | H | V | ideographic
>        0x003008 | Tr | H | V | openingBracket_other
>        0x003009 | Tr | H | V | closingBracket_other
>        0x00300A | Tr | H | V | openingBracket_other
>        0x00300B | Tr | H | V | closingBracket_other
>        0x00300C | Tr | H | V | openingBracket_corner
>        0x00300D | Tr | H | V | closingBracket_corner
>        0x00300E | Tr | H | V | openingBracket_corner
>        0x00300F | Tr | H | V | closingBracket_corner
>        0x003010 | Tr | H | V | openingBracket_other
>        0x003011 | Tr | H | V | closingBracket_other
>        0x003012 | U  | H | V | ideographic
>        0x003014 | Tr | H | V | openingBracket_other
>        0x003015 | Tr | H | V | closingBracket_other
>        0x003016 | Tr | H | V | openingBracket_other
>        0x003017 | Tr | H | V | closingBracket_other
>        0x003018 | Tr | H | V | openingBracket_other
>        0x003019 | Tr | H | V | closingBracket_other
>        0x00301A | Tr | H | V | openingBracket_corner
>        0x00301B | Tr | H | V | closingBracket_corner
>        0x00301C | Tr | H | V | hyphen_other
>        0x00301D | Tr | H | V | openingBracket_other
>        0x00301E | Tr | H | V | closingBracket_other
>        0x003020 | U  | H | V | ideographic
>        0x003030 | Tr | H | V | ideographic
>        0x003031 | U  | H | V | ideographic
>        0x003033 | U  | H | V | inseparable_repeatUpper
>        0x003034 | U  | H | V | inseparable_repeatVoiceUpper
>        0x003035 | U  | H | V | inseparable_repeatLower
>        0x003036 | U  | H | V | ideographic
>        0x00303B | U  | H | V | iterationMark
>        0x00303C | U  | H | V | ideographic
>        0x003040 | U  | H | V | hiragana
>        0x003041 | Tu | H | V | smallKana
>        0x003042 | U  | H | V | hiragana
>        0x003043 | Tu | H | V | smallKana
>        0x003044 | U  | H | V | hiragana
>        0x003045 | Tu | H | V | smallKana
>        0x003046 | U  | H | V | hiragana
>        0x003047 | Tu | H | V | smallKana
>        0x003048 | U  | H | V | hiragana
>        0x003049 | Tu | H | V | smallKana
>        0x00304A | U  | H | V | hiragana
>        0x003063 | Tu | H | V | smallKana
>        0x003064 | U  | H | V | hiragana
>        0x003083 | Tu | H | V | smallKana
>        0x003084 | U  | H | V | hiragana
>        0x003085 | Tu | H | V | smallKana
>        0x003086 | U  | H | V | hiragana
>        0x003087 | Tu | H | V | smallKana
>        0x003088 | U  | H | V | hiragana
>        0x00308E | Tu | H | V | smallKana
>        0x00308F | U  | H | V | hiragana
>        0x003095 | Tu | H | V | smallKana
>        0x003097 | U  | H | V | hiragana
>        0x00309B | Tu | H | V | hiragana
>        0x00309D | U  | H | V | iterationMark
>        0x00309F | U  | H | V | hiragana
>        0x0030A0 | Tr | H | V | hyphen_katakana
>        0x0030A1 | Tu | H | V | smallKana
>        0x0030A2 | U  | H | V | katakana
>        0x0030A3 | Tu | H | V | smallKana
>        0x0030A4 | U  | H | V | katakana
>        0x0030A5 | Tu | H | V | smallKana
>        0x0030A6 | U  | H | V | katakana
>        0x0030A7 | Tu | H | V | smallKana
>        0x0030A8 | U  | H | V | katakana
>        0x0030A9 | Tu | H | V | smallKana
>        0x0030AA | U  | H | V | katakana
>        0x0030C3 | Tu | H | V | smallKana
>        0x0030C4 | U  | H | V | katakana
>        0x0030E3 | Tu | H | V | smallKana
>        0x0030E4 | U  | H | V | katakana
>        0x0030E5 | Tu | H | V | smallKana
>        0x0030E6 | U  | H | V | katakana
>        0x0030E7 | Tu | H | V | smallKana
>        0x0030E8 | U  | H | V | katakana
>        0x0030EE | Tu | H | V | smallKana
>        0x0030EF | U  | H | V | katakana
>        0x0030F5 | Tu | H | V | smallKana
>        0x0030F7 | U  | H | V | katakana
>        0x0030FB | U  | H | V | middleDot_middlePunctuation
>        0x0030FC | Tr | H | V | prolongedSoundMark
>        0x0030FD | U  | H | V | iterationMark
>        0x0030FF | U  | H | V | katakana
>        0x003100 | U  | H | V | ideographic
>        0x003127 | Tu | H | V | ideographic
>        0x003128 | U  | H | V | ideographic
>        0x0031F0 | Tu | H | V | smallKana
>        0x003200 | U  | H | V | ideographic
>        0x003300 | Tu | H | V | ideographic
>        0x003303 | Tu | H | V | postfixedAbbrev
>        0x003304 | Tu | H | V | ideographic
>        0x00330D | Tu | H | V | postfixedAbbrev
>        0x00330E | Tu | H | V | ideographic
>        0x003314 | Tu | H | V | postfixedAbbrev
>        0x003315 | Tu | H | V | ideographic
>        0x003318 | Tu | H | V | postfixedAbbrev
>        0x003319 | Tu | H | V | ideographic
>        0x003322 | Tu | H | V | postfixedAbbrev
>        0x003324 | Tu | H | V | ideographic
>        0x003326 | Tu | H | V | postfixedAbbrev
>        0x003328 | Tu | H | V | ideographic
>        0x00332B | Tu | H | V | postfixedAbbrev
>        0x00332C | Tu | H | V | ideographic
>        0x003336 | Tu | H | V | postfixedAbbrev
>        0x003337 | Tu | H | V | ideographic
>        0x00333B | Tu | H | V | postfixedAbbrev
>        0x00333C | Tu | H | V | ideographic
>        0x003349 | Tu | H | V | postfixedAbbrev
>        0x00334B | Tu | H | V | ideographic
>        0x00334D | Tu | H | V | postfixedAbbrev
>        0x00334E | Tu | H | V | ideographic
>        0x003351 | Tu | H | V | postfixedAbbrev
>        0x003352 | Tu | H | V | ideographic
>        0x003357 | Tu | H | V | postfixedAbbrev
>        0x003358 | U  | H | V | ideographic
>        0x003371 | U  | H | V | postfixedAbbrev
>        0x00337B | Tu | H | V | ideographic
>        0x003380 | U  | H | V | postfixedAbbrev
>        0x0033E0 | U  | H | V | ideographic
>        0x00A4D0 | R  | H     | westernChar
>        0x00A960 | U  | H | V | ideographic
>        0x00A980 | R  | H     | westernChar
>        0x00AC00 | U  | H | V | ideographic
>        0x00D800 | R  | H     | westernChar
>        0x00E000 | U  | H | V | ideographic
>        0x00FB00 | R  | H     | westernChar
>        0x00FE10 | U  | H | V | ideographic
>        0x00FE17 | U  | H | V | openingBracket_other
>        0x00FE18 | U  | H | V | closingBracket_other
>        0x00FE19 | U  | H | V | ideographic
>        0x00FE20 | R  | H     | westernChar
>        0x00FE30 | U  | H | V | inseparable_twoDotLeaderV
>        0x00FE31 | U  | H | V | inseparable_emDashV
>        0x00FE32 | U  | H | V | hyphen_middlePunctuation
>        0x00FE33 | U  | H | V | ideographic
>        0x00FE35 | U  | H | V | openingBracket_round
>        0x00FE36 | U  | H | V | closingBracket_round
>        0x00FE37 | U  | H | V | openingBracket_other
>        0x00FE38 | U  | H | V | closingBracket_other
>        0x00FE39 | U  | H | V | openingBracket_other
>        0x00FE3A | U  | H | V | closingBracket_other
>        0x00FE3B | U  | H | V | openingBracket_other
>        0x00FE3C | U  | H | V | closingBracket_other
>        0x00FE3D | U  | H | V | openingBracket_other
>        0x00FE3E | U  | H | V | closingBracket_other
>        0x00FE3F | U  | H | V | openingBracket_other
>        0x00FE40 | U  | H | V | closingBracket_other
>        0x00FE41 | U  | H | V | openingBracket_corner
>        0x00FE42 | U  | H | V | closingBracket_corner
>        0x00FE43 | U  | H | V | openingBracket_corner
>        0x00FE44 | U  | H | V | closingBracket_corner
>        0x00FE45 | U  | H | V | ideographic
>        0x00FE47 | U  | H | V | openingBracket_other
>        0x00FE48 | U  | H | V | closingBracket_other
>        0x00FE49 | R  | H     | westernChar
>        0x00FE50 | Tu | H | V | ideographic
>        0x00FE53 | U  | H | V | ideographic
>        0x00FE58 | R  | H | V | ideographic
>        0x00FE59 | Tr | H | V | ideographic
>        0x00FE5F | U  | H | V | ideographic
>        0x00FE63 | R  | H | V | ideographic
>        0x00FE67 | U  | H | V | ideographic
>        0x00FE70 | R  | H     | westernChar
>        0x00FEFF | R  | H | V | unknown
>        0x00FF00 | R  | H     | westernChar
>        0x00FF01 | Tu | H | V | dividingPunctuation
>        0x00FF02 | U  | H | V | ideographic
>        0x00FF03 | U  | H | V | prefixedAbbrev
>        0x00FF05 | U  | H | V | postfixedAbbrev
>        0x00FF06 | U  | H | V | ideographic
>        0x00FF08 | Tr | H | V | openingBracket_round
>        0x00FF09 | Tr | H | V | closingBracket_round
>        0x00FF0A | U  | H | V | ideographic
>        0x00FF0C | Tu | H | V | comma_western
>        0x00FF0D | R  | H | V | ideographic
>        0x00FF0E | Tu | H | V | fullStop_western
>        0x00FF0F | U  | H | V | ideographic
>        0x00FF1A | Tr | H | V | middleDot_colon
>        0x00FF1C | R  | H | V | ideographic
>        0x00FF1F | Tu | H | V | dividingPunctuation
>        0x00FF20 | U  | H | V | ideographic
>        0x00FF3B | Tr | H | V | openingBracket_other
>        0x00FF3C | U  | H | V | ideographic
>        0x00FF3D | Tr | H | V | closingBracket_other
>        0x00FF3E | U  | H | V | ideographic
>        0x00FF3F | Tr | H | V | ideographic
>        0x00FF40 | U  | H | V | ideographic
>        0x00FF5B | Tr | H | V | openingBracket_other
>        0x00FF5C | Tr | H | V | ideographic
>        0x00FF5D | Tr | H | V | closingBracket_other
>        0x00FF5E | Tr | H | V | ideographic
>        0x00FF5F | Tr | H | V | openingBracket_round
>        0x00FF60 | Tr | H | V | closingBracket_round
>        0x00FF61 | R  | H     | westernChar
>        0x00FFE0 | U  | H | V | postfixedAbbrev
>        0x00FFE1 | U  | H | V | prefixedAbbrev
>        0x00FFE2 | U  | H | V | ideographic
>        0x00FFE3 | Tr | H | V | ideographic
>        0x00FFE4 | U  | H | V | ideographic
>        0x00FFE5 | U  | H | V | prefixedAbbrev
>        0x00FFE6 | U  | H | V | ideographic
>        0x00FFE8 | R  | H     | westernChar
>        0x00FFF0 | U  | H | V | ideographic
>        0x00FFF9 | R  | H | V | transparent
>        0x00FFFC | U  | H | V | inlineObject
>        0x00FFFD | U  | H | V | ideographic
>        0x00FFFE | R  | H | V | unknown
>        0x010000 | R  | H     | westernChar
>        0x010980 | U  | H     | westernChar
>        0x0109A0 | R  | H     | westernChar
>        0x011580 | U  | H     | westernChar
>        0x011600 | R  | H     | westernChar
>        0x011A00 | U  | H | V | ideographic
>        0x011AB0 | R  | H     | westernChar
>        0x013000 | U  | H     | westernChar
>        0x013430 | R  | H     | westernChar
>        0x014400 | U  | H     | westernChar
>        0x014680 | R  | H     | westernChar
>        0x016FE0 | U  | H | V | ideographic
>        0x018B00 | R  | H     | westernChar
>        0x01B000 | U  | H | V | katakana
>        0x01B001 | U  | H | V | hiragana
>        0x01B130 | R  | H     | westernChar
>        0x01B170 | U  | H | V | ideographic
>        0x01B300 | R  | H     | westernChar
>        0x01D000 | U  | H     | westernChar
>        0x01D200 | R  | H     | westernChar
>        0x01D2E0 | U  | H | V | ideographic
>        0x01D300 | U  | H     | westernChar
>        0x01D380 | R  | H     | westernChar
>        0x01D800 | U  | H     | westernChar
>        0x01DAB0 | R  | H     | westernChar
>        0x01F000 | U  | H | V | ideographic
>        0x01F200 | Tu | H | V | ideographic
>        0x01F202 | U  | H | V | ideographic
>        0x01F800 | R  | H | V | ideographic
>        0x01F900 | U  | H | V | ideographic
>        0x01FA70 | R  | H     | westernChar
>        0x020000 | U  | H | V | ideographic
>        0x02FFFE | R  | H | V | unknown
>        0x030000 | U  | H | V | ideographic
>        0x03FFFE | R  | H | V | unknown
>        0x040000 | R  | H     | westernChar
>        0x0F0000 | U  | H | V | ideographic
>        0x0FFFFE | R  | H | V | unknown
>        0x100000 | U  | H | V | ideographic
>        0x10FFFE | R  | H | V | unknown
>        0x110000
> 
> ===========
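A sketch of how such a table could be queried, in Python. It assumes, since the message does not say so explicitly, that each row gives the first code point of a range that runs up to the next row, with the final 0x110000 row acting as an end marker. The excerpt copies a few rows from the table above and keeps only the class column; `EXCERPT_END` and `spacing_class` are hypothetical names.

```python
from bisect import bisect_right

# (range start, class) pairs copied from the table; UAX50 and H/V columns omitted.
ROWS = [
    (0x3000, "fullSpace"),
    (0x3001, "comma_ideo"),
    (0x3002, "fullStop_ideo"),
    (0x3003, "ideographic"),
    (0x3005, "iterationMark"),
    (0x3006, "ideographic"),
    (0x3008, "openingBracket_other"),
]
EXCERPT_END = 0x3009  # plays the role of the 0x110000 end marker
STARTS = [start for start, _ in ROWS]


def spacing_class(cp: int):
    """Class of a code point, or None if it falls outside the excerpt."""
    if cp < STARTS[0] or cp >= EXCERPT_END:
        return None
    # Index of the last row whose start is <= cp.
    return ROWS[bisect_right(STARTS, cp) - 1][1]
```

Under that reading, U+3004 has no row of its own and inherits "ideographic" from the row starting at 0x3003.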

Received on Monday, 19 October 2020 15:57:10 UTC