[jlreq] Proposal from Eric Muller: re: expanding JLReq character class to Unicode (#242)

kidayasuo has just created a new issue for https://github.com/w3c/jlreq:

== Proposal from Eric Muller: re: expanding JLReq character class to Unicode ==
Eric Muller posted a proposal on the admin list. I am copying it here to track discussions.
––––––––––––––––––––––––––––––––––––––––––––––

2020/10/19 8:23、Eric Muller <emuller@amazon.com> wrote:

Here is our perspective as implementers. It is a bit raw (sorry, we noticed the announcement a bit late), don't hesitate to reach out for clarification.

Eric.

---

Character classes serve two purposes: linebreak opportunities and spacing around characters.

Linebreak opportunities are adequately handled by Unicode currently, at most needing some adjustment in UAX14 or in the CLDR language tailorings. Therefore that use is not discussed here.

---

A possible spacing model is that there is glue (variable space) on each side of each grapheme cluster occurrence. This glue is characterized by its natural width (JLREQ appendix B) and can be deformed (either compressed - JLREQ appendix D - or expanded - JLREQ appendix E) to achieve justification.
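
As a minimal sketch of this model, a glue can be represented by a natural width plus the amounts by which it may be compressed or expanded. The `Glue` type, its field names, the em unit and the adjustment ratio are assumptions made for illustration; none of them come from JLREQ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glue:
    """Variable space, in em units (hypothetical representation)."""
    natural: float        # natural width (cf. JLREQ appendix B)
    shrink: float = 0.0   # maximum compression (cf. JLREQ appendix D)
    stretch: float = 0.0  # expansion unit (cf. JLREQ appendix E)

    def width(self, ratio: float) -> float:
        """Width for an adjustment ratio: negative compresses, positive expands."""
        if ratio < 0:
            # Never compress below natural - shrink.
            return self.natural + max(ratio, -1.0) * self.shrink
        return self.natural + ratio * self.stretch


# Example values only, not taken from any JLREQ table.
half_em = Glue(natural=0.5, shrink=0.5, stretch=0.25)
```

Compression is clamped at the shrink limit while expansion is left unbounded, which matches the later remark that glues may need to grow to arbitrary widths.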

While each glue occurrence could be specified explicitly via markup, it can be determined most of the time from its context, using classes: for a left glue, by the class of what's on the left of the grapheme cluster occurrence and by the class of the grapheme cluster occurrence itself; and similarly for a right glue, by the class of the grapheme cluster occurrence and by the class of what's on the right of the grapheme cluster occurrence.

What's on the left (or right) of a grapheme cluster occurrence may be another grapheme cluster occurrence, in which case the class of "what's on the left" is the class of that other grapheme cluster occurrence. But it can also be that there is no other grapheme cluster occurrence on the left, or there is some intervening graphical element, thus leading to classes:

- the beginning (or end) of a paragraph
- the beginning (or end) of a line
- a different bidi level (the purpose of this class is to avoid involving the bidi reordering when measuring lines)
- the inside of a box with non-zero margin, border or padding
- the outside of such a box
- an inline object (e.g. image)
- a TCY element
- the outside or inside of a warichu element
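
The contextual determination described above can be sketched as two lookups per grapheme cluster occurrence, one for its left glue and one for its right glue, with line edges treated as classes of their own. The class names, the table contents and the default of 0 are illustrative assumptions; only the two line-edge values echo the default setting discussed later in this message.

```python
# Hypothetical tables of natural widths in em, keyed by (left class, right class).
LEFT_GLUE = {
    ("lineStart", "openingBracket"): 0.0,
    ("ideographic", "openingBracket"): 0.5,
}
RIGHT_GLUE = {
    ("closingBracket", "ideographic"): 0.5,
    ("closingBracket", "lineEnd"): 0.5,
}


def glues_for_line(classes: list[str]) -> list[tuple[float, float]]:
    """Return (left glue, right glue) for each item of a line, in visual order."""
    padded = ["lineStart", *classes, "lineEnd"]
    result = []
    for i in range(1, len(padded) - 1):
        # Left glue: class of what is on the left, then class of the item itself.
        left = LEFT_GLUE.get((padded[i - 1], padded[i]), 0.0)
        # Right glue: class of the item itself, then class of what is on the right.
        right = RIGHT_GLUE.get((padded[i], padded[i + 1]), 0.0)
        result.append((left, right))
    return result
```

The other context classes in the list (box edges, inline objects, TCY, warichu) would simply be further keys in the same tables.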


The class of a grapheme cluster occurrence could also be specified explicitly by markup, but it can often be determined from the characters composing the grapheme cluster occurrence (at which point, it is the same for all occurrences of a given grapheme cluster). That can in turn be determined from classes assigned to the characters in the grapheme cluster. Generally, the base character of a grapheme cluster determines the class of the grapheme cluster, but there are cases where the other characters "dominate" the determination: for example, <U+00A0 NO-BREAK SPACE> may be in a class, and <U+00A0 U+0301 COMBINING ACUTE> may be in a different class.
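
A minimal sketch of that determination, assuming the cluster has already been segmented and that its first code point is the base character. The two character classes are taken from the table at the end of this message; the override entry and the class it assigns are hypothetical, chosen only to show a non-base character dominating.

```python
# Per-character classes (two entries copied from the table below).
CHAR_CLASS = {
    "\u00A0": "justifyingSpace",  # NO-BREAK SPACE
    "\u3042": "hiragana",         # HIRAGANA LETTER A
}

# Hypothetical whole-cluster overrides, for clusters where the base
# character alone would give the wrong class.
CLUSTER_OVERRIDE = {
    "\u00A0\u0301": "westernChar",  # NBSP carrying a combining acute (assumed class)
}


def cluster_class(cluster: str) -> str:
    """Class of a grapheme cluster: an explicit override first, else the base character."""
    if cluster in CLUSTER_OVERRIDE:
        return CLUSTER_OVERRIDE[cluster]
    return CHAR_CLASS.get(cluster[0], "unknown")
```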

Finally, we arrive at the classes of characters. Below is a proposed assignment for the whole Unicode repertoire. This classification mostly aligns with that of JLREQ, with a few differences:

- for unassigned code points (in the Unicode sense), the class is a prediction based on the likely future allocation of those code points

- JLREQ simply ignores the existence of the full width characters at U+FFxx. This leads to a number of "ambiguous" characters, such as U+0041 LATIN CAPITAL LETTER A, where JLREQ says both "an occurrence of U+0041 could be in the Western class" (A.27) and "an occurrence of U+0041 could be in the Ideographic class" (A.19). In practice, authors routinely use U+0041 and U+FF21 precisely to disambiguate the class to use.

- it distinguishes the class used in horizontal and in vertical texts

- it distinguishes the inseparables (see below)

- it uses the InDesign refinement of the opening and closing classes (square, rounded, other)

The proposed assignment also mentions the UAX50 vertical orientation property, as it is closely aligned with, and informs, the spacing class assignment.

---
Ambiguous characters

While most characters are unambiguously in a class, regardless of their context, a few characters common in Japanese typography are inherently ambiguous:


    U+2018 ‘ LEFT SINGLE QUOTATION MARK
    U+201C “ LEFT DOUBLE QUOTATION MARK
    U+00AB « LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    U+2019 ’ RIGHT SINGLE QUOTATION MARK
    U+201D ” RIGHT DOUBLE QUOTATION MARK
    U+00BB » RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    U+2010 ‐ HYPHEN
    U+2013 – EN DASH
    U+203C ‼ DOUBLE EXCLAMATION MARK
    U+2047 ⁇ DOUBLE QUESTION MARK
    U+2048 ⁈ QUESTION EXCLAMATION MARK
    U+2049 ⁉ EXCLAMATION QUESTION MARK
    U+00B7 · MIDDLE DOT
    U+2022 • BULLET
    U+2014 — EM DASH
    U+2026 … HORIZONTAL ELLIPSIS
    U+2025 ‥ TWO DOT LEADER

A possibility is to resolve those based on the locale, or their resolved script (itself determined by looking at the script of the adjacent character).

The locale method has the downside that authors are not always tagging their text appropriately (either not at all, or not carefully on punctuation).

The script method has the advantage of not requiring the author's help, and that computation is already necessary in OpenType layout engines.
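
The script method could look roughly like the sketch below. It makes several assumptions: `is_cjk` is a crude code point range check standing in for a real Script property lookup, the preceding character is consulted before the following one, and an ambiguous character with no usable neighbor falls back to westernChar. The East Asian classes for the four quotation marks are taken from the table at the end of this message.

```python
# East Asian classes of a few ambiguous characters (from the table below).
AMBIGUOUS = {
    "\u2018": "openingBracket_other",
    "\u2019": "closingBracket_other",
    "\u201C": "openingBracket_other",
    "\u201D": "closingBracket_other",
}


def is_cjk(ch: str) -> bool:
    """Crude stand-in for 'script is Hans, Hant or Jpan': kana and main CJK ideographs."""
    cp = ord(ch)
    return 0x3040 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF


def resolve_class(text: str, i: int) -> str:
    """Resolve the class of the ambiguous character text[i] from an adjacent character."""
    east_asian_class = AMBIGUOUS[text[i]]
    for j in (i - 1, i + 1):  # assumed order: preceding neighbor, then following
        if 0 <= j < len(text) and text[j] not in AMBIGUOUS:
            return east_asian_class if is_cjk(text[j]) else "westernChar"
    return "westernChar"  # assumed fallback
```

For example, the opening quotation mark in "彼は“はい”と" resolves to openingBracket_other, while the same character in "say “yes”" resolves to westernChar.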

---
Inseparables

Currently, all inseparables are lumped in a single class, and a footnote explains that the inseparable/inseparable behavior applies only to two occurrences of the same inseparable. It would be better to have separate classes for inseparables. Not only does that avoid a footnote, but it also means that one can specify different glues for e.g. ideographic/inseparable_emDash and ideographic/inseparable_twoDotLeader, or specify different glues for inseparable_emDash/inseparable_twoDotLeader and inseparable_emDash/inseparable_ellipsis.
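
With one class per inseparable, the footnote rule reduces to a comparison of class names. The sketch below assumes the `inseparable_` naming prefix used in the table at the end of this message; the function name is hypothetical.

```python
def is_same_inseparable_pair(left_class: str, right_class: str) -> bool:
    """True when two adjacent occurrences are the same inseparable and must stay together."""
    return left_class.startswith("inseparable_") and left_class == right_class
```

Pairs of different inseparables, such as inseparable_emDash followed by inseparable_ellipsis, are then ordinary table entries that can carry their own glue.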

---
Logical vs visual order:

It should be made clear that the practical definition of glues is in the visual space: that's why we used the terms "left" and "right".

---
Classes as a Unicode property

From a practical point of view, I believe that the spacing class should be part of the Unicode Character Database, as a property, just like the vertical orientation property. The main reason is that this is the most reliable way to get something well defined (in the sense of having a definition, not necessarily in the sense of having correct values) and in sync with the Unicode repertoire. It is a relatively easy task for Unicode, as has been demonstrated with the vertical orientation property. (In fact, the very first draft of what became UAX50 included the spacing class.)

It is worth noting that such a Unicode property is only a starting point. As noted earlier, markup should always be available to influence the determination of the glue. Thus there is no need for such a Unicode property to be perfect; it does however need to be easily accessible and fairly stable.

========
Classes and glue settings

The classes are only one part of the final visual appearance: the glue settings also come into play, so it is worth discussing those a bit, as they may influence the design of the classes.

---
Glue settings and justification

When justifying text (a common case for body text), implementations may have to expand a glue to an arbitrary width. Consider, for example, a two-character paragraph with text-align-last: justify: the glue has to be (linewidth - 2em). While large glues are sometimes the result of pathological conditions, they can also be explicitly intended, such as in jidori processing. Thus it is desirable to allow pretty much all glues to grow indefinitely.
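
The arithmetic can be sketched as follows, assuming the free space is shared equally among the gaps between adjacent clusters and that all widths are expressed in em. The function name and the equal-sharing rule are assumptions for illustration.

```python
def justification_glue(line_width_em: float, cluster_widths_em: list[float]) -> float:
    """Width each inter-cluster glue must take so that the clusters fill the line."""
    gaps = len(cluster_widths_em) - 1
    if gaps < 1:
        raise ValueError("need at least two clusters to have a gap to stretch")
    return (line_width_em - sum(cluster_widths_em)) / gaps


# Two full-width characters on a 20em line: the single glue is 20em - 2em.
two_char_glue = justification_glue(20.0, [1.0, 1.0])
```

Here `two_char_glue` is 18.0, far beyond any typical expansion limit, which is why a hard upper bound on expansion is impractical.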

---
Glue settings are mostly for body text

JLREQ currently describes three glue settings (default, JIS, and book, in tables 3-5 of appendix D; they differ only on the behavior when compressing lines, but in principle different settings could also differ on natural width or when expanding lines). It seems that those settings are mostly concerned with body text and are not appropriate for, e.g., titles. For example, the default method specifies 0 glue between paragraph (line) start and an opening bracket, and 0.5em between a closing bracket and a paragraph (line) end; for a title starting and ending with brackets, which happens to be set on two lines (centered and not justified), this asymmetry can be jarring.

It would be worth stating that the settings apply to body text and mentioning when they are not appropriate, or even better, including settings for other situations. The most important situations that come to mind are titles and ruby base/ruby text.

---
Interchange of glue settings

The discussion so far has been about determining the classes from characters, leaving room for document styling systems (e.g. CSS) to let authors explicitly specify classes of occurrences. The classification is of course only one part of the final result, the other being the glues that result from those classes (i.e. JLREQ appendices B, D, E). It would be useful to encourage document styling systems to allow the specification of the glues as well, in the documents, either by selecting from a predetermined set of settings or by completely specifying the settings (maybe as a delta on top of the predetermined settings).

---
Spacing classes and the CSS text-indent property.

With the model presented above, the CSS text-indent property is essentially an unconditional, invariable glue to the left of the first grapheme in a paragraph. In practice, it is useful in Japanese typography to make that glue at least conditional: e.g. 1em before an ideograph, and 0.5em before an opening bracket. I think the best way forward is to recommend that for paragraphs using the spacing model discussed here, that glue be controlled by the spacing model (i.e. the mojikumi tables) and that text-indent be set to 0.
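
A minimal sketch of such a conditional paragraph-start glue, using the two values from the example above. The function name, the prefix test on the class name, and the 1em default for every other class are assumptions.

```python
def paragraph_start_glue(first_class: str) -> float:
    """Glue (in em) between the paragraph start and the first grapheme cluster."""
    if first_class.startswith("openingBracket"):
        return 0.5  # half-em indent before an opening bracket
    return 1.0      # assumed default, as before an ideograph
```

With text-indent set to 0, this value would be the only indentation applied to the first line.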

========

Columns:

  - code point
  - UAX50 vertical orientation
  - H:     the class for horizontal text is in column 5
    blank: the class for horizontal text is ideographic
  - V:     the class for vertical text is in column 5
    blank: the class for vertical text is ideographic
  - class
  - A:     if the resolved script is not Hans, Hant, Jpan -> westernChar


        0x000000 | R  | H | V | unknown
        0x000009 | R  | H | V | tab
        0x00000A | R  | H | V | lineEdge
        0x00000B | R  | H | V | unknown
        0x00000D | R  | H | V | lineEdge
        0x00000E | R  | H | V | unknown
        0x000020 | R  | H     | justifyingSpace
        0x000021 | R  | H     | westernChar
        0x000080 | R  | H | V | unknown
        0x000085 | R  | H | V | lineEdge
        0x000086 | R  | H | V | unknown
        0x0000A0 | R  | H     | justifyingSpace
        0x0000A1 | R  | H     | westernChar
        0x0000A7 | U  | H     | westernChar
        0x0000A8 | R  | H     | westernChar
        0x0000A9 | U  | H     | westernChar
        0x0000AA | R  | H     | westernChar
        0x0000AB | R  | H | V | openingBracket_other
        0x0000AC | R  | H     | westernChar
        0x0000AD | R  | H | V | unknown
        0x0000AE | U  | H     | westernChar
        0x0000AF | R  | H     | westernChar
        0x0000B0 | R  | H     | postfixedAbbrev
        0x0000B1 | U  | H     | westernChar
        0x0000B2 | R  | H     | westernChar
        0x0000BB | R  | H | V | closingBracket_other
        0x0000BC | U  | H     | westernChar
        0x0000BF | R  | H     | westernChar
        0x0000D7 | U  | H     | westernChar
        0x0000D8 | R  | H     | westernChar
        0x0000F7 | U  | H     | westernChar
        0x0000F8 | R  | H     | westernChar
        0x0002EA | U  | H | V | ideographic
        0x0002EC | R  | H     | westernChar
        0x001100 | U  | H | V | ideographic
        0x001200 | R  | H     | westernChar
        0x001401 | U  | H     | westernChar
        0x001680 | R  | H     | westernChar
        0x0018B0 | U  | H     | westernChar
        0x001900 | R  | H     | westernChar
        0x00200B | R  | H | V | transparent
        0x00200D | R  | H | V | unknown
        0x002010 | R  | H | V | hyphen_middlePunctuation
        0x002014 | R  | H     | inseparable_emDash
        0x002016 | U  | H     | westernChar
        0x002017 | R  | H     | westernChar
        0x002018 | R  | H | V | openingBracket_other
        0x002019 | R  | H | V | closingBracket_other          | A
        0x00201A | R  | H     | westernChar
        0x00201C | R  | H | V | openingBracket_other
        0x00201D | R  | H | V | closingBracket_other
        0x00201E | R  | H     | westernChar
        0x002020 | U  | H     | westernChar
        0x002022 | R  | H     | westernChar
        0x002025 | R  | H     | inseparable_twoDotLeader
        0x002026 | R  | H     | inseparable_ellipsis
        0x002027 | R  | H     | westernChar
        0x002028 | R  | H | V | lineEdge
        0x00202A | R  | H | V | unknown
        0x00202F | R  | H     | westernChar
        0x002030 | U  | H | V | postfixedAbbrev
        0x002032 | R  | H | V | postfixedAbbrev
        0x002034 | R  | H     | westernChar
        0x00203B | U  | H | V | ideographic
        0x00203C | U  | H | V | dividingPunctuation
        0x00203D | R  | H     | westernChar
        0x002042 | U  | H     | westernChar
        0x002043 | R  | H     | westernChar
        0x002047 | U  | H | V | dividingPunctuation
        0x00204A | R  | H     | westernChar
        0x002051 | U  | H     | westernChar
        0x002052 | R  | H     | westernChar
        0x00205F | R  | H     | westernChar
        0x002060 | R  | H | V | unknown
        0x002065 | U  | H | V | ideographic
        0x002066 | R  | H | V | unknown
        0x002070 | R  | H     | westernChar
        0x0020AC | R  | H | V | prefixedAbbrev
        0x0020AD | R  | H     | westernChar
        0x0020DD | U  | H     | westernChar
        0x0020E1 | R  | H     | westernChar
        0x0020E2 | U  | H     | westernChar
        0x0020E5 | R  | H     | westernChar
        0x002100 | U  | H | V | ideographic
        0x002102 | R  | H     | westernChar
        0x002103 | U  | H | V | postfixedAbbrev
        0x002104 | U  | H | V | ideographic
        0x002109 | U  | H | V | postfixedAbbrev
        0x00210A | R  | H     | westernChar
        0x00210F | U  | H | V | ideographic
        0x002110 | R  | H     | westernChar
        0x002113 | U  | H | V | postfixedAbbrev
        0x002114 | U  | H | V | ideographic
        0x002115 | R  | H     | westernChar
        0x002116 | U  | H | V | prefixedAbbrev
        0x002117 | U  | H | V | ideographic
        0x002118 | R  | H     | westernChar
        0x00211E | U  | H | V | ideographic
        0x002124 | R  | H     | westernChar
        0x002125 | U  | H | V | ideographic
        0x002126 | R  | H     | westernChar
        0x002127 | U  | H | V | ideographic
        0x002128 | R  | H     | westernChar
        0x002129 | U  | H | V | ideographic
        0x00212A | R  | H     | westernChar
        0x00212E | U  | H | V | ideographic
        0x00212F | R  | H     | westernChar
        0x002135 | U  | H | V | ideographic
        0x002140 | R  | H     | westernChar
        0x002145 | U  | H | V | ideographic
        0x00214B | R  | H     | westernChar
        0x00214C | U  | H | V | ideographic
        0x00214E | R  | H     | westernChar
        0x00214F | U  | H | V | ideographic
        0x00218A | R  | H     | westernChar
        0x00218C | U  | H | V | ideographic
        0x002190 | R  | H | V | ideographic
        0x00221E | U  | H | V | ideographic
        0x00221F | R  | H | V | ideographic
        0x002234 | U  | H | V | ideographic
        0x002236 | R  | H | V | ideographic
        0x002300 | U  | H | V | ideographic
        0x002308 | R  | H | V | ideographic
        0x00230C | U  | H | V | ideographic
        0x002320 | R  | H | V | ideographic
        0x002324 | U  | H | V | ideographic
        0x002329 | Tr | H | V | openingBracket_other
        0x00232A | Tr | H | V | closingBracket_other
        0x00232B | U  | H | V | ideographic
        0x00232C | R  | H | V | ideographic
        0x00237D | U  | H | V | ideographic
        0x00239B | R  | H | V | ideographic
        0x0023BE | U  | H | V | ideographic
        0x0023CE | R  | H | V | ideographic
        0x0023CF | U  | H | V | ideographic
        0x0023D0 | R  | H | V | ideographic
        0x0023D1 | U  | H | V | ideographic
        0x0023DC | R  | H | V | ideographic
        0x0023E2 | U  | H | V | ideographic
        0x002423 | R  | H     | westernChar
        0x002424 | U  | H | V | ideographic
        0x002500 | R  | H     | inseparable_emDash
        0x002580 | R  | H     | westernChar
        0x0025A0 | U  | H | V | ideographic
        0x00261A | R  | H | V | ideographic
        0x002620 | U  | H | V | ideographic
        0x002768 | R  | H     | westernChar
        0x002776 | U  | H | V | ideographic
        0x002794 | R  | H | V | ideographic
        0x002800 | R  | H     | westernChar
        0x002900 | R  | H | V | ideographic
        0x002B12 | U  | H | V | ideographic
        0x002B30 | R  | H | V | ideographic
        0x002B50 | U  | H | V | ideographic
        0x002B5A | R  | H | V | ideographic
        0x002BB8 | U  | H | V | ideographic
        0x002BD2 | R  | H | V | ideographic
        0x002BD3 | U  | H | V | ideographic
        0x002BEC | R  | H | V | ideographic
        0x002BF0 | U  | H | V | ideographic
        0x002C00 | R  | H     | westernChar
        0x002E80 | U  | H | V | ideographic
        0x003000 | U  | H | V | fullSpace
        0x003001 | Tu | H | V | comma_ideo
        0x003002 | Tu | H | V | fullStop_ideo
        0x003003 | U  | H | V | ideographic
        0x003005 | U  | H | V | iterationMark
        0x003006 | U  | H | V | ideographic
        0x003008 | Tr | H | V | openingBracket_other
        0x003009 | Tr | H | V | closingBracket_other
        0x00300A | Tr | H | V | openingBracket_other
        0x00300B | Tr | H | V | closingBracket_other
        0x00300C | Tr | H | V | openingBracket_corner
        0x00300D | Tr | H | V | closingBracket_corner
        0x00300E | Tr | H | V | openingBracket_corner
        0x00300F | Tr | H | V | closingBracket_corner
        0x003010 | Tr | H | V | openingBracket_other
        0x003011 | Tr | H | V | closingBracket_other
        0x003012 | U  | H | V | ideographic
        0x003014 | Tr | H | V | openingBracket_other
        0x003015 | Tr | H | V | closingBracket_other
        0x003016 | Tr | H | V | openingBracket_other
        0x003017 | Tr | H | V | closingBracket_other
        0x003018 | Tr | H | V | openingBracket_other
        0x003019 | Tr | H | V | closingBracket_other
        0x00301A | Tr | H | V | openingBracket_corner
        0x00301B | Tr | H | V | closingBracket_corner
        0x00301C | Tr | H | V | hyphen_other
        0x00301D | Tr | H | V | openingBracket_other
        0x00301E | Tr | H | V | closingBracket_other
        0x003020 | U  | H | V | ideographic
        0x003030 | Tr | H | V | ideographic
        0x003031 | U  | H | V | ideographic
        0x003033 | U  | H | V | inseparable_repeatUpper
        0x003034 | U  | H | V | inseparable_repeatVoiceUpper
        0x003035 | U  | H | V | inseparable_repeatLower
        0x003036 | U  | H | V | ideographic
        0x00303B | U  | H | V | iterationMark
        0x00303C | U  | H | V | ideographic
        0x003040 | U  | H | V | hiragana
        0x003041 | Tu | H | V | smallKana
        0x003042 | U  | H | V | hiragana
        0x003043 | Tu | H | V | smallKana
        0x003044 | U  | H | V | hiragana
        0x003045 | Tu | H | V | smallKana
        0x003046 | U  | H | V | hiragana
        0x003047 | Tu | H | V | smallKana
        0x003048 | U  | H | V | hiragana
        0x003049 | Tu | H | V | smallKana
        0x00304A | U  | H | V | hiragana
        0x003063 | Tu | H | V | smallKana
        0x003064 | U  | H | V | hiragana
        0x003083 | Tu | H | V | smallKana
        0x003084 | U  | H | V | hiragana
        0x003085 | Tu | H | V | smallKana
        0x003086 | U  | H | V | hiragana
        0x003087 | Tu | H | V | smallKana
        0x003088 | U  | H | V | hiragana
        0x00308E | Tu | H | V | smallKana
        0x00308F | U  | H | V | hiragana
        0x003095 | Tu | H | V | smallKana
        0x003097 | U  | H | V | hiragana
        0x00309B | Tu | H | V | hiragana
        0x00309D | U  | H | V | iterationMark
        0x00309F | U  | H | V | hiragana
        0x0030A0 | Tr | H | V | hyphen_katakana
        0x0030A1 | Tu | H | V | smallKana
        0x0030A2 | U  | H | V | katakana
        0x0030A3 | Tu | H | V | smallKana
        0x0030A4 | U  | H | V | katakana
        0x0030A5 | Tu | H | V | smallKana
        0x0030A6 | U  | H | V | katakana
        0x0030A7 | Tu | H | V | smallKana
        0x0030A8 | U  | H | V | katakana
        0x0030A9 | Tu | H | V | smallKana
        0x0030AA | U  | H | V | katakana
        0x0030C3 | Tu | H | V | smallKana
        0x0030C4 | U  | H | V | katakana
        0x0030E3 | Tu | H | V | smallKana
        0x0030E4 | U  | H | V | katakana
        0x0030E5 | Tu | H | V | smallKana
        0x0030E6 | U  | H | V | katakana
        0x0030E7 | Tu | H | V | smallKana
        0x0030E8 | U  | H | V | katakana
        0x0030EE | Tu | H | V | smallKana
        0x0030EF | U  | H | V | katakana
        0x0030F5 | Tu | H | V | smallKana
        0x0030F7 | U  | H | V | katakana
        0x0030FB | U  | H | V | middleDot_middlePunctuation
        0x0030FC | Tr | H | V | prolongedSoundMark
        0x0030FD | U  | H | V | iterationMark
        0x0030FF | U  | H | V | katakana
        0x003100 | U  | H | V | ideographic
        0x003127 | Tu | H | V | ideographic
        0x003128 | U  | H | V | ideographic
        0x0031F0 | Tu | H | V | smallKana
        0x003200 | U  | H | V | ideographic
        0x003300 | Tu | H | V | ideographic
        0x003303 | Tu | H | V | postfixedAbbrev
        0x003304 | Tu | H | V | ideographic
        0x00330D | Tu | H | V | postfixedAbbrev
        0x00330E | Tu | H | V | ideographic
        0x003314 | Tu | H | V | postfixedAbbrev
        0x003315 | Tu | H | V | ideographic
        0x003318 | Tu | H | V | postfixedAbbrev
        0x003319 | Tu | H | V | ideographic
        0x003322 | Tu | H | V | postfixedAbbrev
        0x003324 | Tu | H | V | ideographic
        0x003326 | Tu | H | V | postfixedAbbrev
        0x003328 | Tu | H | V | ideographic
        0x00332B | Tu | H | V | postfixedAbbrev
        0x00332C | Tu | H | V | ideographic
        0x003336 | Tu | H | V | postfixedAbbrev
        0x003337 | Tu | H | V | ideographic
        0x00333B | Tu | H | V | postfixedAbbrev
        0x00333C | Tu | H | V | ideographic
        0x003349 | Tu | H | V | postfixedAbbrev
        0x00334B | Tu | H | V | ideographic
        0x00334D | Tu | H | V | postfixedAbbrev
        0x00334E | Tu | H | V | ideographic
        0x003351 | Tu | H | V | postfixedAbbrev
        0x003352 | Tu | H | V | ideographic
        0x003357 | Tu | H | V | postfixedAbbrev
        0x003358 | U  | H | V | ideographic
        0x003371 | U  | H | V | postfixedAbbrev
        0x00337B | Tu | H | V | ideographic
        0x003380 | U  | H | V | postfixedAbbrev
        0x0033E0 | U  | H | V | ideographic
        0x00A4D0 | R  | H     | westernChar
        0x00A960 | U  | H | V | ideographic
        0x00A980 | R  | H     | westernChar
        0x00AC00 | U  | H | V | ideographic
        0x00D800 | R  | H     | westernChar
        0x00E000 | U  | H | V | ideographic
        0x00FB00 | R  | H     | westernChar
        0x00FE10 | U  | H | V | ideographic
        0x00FE17 | U  | H | V | openingBracket_other
        0x00FE18 | U  | H | V | closingBracket_other
        0x00FE19 | U  | H | V | ideographic
        0x00FE20 | R  | H     | westernChar
        0x00FE30 | U  | H | V | inseparable_twoDotLeaderV
        0x00FE31 | U  | H | V | inseparable_emDashV
        0x00FE32 | U  | H | V | hyphen_middlePunctuation
        0x00FE33 | U  | H | V | ideographic
        0x00FE35 | U  | H | V | openingBracket_round
        0x00FE36 | U  | H | V | closingBracket_round
        0x00FE37 | U  | H | V | openingBracket_other
        0x00FE38 | U  | H | V | closingBracket_other
        0x00FE39 | U  | H | V | openingBracket_other
        0x00FE3A | U  | H | V | closingBracket_other
        0x00FE3B | U  | H | V | openingBracket_other
        0x00FE3C | U  | H | V | closingBracket_other
        0x00FE3D | U  | H | V | openingBracket_other
        0x00FE3E | U  | H | V | closingBracket_other
        0x00FE3F | U  | H | V | openingBracket_other
        0x00FE40 | U  | H | V | closingBracket_other
        0x00FE41 | U  | H | V | openingBracket_corner
        0x00FE42 | U  | H | V | closingBracket_corner
        0x00FE43 | U  | H | V | openingBracket_corner
        0x00FE44 | U  | H | V | closingBracket_corner
        0x00FE45 | U  | H | V | ideographic
        0x00FE47 | U  | H | V | openingBracket_other
        0x00FE48 | U  | H | V | closingBracket_other
        0x00FE49 | R  | H     | westernChar
        0x00FE50 | Tu | H | V | ideographic
        0x00FE53 | U  | H | V | ideographic
        0x00FE58 | R  | H | V | ideographic
        0x00FE59 | Tr | H | V | ideographic
        0x00FE5F | U  | H | V | ideographic
        0x00FE63 | R  | H | V | ideographic
        0x00FE67 | U  | H | V | ideographic
        0x00FE70 | R  | H     | westernChar
        0x00FEFF | R  | H | V | unknown
        0x00FF00 | R  | H     | westernChar
        0x00FF01 | Tu | H | V | dividingPunctuation
        0x00FF02 | U  | H | V | ideographic
        0x00FF03 | U  | H | V | prefixedAbbrev
        0x00FF05 | U  | H | V | postfixedAbbrev
        0x00FF06 | U  | H | V | ideographic
        0x00FF08 | Tr | H | V | openingBracket_round
        0x00FF09 | Tr | H | V | closingBracket_round
        0x00FF0A | U  | H | V | ideographic
        0x00FF0C | Tu | H | V | comma_western
        0x00FF0D | R  | H | V | ideographic
        0x00FF0E | Tu | H | V | fullStop_western
        0x00FF0F | U  | H | V | ideographic
        0x00FF1A | Tr | H | V | middleDot_colon
        0x00FF1C | R  | H | V | ideographic
        0x00FF1F | Tu | H | V | dividingPunctuation
        0x00FF20 | U  | H | V | ideographic
        0x00FF3B | Tr | H | V | openingBracket_other
        0x00FF3C | U  | H | V | ideographic
        0x00FF3D | Tr | H | V | closingBracket_other
        0x00FF3E | U  | H | V | ideographic
        0x00FF3F | Tr | H | V | ideographic
        0x00FF40 | U  | H | V | ideographic
        0x00FF5B | Tr | H | V | openingBracket_other
        0x00FF5C | Tr | H | V | ideographic
        0x00FF5D | Tr | H | V | closingBracket_other
        0x00FF5E | Tr | H | V | ideographic
        0x00FF5F | Tr | H | V | openingBracket_round
        0x00FF60 | Tr | H | V | closingBracket_round
        0x00FF61 | R  | H     | westernChar
        0x00FFE0 | U  | H | V | postfixedAbbrev
        0x00FFE1 | U  | H | V | prefixedAbbrev
        0x00FFE2 | U  | H | V | ideographic
        0x00FFE3 | Tr | H | V | ideographic
        0x00FFE4 | U  | H | V | ideographic
        0x00FFE5 | U  | H | V | prefixedAbbrev
        0x00FFE6 | U  | H | V | ideographic
        0x00FFE8 | R  | H     | westernChar
        0x00FFF0 | U  | H | V | ideographic
        0x00FFF9 | R  | H | V | transparent
        0x00FFFC | U  | H | V | inlineObject
        0x00FFFD | U  | H | V | ideographic
        0x00FFFE | R  | H | V | unknown
        0x010000 | R  | H     | westernChar
        0x010980 | U  | H     | westernChar
        0x0109A0 | R  | H     | westernChar
        0x011580 | U  | H     | westernChar
        0x011600 | R  | H     | westernChar
        0x011A00 | U  | H | V | ideographic
        0x011AB0 | R  | H     | westernChar
        0x013000 | U  | H     | westernChar
        0x013430 | R  | H     | westernChar
        0x014400 | U  | H     | westernChar
        0x014680 | R  | H     | westernChar
        0x016FE0 | U  | H | V | ideographic
        0x018B00 | R  | H     | westernChar
        0x01B000 | U  | H | V | katakana
        0x01B001 | U  | H | V | hiragana
        0x01B130 | R  | H     | westernChar
        0x01B170 | U  | H | V | ideographic
        0x01B300 | R  | H     | westernChar
        0x01D000 | U  | H     | westernChar
        0x01D200 | R  | H     | westernChar
        0x01D2E0 | U  | H | V | ideographic
        0x01D300 | U  | H     | westernChar
        0x01D380 | R  | H     | westernChar
        0x01D800 | U  | H     | westernChar
        0x01DAB0 | R  | H     | westernChar
        0x01F000 | U  | H | V | ideographic
        0x01F200 | Tu | H | V | ideographic
        0x01F202 | U  | H | V | ideographic
        0x01F800 | R  | H | V | ideographic
        0x01F900 | U  | H | V | ideographic
        0x01FA70 | R  | H     | westernChar
        0x020000 | U  | H | V | ideographic
        0x02FFFE | R  | H | V | unknown
        0x030000 | U  | H | V | ideographic
        0x03FFFE | R  | H | V | unknown
        0x040000 | R  | H     | westernChar
        0x0F0000 | U  | H | V | ideographic
        0x0FFFFE | R  | H | V | unknown
        0x100000 | U  | H | V | ideographic
        0x10FFFE | R  | H | V | unknown
        0x110000
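
One way to consume this table is sketched below. It rests on an assumption the message does not state explicitly: each row gives the first code point of a range that extends up to the next row, with the final bare 0x110000 acting as an end sentinel. The excerpt is four consecutive rows copied from the table; the function names and the tuple layout are hypothetical.

```python
import bisect

EXCERPT = """\
0x002018 | R  | H | V | openingBracket_other
0x002019 | R  | H | V | closingBracket_other          | A
0x00201A | R  | H     | westernChar
0x00201C | R  | H | V | openingBracket_other
"""


def parse(table: str) -> list[tuple[int, str, set[str], str, bool]]:
    """Parse rows into (start, orientation, direction flags, class, script_dependent)."""
    rows = []
    for line in table.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3:
            continue  # blank line or the bare end sentinel
        rest = fields[2:]
        # A missing V shows up as an "H" field with padding, so collect flags by value.
        flags = {f for f in rest if f in ("H", "V")}
        others = [f for f in rest if f not in ("H", "V")]
        rows.append((int(fields[0], 16), fields[1], flags, others[0], "A" in others[1:]))
    return rows


def lookup(rows, code_point: int, vertical: bool = False) -> str:
    """Class of a code point; 'ideographic' when the direction flag is absent."""
    starts = [row[0] for row in rows]
    i = bisect.bisect_right(starts, code_point) - 1
    if i < 0:
        raise KeyError(hex(code_point))
    _, _, flags, cls, _ = rows[i]
    return cls if ("V" if vertical else "H") in flags else "ideographic"


rows = parse(EXCERPT)
```

With only an excerpt loaded, lookups are meaningful for U+2018 through U+201C; code points beyond the last row would need the full table and its sentinel.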

===========

Please view or discuss this issue at https://github.com/w3c/jlreq/issues/242 using your GitHub account



Received on Thursday, 29 October 2020 11:24:35 UTC