Re: [alreq] Building exemplar charts

I updated the scripts to add CLDR data as well and updated [our 
spreadsheet](https://docs.google.com/spreadsheets/d/1_ZcVpfJTD7gBq_vfbQFEaHN3MmugP82_NU2ChYDAweY/edit#gid=1200127382).
We now have a dozen or so more characters (mostly punctuation marks).

Regarding the missing characters, I can’t answer that question for 
Arabic. For Persian, fortunately, we have ISIRI 6219, a national 
standard that has a list of characters. That list can be helpful in 
reviewing the characters that might be missing. Both Persian 
characters mentioned by Richard are in ISIRI 6219, as well as all the 
characters that I had in mind, such as `U+200C ZERO WIDTH NON-JOINER` 
and `U+066A ARABIC PERCENT SIGN`.

I wrote another script that compares what we have now for the “fa” 
column with what ISIRI 6219 suggests, and found a list of differences.
We can decide whether we want all of these characters or not, and how 
to add the ones we want. We can do the same for the Arabic language 
if there is a similar resource for it.

Just to clarify: I’m not suggesting we have to resolve all of these 
differences with ISIRI 6219. We can just use this list to find 
potential problems with our lists.

Here is the list:

What we have in our charts, but is not in ISIRI 6219:

- `U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK`
- `U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK`

What ISIRI 6219 has, but we don’t:

- `U+000A <control>`
- `U+000D <control>`
- `U+0020 SPACE`
- `U+002B PLUS SIGN`
- `U+003C LESS-THAN SIGN`
- `U+003D EQUALS SIGN`
- `U+003E GREATER-THAN SIGN`
- `U+007B LEFT CURLY BRACKET`
- `U+007C VERTICAL LINE`
- `U+007D RIGHT CURLY BRACKET`
- `U+00D7 MULTIPLICATION SIGN`
- `U+00F7 DIVISION SIGN`
- `U+0640 ARABIC TATWEEL`
- `U+0653 ARABIC MADDAH ABOVE`
- `U+0654 ARABIC HAMZA ABOVE`
- `U+0655 ARABIC HAMZA BELOW`
- `U+066A ARABIC PERCENT SIGN`
- `U+0670 ARABIC LETTER SUPERSCRIPT ALEF`
- `U+0671 ARABIC LETTER ALEF WASLA`
- `U+200C ZERO WIDTH NON-JOINER`
- `U+200D ZERO WIDTH JOINER`
- `U+2028 LINE SEPARATOR`
- `U+2029 PARAGRAPH SEPARATOR`
- `U+202A LEFT-TO-RIGHT EMBEDDING`
- `U+202B RIGHT-TO-LEFT EMBEDDING`
- `U+202C POP DIRECTIONAL FORMATTING`
- `U+202D LEFT-TO-RIGHT OVERRIDE`
- `U+202E RIGHT-TO-LEFT OVERRIDE`
- `U+2212 MINUS SIGN`
- `U+FEFF ZERO WIDTH NO-BREAK SPACE`

What ISIRI 6219 marks as optional but we don’t mark as auxiliary:

- `U+2010 HYPHEN`
- `U+2026 HORIZONTAL ELLIPSIS`

What we mark as auxiliary, but is not optional in ISIRI 6219:

- `U+064E ARABIC FATHA`
- `U+064F ARABIC DAMMA`
- `U+0650 ARABIC KASRA`
- `U+0652 ARABIC SUKUN`
- `U+06F0 EXTENDED ARABIC-INDIC DIGIT ZERO`
- `U+06F1 EXTENDED ARABIC-INDIC DIGIT ONE`
- `U+06F2 EXTENDED ARABIC-INDIC DIGIT TWO`
- `U+06F3 EXTENDED ARABIC-INDIC DIGIT THREE`
- `U+06F4 EXTENDED ARABIC-INDIC DIGIT FOUR`
- `U+06F5 EXTENDED ARABIC-INDIC DIGIT FIVE`
- `U+06F6 EXTENDED ARABIC-INDIC DIGIT SIX`
- `U+06F7 EXTENDED ARABIC-INDIC DIGIT SEVEN`
- `U+06F8 EXTENDED ARABIC-INDIC DIGIT EIGHT`
- `U+06F9 EXTENDED ARABIC-INDIC DIGIT NINE`
- `U+200E LEFT-TO-RIGHT MARK`
- `U+200F RIGHT-TO-LEFT MARK`
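
For reference, the four lists above boil down to set differences and intersections. Here is a minimal sketch in Python, assuming each source is available as a set of characters split into a main/required part and an auxiliary/optional part. The function names and the tiny sample sets are illustrative only; this is not the actual script or the full data.

```python
import unicodedata


def describe(ch):
    """Format a character as 'U+XXXX NAME'; control characters have no name."""
    fallback = "<control>" if unicodedata.category(ch) == "Cc" else "<unnamed>"
    return f"U+{ord(ch):04X} {unicodedata.name(ch, fallback)}"


def compare(ours_main, ours_aux, std_required, std_optional):
    """Return the four difference lists between our charts and a standard."""
    ours_all = ours_main | ours_aux
    std_all = std_required | std_optional
    return {
        "ours_only": sorted(ours_all - std_all),
        "standard_only": sorted(std_all - ours_all),
        "optional_not_auxiliary": sorted(std_optional & ours_main),
        "auxiliary_not_optional": sorted(ours_aux & std_required),
    }


# Hypothetical sample data: a handful of characters, not the real "fa" column
# or the real ISIRI 6219 repertoire.
ours_main = {"\u0627", "\u2010", "\u2039"}
ours_aux = {"\u064E"}
std_required = {"\u0627", "\u064E", "\u200C"}
std_optional = {"\u2010"}

diff = compare(ours_main, ours_aux, std_required, std_optional)
for title, chars in diff.items():
    print(title)
    for ch in chars:
        print(f"- `{describe(ch)}`")
```

Sorting by code point keeps the output in the same order as the lists above, and the same function would work for the Arabic column if a comparable reference list turns up.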



-- 
GitHub Notification of comment by mostafah
Please view or discuss this issue at 
https://github.com/w3c/alreq/issues/31#issuecomment-181888374 using 
your GitHub account

Received on Tuesday, 9 February 2016 14:29:53 UTC