Re: [charmod-norm] Not all precomposed characters are reachable by NFC (#190) from asmusf via GitHub on 2019-01-22 (public-i18n-archive@w3.org from January to March 2019)

From: asmusf via GitHub <sysbot+gh@w3.org>
Date: Tue, 22 Jan 2019 01:27:01 +0000
To: public-i18n-archive@w3.org
Message-ID: <issue_comment.created-456239988-1548120420-sysbot+gh@w3.org>

I find statements like this:

"Users are cautioned that the resulting character sequence can still contain combining marks: not all character sequences have a precomposed equivalent and some scripts depend on combining marks for encoding. There are even cases where a given base character and combining mark is not replaced with a precomposed character because the combination is "blocked" by another combining mark in the sequence."

This does NOT cover the case of "composition exceptions"; it covers only the effect of canonical reordering.

If you would like to extend the discussion in the note to cover the example of BENGALI, that would be good. (It would help to introduce the concept of these "composition exceptions" by name.) Most of them are found in certain complex scripts, and we find that they are completely off the radar even for users of those scripts.
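
To make the distinction concrete, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `nfc_codepoints` is only illustrative, and the choice of U+09DC BENGALI LETTER RRA (one of the Bengali letters listed in CompositionExclusions.txt) is my assumption about which BENGALI example is meant. It contrasts a composition exception with the "blocked" case that the quoted text describes:

```python
import unicodedata

def nfc_codepoints(text):
    """Return the NFC form of text as a list of U+XXXX strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]

# Composition exception: BENGALI LETTER RRA (U+09DC) has a canonical
# decomposition to U+09A1 U+09BC but is excluded from recomposition,
# so NFC turns the precomposed character into the two-code-point sequence.
rra = nfc_codepoints("\u09DC")

# Blocking: the circumflex (U+0302) cannot combine with "a" because the
# overline (U+0305), which has the same combining class, sits between them.
blocked = nfc_codepoints("a\u0305\u0302")

# Ordinary case, for contrast: "a" + circumflex composes to U+00E2.
plain = nfc_codepoints("a\u0302")

print(rra)      # ['U+09A1', 'U+09BC']
print(blocked)  # ['U+0061', 'U+0305', 'U+0302']
print(plain)    # ['U+00E2']
```

In the first case the input is already a single precomposed character and NFC still produces a decomposed sequence, which is what the quoted text does not prepare a reader for.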

When data MUST be in NFC, as in IDNA 2008, we discover that people file bugs . . . so this is something that desperately needs to be covered if the document explains normalization at this level of detail.

-- 
GitHub Notification of comment by asmusf
Please view or discuss this issue at https://github.com/w3c/charmod-norm/issues/190#issuecomment-456239988 using your GitHub account

Received on Tuesday, 22 January 2019 01:27:02 UTC