The CJK compatibility characters are also variants of the corresponding
'ordinary' character, in the sense that either character could be drawn in
either glyph form. In fact, the glyph shapes in the source standards (e.g.
JIS) have changed over time.
The Unicode Consortium does recognize that particular glyph shapes are
sometimes important, and has developed a much more comprehensive mechanism
to deal with them: the Ideographic Variation Database (UTS #37). See
http://unicode.org/reports/tr37/
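To make the contrast concrete, here is a small Python sketch using only the
standard `unicodedata` module. The characters are illustrative choices, not
taken from the thread: U+F900, a CJK compatibility ideograph whose canonical
decomposition is the single unified ideograph U+8C48, and a variation sequence
of U+8FBB followed by VARIATION SELECTOR-17 (U+E0100), the kind of sequence
UTS #37 registers. The sketch does not claim that this particular sequence is
registered. NFC replaces the compatibility ideograph, so its separate identity
is lost, while the variation sequence passes through unchanged.

```python
import unicodedata

# CJK COMPATIBILITY IDEOGRAPH-F900 has a singleton canonical decomposition
# to the unified ideograph U+8C48, so NFC (and every other form) replaces it.
compat = "\uF900"
print(unicodedata.decomposition(compat))          # 8C48
nfc_compat = unicodedata.normalize("NFC", compat)
print(f"U+{ord(nfc_compat):04X}")                 # U+8C48

# Base ideograph + VARIATION SELECTOR-17. Illustrative only: whether this
# exact sequence is registered in the IVD is not asserted here.
ivs = "\u8FBB\U000E0100"
nfc_ivs = unicodedata.normalize("NFC", ivs)
print(nfc_ivs == ivs)                             # True: the selector survives
```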
Mark
On Mon, Feb 9, 2009 at 16:16, fantasai <fantasai.lists@inkedblade.net> wrote:
>
> Martin Duerst wrote:
>
>> I haven't read everything, but if your claim ("overly-aggressive")
>> is true, then early normalization would be better than late matching,
>> because it would allow those producers that, for whatever reason,
>> insist that there is a difference to simply not do normalization
>> for these codepoints.
>>
>
> The argument is that certain normalization mappings in NFC/NFD
> are more like the types of mappings that happen in NFKC/NFKD than
> like the compose/decompose/ordering mappings. Therefore early
> normalization would cause data loss in the content, whereas late
> matching (e.g. at the selectors level) would avoid such data loss
> while still allowing such strings to match.
>
> See Ambrose Li's and Robert Burns's comments:
> http://lists.w3.org/Archives/Public/www-style/2009Feb/0229.html
>
> ~fantasai
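A rough Python sketch of the approaches discussed above, again using only the
standard `unicodedata` module. The helper names (`early_normalize`,
`early_normalize_except_compat`, `late_match`) are hypothetical, and "matching"
is reduced to plain string equality standing in for something like a selector
test. The skipped ranges are the CJK Compatibility Ideographs block
(U+F900..U+FAFF) and its supplement (U+2F800..U+2FA1F). Early normalization
rewrites the stored text; the opt-out variant models the suggestion that a
producer simply not normalize those code points; late matching keeps the text
as authored and normalizes only when comparing.

```python
import unicodedata

# CJK Compatibility Ideographs and CJK Compatibility Ideographs Supplement.
COMPAT_RANGES = ((0xF900, 0xFAFF), (0x2F800, 0x2FA1F))


def is_compat_ideograph(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in COMPAT_RANGES)


def early_normalize(text: str) -> str:
    """Normalize once, when content is produced; the authored form is gone."""
    return unicodedata.normalize("NFC", text)


def early_normalize_except_compat(text: str) -> str:
    """Apply NFC everywhere except to compatibility ideographs, copied as-is.

    Splitting the text at these characters does not change how the other
    runs normalize: they are starters (combining class 0), and neither they
    nor the ideographs they map to take part in canonical composition.
    """
    out, run = [], []
    for ch in text:
        if is_compat_ideograph(ch):
            out.append(unicodedata.normalize("NFC", "".join(run)))
            out.append(ch)
            run = []
        else:
            run.append(ch)
    out.append(unicodedata.normalize("NFC", "".join(run)))
    return "".join(out)


def late_match(a: str, b: str) -> bool:
    """Leave stored text untouched; compare NFC forms only at match time."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


authored = "\uF900"   # compatibility ideograph, as the author typed it
selector = "\u8C48"   # the unified ideograph it canonically maps to

print(early_normalize(authored) == authored)                # False: content changed
print(early_normalize_except_compat(authored) == authored)  # True: kept as authored
print(late_match(authored, selector))                       # True: still matches
```

The trade-off shows up in the last three lines: with early normalization the
authored character is gone; with the opt-out it survives but no longer equals
the unified form; with late matching it both survives and matches.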