[Bug 16970] i18n-ISSUE-105: compatibility caseless matching from bugzilla@jessica.w3.org on 2014-02-25 (public-i18n-core@w3.org from January to March 2014)

From: <bugzilla@jessica.w3.org>
Date: Tue, 25 Feb 2014 06:55:16 +0000
To: public-i18n-core@w3.org
Message-ID: <bug-16970-3493-VJJlLcCrRf@http.www.w3.org/Bugs/Public/>

https://www.w3.org/Bugs/Public/show_bug.cgi?id=16970

--- Comment #17 from John Daggett <jdaggett@mozilla.com> ---
(In reply to Addison Phillips from comment #16)
> Okay, but that's not NFKD in any event. Compatibility decomposition doesn't
> remove accents. It removes other stuff, like circles, size, squaring, or
> ligatures. But not accents. I can't think of a case where we would want to
> spec that behavior.
> 
> Your description sounds like it's using a collation almost. IE11 doesn't
> repro it at all: looks like Firefox or silk.

Please look at the example again. The Cyrillic matching requires normalization:
forms with combining diacritics match precomposed forms in IE, and only in IE
(including IE11):

From CaseFolding.txt:
0419; C; 0439; # CYRILLIC CAPITAL LETTER SHORT I

From NormalizationTest.txt:
0419;0419;0418 0306;0419;0418 0306; # (Й; Й; И◌̆; Й; И◌̆; ) CYRILLIC CAPITAL LETTER SHORT I
0439;0439;0438 0306;0439;0438 0306; # (й; й; и◌̆; й; и◌̆; ) CYRILLIC SMALL LETTER SHORT I
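
As a rough illustration of why the Cyrillic case needs normalization, here is a minimal Python sketch of canonical caseless matching as I understand the Unicode definition (NFD, case fold, NFD again). The helper name canonical_caseless_match is made up for this example, and the sketch says nothing about how IE actually implements its matching.

```python
import unicodedata


def canonical_caseless_match(a: str, b: str) -> bool:
    """Hypothetical helper: compare NFD(casefold(NFD(x))) for both strings."""
    def key(s: str) -> str:
        decomposed = unicodedata.normalize("NFD", s)
        return unicodedata.normalize("NFD", decomposed.casefold())
    return key(a) == key(b)


precomposed_upper = "\u0419"       # Й CYRILLIC CAPITAL LETTER SHORT I
decomposed_lower = "\u0438\u0306"  # и followed by COMBINING BREVE

# Case folding alone leaves the two spellings different.
print(precomposed_upper.casefold() == decomposed_lower.casefold())

# Normalizing around the fold makes them compare equal.
print(canonical_caseless_match(precomposed_upper, decomposed_lower))
```

The first comparison is a plain case-insensitive match with no normalization; the second is the one that equates the precomposed and decomposed forms.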

Look at the last example: the only way superscript-5 matches 5 is via NFKD.

2075;2075;2075;0035;0035; # (⁵; ⁵; ⁵; 5; 5; ) SUPERSCRIPT FIVE
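
The data line above can be checked with a short Python sketch using the standard unicodedata module. It only shows what the normalization forms do to U+2075; it is not a claim about any browser's implementation.

```python
import unicodedata

superscript_five = "\u2075"  # ⁵ SUPERSCRIPT FIVE

# Canonical forms leave the character unchanged.
nfc = unicodedata.normalize("NFC", superscript_five)
nfd = unicodedata.normalize("NFD", superscript_five)
print(nfc == superscript_five, nfd == superscript_five)

# Compatibility forms map it to the plain digit.
nfkc = unicodedata.normalize("NFKC", superscript_five)
nfkd = unicodedata.normalize("NFKD", superscript_five)
print(nfkc == "5", nfkd == "5")
```

So a match between ⁵ and 5 cannot come from canonical normalization or case folding; some compatibility mapping has to be involved.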

But it's not completely compatibility caseless matching because the square
kumimoji form of アパート doesn't match:

3300;3300;3300;30A2 30D1 30FC 30C8;30A2 30CF 309A 30FC 30C8; # (㌀; ㌀; ㌀; アパート; アハ◌゚ート; ) SQUARE APAATO
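
For comparison, a minimal Python sketch of full compatibility caseless matching, following my reading of the Unicode definition (NFD, fold, NFKD, fold, NFKD). The helper names are hypothetical. Under this definition the square form does match the katakana spelling, which is the match that does not happen in the observed behavior.

```python
import unicodedata


def compat_key(s: str) -> str:
    """Hypothetical helper: NFKD(casefold(NFKD(casefold(NFD(s)))))."""
    s = unicodedata.normalize("NFD", s).casefold()
    s = unicodedata.normalize("NFKD", s).casefold()
    return unicodedata.normalize("NFKD", s)


def compatibility_caseless_match(a: str, b: str) -> bool:
    """Hypothetical helper: two strings match if their keys are equal."""
    return compat_key(a) == compat_key(b)


square_apaato = "\u3300"              # ㌀ SQUARE APAATO
katakana = "\u30A2\u30D1\u30FC\u30C8"  # アパート

# Both sides reduce to 30A2 30CF 309A 30FC 30C8, so they compare equal.
print(compatibility_caseless_match(square_apaato, katakana))
```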


Received on Tuesday, 25 February 2014 06:55:17 UTC