The issue of ignoring the joiners has resurfaced recently in IDN and IRIs.

TUS 4.0, Page 391, paragraph 4, mentions:

ZERO WIDTH NON-JOINER and ZERO WIDTH JOINER are format control characters. Like other such characters, they should be ignored by processes that analyze text content. For example, a spelling-checker or find/replace operation should filter them out. [...]

Fact:
1. That is *incorrect* for *every* language I know that is written in the Arabic script and uses ZWNJ or ZWJ. In all of these languages, omitting a ZWNJ or ZWJ, or misplacing one, is often a spelling error.
Furthermore, Burmese also has:

It certainly causes sorting differences in the dreaded Burmese. 10xx 1039 200C 101B sorts with the syllable break before 101B, while 10xx 1039 101B sorts as a single syllable with any break occurring before 10xx.

Lincoln: <101c, 1004, virama, zwnj, 1000, 1014, virama, zwnj>, where the segment <1004, 1039, zwnj> renders as a visible virama above the representative glyph for 1004
bimbo: <1018, 1004, virama, zwnj, 1018, 102d, 102f>, idem
versus
Bengali: <1018, 1004, virama, 1002, 102c, 101c, 102e> where the segment <1004, virama> renders as the kinzi (epsilon like) and is placed above the rendering of the segment <1002>.
Tuesday: <1021, 1004, virama, 1002, 102c>, idem

In all these examples, the segment <1004, virama> or <1004, virama, zwnj> is used to write the same sound "in". Okell ("Burmese, an introduction to the script", p78):
When you are taking dictation and come across a word with the rhyme [sound "in"], you don't know - unless you already have learned the spelling of the word - whether it should be written the full [visible virama over 1004] or the reduced [kinzi].

This clearly indicates that it would be considered a spelling error to use one form for the other. There is no mention of two words which differ only by [visible virama over 1004] vs. [kinzi], i.e. where the use or non-use of zwnj is contrastive. Still, registering a domain name with "lincoln" in it, and having it show up with kinzi (because the zwnj is ignored), is problematic.
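
To make the domain-name concern concrete, here is a minimal Python sketch of what a process that "ignores" the joiners would do to the "Lincoln" spelling. It uses the code point sequences exactly as given above. The helper names `strip_joiners` and `code_points` are my own, purely for illustration, and the filter is only an assumption about how such a process might be written, not how any particular IDN implementation works.

```python
# Illustrative sketch only: a naive filter that drops ZWNJ/ZWJ,
# applied to the Burmese sequences quoted in this post.

ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"     # ZERO WIDTH JOINER
VIRAMA = "\u1039"  # the "virama" used in the sequences above


def strip_joiners(text: str) -> str:
    """Remove every ZWNJ and ZWJ (hypothetical 'ignore the joiners' step)."""
    return text.replace(ZWNJ, "").replace(ZWJ, "")


def code_points(text: str) -> list[str]:
    """Return the code points of text as uppercase hex strings."""
    return [f"{ord(ch):04X}" for ch in text]


# Lincoln: <101c, 1004, virama, zwnj, 1000, 1014, virama, zwnj>
lincoln = "\u101c\u1004" + VIRAMA + ZWNJ + "\u1000\u1014" + VIRAMA + ZWNJ

# Bengali: <1018, 1004, virama, 1002, 102c, 101c, 102e> (no joiners at all)
bengali = "\u1018\u1004" + VIRAMA + "\u1002\u102c\u101c\u102e"

stripped = strip_joiners(lincoln)

print("original:", code_points(lincoln))
print("stripped:", code_points(stripped))
```

After stripping, the segment <1004, virama, zwnj> has become a bare <1004, virama> followed directly by 1000, which has the same shape as the <1004, virama, 1002> segment in "Bengali". That is the sequence which renders as kinzi, so the filtered string is a different spelling from the one that was registered.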