Re: [charmod-norm] Does ZWJ/ZWNJ affect meaning?

> I could have mentioned many others too, but the document is about normalization, so I decided to resist the opportunity to rant.

No, the document is about string *matching* (but not searching, we hived that off). The short name is historical. Normalization is one technique for improving string matching.

I think the most we can say about emoji is: "garbage in, garbage out". String matching of emoji sequences depends on what the characters are. I suppose removing variation selectors (VS) and skin tone modifiers makes sense for most matching cases, but reordering the emoji in a ZWJ sequence is "above our pay grade" and, by definition, a decomposed family ZWJ sequence doesn't match the precomposed one. If one cares about such matching, the instruction is: "try to be consistent", just as it would be for a language such as Vietnamese (say), where the same text can be keyed in as different sequences.
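
To make that concrete, here is a rough sketch in Python of what "removing VS and skin tones" could look like before comparing. The helper name `loose_emoji_key` and the sample strings are mine, purely for illustration; nothing here is proposed text for the document, and a real implementation would want to consult the Unicode emoji data files rather than hard-code ranges.

```python
import unicodedata

# Assumption: "VS" here means the two presentation selectors used with emoji,
# VS15 (U+FE0E, text style) and VS16 (U+FE0F, emoji style).
VARIATION_SELECTORS = {"\uFE0E", "\uFE0F"}
# Fitzpatrick skin tone modifiers, U+1F3FB..U+1F3FF.
SKIN_TONE_MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}


def loose_emoji_key(text: str) -> str:
    """Hypothetical helper: drop variation selectors and skin tone modifiers.

    ZWJ (U+200D) is deliberately left alone, and nothing is reordered.
    """
    return "".join(
        ch
        for ch in text
        if ch not in VARIATION_SELECTORS and ch not in SKIN_TONE_MODIFIERS
    )


ZWJ = "\u200D"

waving_hand = "\U0001F44B"
waving_hand_dark = "\U0001F44B\U0001F3FF"  # same emoji plus a skin tone modifier
heart_text = "\u2764"
heart_emoji = "\u2764\uFE0F"  # same character plus VS16

family_single = "\U0001F46A"  # the single-code-point FAMILY emoji
family_zwj = ZWJ.join(["\U0001F468", "\U0001F469", "\U0001F467"])  # man, woman, girl
family_zwj_reordered = ZWJ.join(["\U0001F469", "\U0001F468", "\U0001F467"])

# Stripping VS and skin tones makes these pairs compare equal.
print(loose_emoji_key(waving_hand_dark) == waving_hand)  # True
print(loose_emoji_key(heart_emoji) == heart_text)        # True

# The ZWJ sequence and the single code point stay distinct, even after
# Unicode normalization; no normalization form maps one to the other.
print(loose_emoji_key(family_zwj) == family_single)                 # False
print(unicodedata.normalize("NFKC", family_zwj) == family_single)   # False

# Reordered members of a ZWJ sequence are simply different strings.
print(family_zwj == family_zwj_reordered)  # False
```

The last three comparisons are the "garbage in, garbage out" part: nothing short of emoji-specific knowledge would make them match, which is why consistency on the producer side is the only advice that scales.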

I will need to do an edit session. Let's see if I can wrangle all this into something.

-- 
GitHub Notification of comment by aphillips
Please view or discuss this issue at https://github.com/w3c/charmod-norm/issues/44#issuecomment-299283595 using your GitHub account

Received on Thursday, 4 May 2017 19:21:09 UTC