- From: klensin via GitHub <sysbot+gh@w3.org>
- Date: Thu, 04 May 2017 19:25:01 +0000
- To: public-i18n-archive@w3.org
Sorry... meant to say "the section is about normalization", not
"the document". Otherwise, I think we are on the same page.

john

--On Thursday, May 4, 2017 12:21 -0700 Addison Phillips
<notifications@github.com> wrote:

>> I could have mentioned many others too, but the document is
>> about normalization, so I decided to resist the opportunity
>> to rant.
>
> No, the document is about string *matching* (but not
> searching, we hived that off). The short name is historical.
> Normalization is one technique of improving string matching.
>
> I think the most we can say about emoji is: "garbage in,
> garbage out". String matching of emoji sequences depends on
> what the characters are. I suppose removing VS and skin tones
> makes sense for most matching cases, but reordering emojis in
> a ZWJ sequence is "above our pay grade" and, by definition, a
> decomposed family ZWJ sequence doesn't match the precomposed
> one. If one cares about such matching, the instruction is:
> "try to be consistent", just as it would be for a language
> such as Vietnamese (say) that has differently keyboarded
> sequences.
>
> I will need to do an edit session. Let's see if I can wrangle
> all this into something.

--
GitHub Notification of comment by klensin
Please view or discuss this issue at https://github.com/w3c/charmod-norm/issues/44#issuecomment-299284576 using your GitHub account
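The matching behaviour described in the quoted text can be sketched roughly as follows. This is only an illustration under stated assumptions: `fold_emoji` is a hypothetical helper, not something defined by the charmod-norm document, and it handles only the two emoji variation selectors (U+FE0E, U+FE0F) and the five skin tone modifiers (U+1F3FB to U+1F3FF).

```python
import unicodedata

# Assumption: only the text/emoji presentation selectors are stripped.
VARIATION_SELECTORS = {"\uFE0E", "\uFE0F"}
# Emoji skin tone modifiers U+1F3FB..U+1F3FF.
SKIN_TONES = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}


def fold_emoji(s: str) -> str:
    """Hypothetical helper: drop variation selectors and skin tone modifiers."""
    return "".join(
        ch for ch in s
        if ch not in VARIATION_SELECTORS and ch not in SKIN_TONES
    )


# Skin tone: THUMBS UP SIGN with and without a dark skin tone modifier.
thumbs = "\U0001F44D"
thumbs_dark = "\U0001F44D\U0001F3FF"
assert fold_emoji(thumbs) == fold_emoji(thumbs_dark)

# Variation selector: HEAVY BLACK HEART with and without U+FE0F.
heart_text = "\u2764"
heart_emoji = "\u2764\uFE0F"
assert fold_emoji(heart_text) == fold_emoji(heart_emoji)

# Family: MAN + ZWJ + WOMAN + ZWJ + GIRL versus the single FAMILY character.
family_zwj = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
family_single = "\U0001F46A"

# Neither the folding above nor Unicode normalization makes these equal.
assert fold_emoji(family_zwj) != fold_emoji(family_single)
assert unicodedata.normalize("NFC", family_zwj) != family_single
assert unicodedata.normalize("NFKC", family_zwj) != family_single
```

The sketch deliberately does not reorder or recompose ZWJ sequences, so the family sequence and the single FAMILY character stay distinct, which is the "try to be consistent" case above.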
Received on Thursday, 4 May 2017 19:25:07 UTC