Re: [charmod-norm] Does ZWJ/ZWNJ affect meaning? from klensin via GitHub on 2017-05-04 (public-i18n-archive@w3.org from April to June 2017)

From: klensin via GitHub <sysbot+gh@w3.org>
Date: Thu, 04 May 2017 19:25:01 +0000
To: public-i18n-archive@w3.org
Message-ID: <issue_comment.created-299284576-1493925900-sysbot+gh@w3.org>

Sorry... meant to say "the section is about normalization", not
"the document".  Otherwise, I think we are on the same page.

   john


--On Thursday, May 4, 2017 12:21 -0700 Addison Phillips
<notifications@github.com> wrote:

>> I could have mentioned many others too, but the document is
>> about normalization, so I decided to resist the opportunity
>> to rant.
> 
> No, the document is about string *matching* (but not
> searching, we hived that off). The short name is historical.
> Normalization is one technique of improving string matching.
> 
> I think the most we can say about emoji is: "garbage in,
> garbage out". String matching of emoji sequences depends on
> what the characters are. I suppose removing VS and skin tones
> makes sense for most matching cases, but reordering emojis in
> a ZWJ sequence is "above our pay grade" and, by definition, a
> decomposed family ZWJ sequence doesn't match the precomposed
> one. If one cares about such matching, the instruction is:
> "try to be consistent", just as it would be for a language
> such as Vietnamese (say) that has differently keyboarded
> sequences.
> 
> I will need to do an edit session. Let's see if I can wrangle
> all this into something.
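
As a rough illustration of the matching behaviour described in the quoted text, the sketch below builds a comparison key that drops variation selectors and skin tone modifiers, and shows that a ZWJ family sequence still does not match the single-code-point family emoji. The function name `emoji_match_key`, the choice to strip only U+FE0E/U+FE0F and U+1F3FB..U+1F3FF, and the sample strings are assumptions made for this example; they are not taken from the charmod-norm document.

```python
import unicodedata

# Assumption: only the text/emoji presentation selectors are stripped.
VARIATION_SELECTORS = {0xFE0E, 0xFE0F}
# Assumption: the five Fitzpatrick skin tone modifiers, U+1F3FB..U+1F3FF.
SKIN_TONES = range(0x1F3FB, 0x1F400)


def emoji_match_key(text):
    """Return a key for loose emoji comparison (hypothetical helper).

    The string is NFC-normalized, then variation selectors and skin tone
    modifiers are removed. ZWJ (U+200D) is kept and nothing is reordered.
    """
    text = unicodedata.normalize("NFC", text)
    return "".join(
        ch for ch in text
        if ord(ch) not in VARIATION_SELECTORS and ord(ch) not in SKIN_TONES
    )


# Thumbs up, with and without a medium skin tone modifier.
thumbs_plain = "\U0001F44D"
thumbs_toned = "\U0001F44D\U0001F3FD"

# Heavy black heart, with and without the emoji presentation selector.
heart_text = "\u2764"
heart_emoji = "\u2764\uFE0F"

# Single-code-point FAMILY versus a man + ZWJ + woman + ZWJ + boy sequence.
family_single = "\U0001F46A"
family_zwj = "\U0001F468\u200D\U0001F469\u200D\U0001F466"
# The same people in a different order within the ZWJ sequence.
family_zwj_reordered = "\U0001F469\u200D\U0001F468\u200D\U0001F466"

print(emoji_match_key(thumbs_plain) == emoji_match_key(thumbs_toned))
print(emoji_match_key(heart_text) == emoji_match_key(heart_emoji))
print(emoji_match_key(family_single) == emoji_match_key(family_zwj))
print(emoji_match_key(family_zwj) == emoji_match_key(family_zwj_reordered))
```

The first two comparisons succeed and the last two fail, since the key neither maps a ZWJ sequence onto a single code point nor reorders its members; getting those to match would depend on the input being consistent in the first place.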

-- 
GitHub Notification of comment by klensin
Please view or discuss this issue at https://github.com/w3c/charmod-norm/issues/44#issuecomment-299284576 using your GitHub account

Received on Thursday, 4 May 2017 19:25:07 UTC