[charmod-norm] which characters, exactly, should be removed in the matching algorithm?

aphillips has just created a new issue for 
https://github.com/w3c/charmod-norm:

== which characters, exactly, should be removed in the matching 
algorithm? ==
In the latest version of the matching algorithm, I noticed that the 
advice is to "remove Unicode controls". That advice is non-specific 
and only links to the section on invisibles. I created the following 
list, and I have a question about whether it is complete and correct:

> Issue 1
> 
> What to do about non-breaking space and other space characters? Is 
> this the full list? What about the Mongolian characters?
> Remove all of the following invisible Unicode characters:
>
>    ZWJ, ZWNJ
>    Variation Selectors (FE00..FE0F)
>    COMBINING GRAPHEME JOINER 034F
>    SOFT HYPHEN 00AD
>    ZERO WIDTH SPACE 200B
>    Bidi controls 
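
For illustration only, the list above could be sketched in Python 
roughly as follows. The function name `strip_invisibles` is 
hypothetical, and the expansion of "Bidi controls" to U+061C, 
U+200E, U+200F, U+202A..U+202E and U+2066..U+2069 is an assumption 
based on the Unicode Bidi_Control property, not something the issue 
text spells out. Non-breaking space and the Mongolian characters are 
deliberately left alone, since whether to remove them is the open 
question here.

```python
import re

# Character class built from the list in the issue. The code points for
# ZWJ/ZWNJ and the Bidi controls are filled in as assumptions; the issue
# names the categories but does not enumerate them.
_INVISIBLES = (
    "\u200C\u200D"        # ZWNJ, ZWJ
    "\uFE00-\uFE0F"       # Variation Selectors (FE00..FE0F)
    "\u034F"              # COMBINING GRAPHEME JOINER
    "\u00AD"              # SOFT HYPHEN
    "\u200B"              # ZERO WIDTH SPACE
    "\u061C\u200E\u200F"  # Bidi marks: ALM, LRM, RLM (assumed)
    "\u202A-\u202E"       # Bidi embeddings and overrides (assumed)
    "\u2066-\u2069"       # Bidi isolates (assumed)
)
_PATTERN = re.compile("[" + _INVISIBLES + "]")


def strip_invisibles(text: str) -> str:
    """Remove the invisible characters listed above before matching."""
    return _PATTERN.sub("", text)
```

With this sketch, `strip_invisibles("co\u00ADop")` yields `"coop"`, 
while a string containing U+00A0 (NO-BREAK SPACE) or U+180B (a 
Mongolian free variation selector) comes back unchanged.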



Please view or discuss this issue at 
https://github.com/w3c/charmod-norm/issues/117 using your GitHub 
account

Received on Saturday, 28 January 2017 00:06:04 UTC